- The paper introduces SGDe, a teacher-student framework that compiles deterministic SLM harnesses through per-node, trace-driven workflow optimisation.
- It reports accuracy of up to 99.3% from a small number of training samples by combining prompt refinement, code offloading, and structural consensus.
- The framework decouples optimisation from inference and backs the compiled workflow with PAC learning guarantees, so the SLM is never asked to diagnose its own errors, which supports reliable enterprise deployment.
Semantic Gradient Descent for Agentic Harness Compilation in SLMs
Introduction
"Compiling Deterministic Structure into SLM Harnesses" (2604.17450) addresses fundamental challenges in deploying small language models (SLMs) for enterprise tasks, where cost constraints rule out frontier LLMs and epistemic asymmetry undermines SLM reliability. The paper introduces Semantic Gradient Descent (SGDe), a teacher-student framework for agentic harness engineering that shifts the locus of optimisation from continuous weight tuning to discrete workflow compilation. Unlike standard distillation or prompt optimisation, SGDe selects per node between LLM calls, deterministic code, and structured voting ensembles, yielding high-fidelity, low-cost agentic workflows with theoretical convergence guarantees.
Motivation and Limitations of Prior Work
Enterprise deployment often prioritises narrow, highly predictable domains where frontier LLMs provide no cost-effective advantage over domain-specialised SLMs. However, existing approaches for leveraging SLMs via knowledge distillation or prompt optimisation fail to address the discrete structural orchestration fundamental to agentic reliability. These approaches are constrained by two bottlenecks:
- Epistemic asymmetry: SLMs lack the capacity to reliably self-detect and correct their own reasoning failures (Zhang et al., 2024; Huang et al., 2023).
- Intra-substrate optimisation: Existing frameworks (e.g., DSPy (Khattab et al., 2023), OPRO (Yang et al., 2023), AFlow [z5uVAKwmjf]) optimise prompt text or DAG wiring but hold the substrate assignment, that is, the division between LLM calls and deterministic code, fixed. Program-aided approaches like PAL (Gao et al., 2023) and PoT (Chen et al., 2023) globally offload specific tasks as code, but do not adapt this boundary per node in response to observed failures.
SGDe generalises substrate assignment to a per-node, trace-driven optimisation target, mediating both harness structure and execution partitioning using the teacher LLM as an active, offline compiler.
SGDe Framework: Architecture and Algorithm
SGDe operationalises harness engineering via a two-agent system:
- Student (S): An SLM constrained to execute an offline-compiled workflow comprising a DAG topology (G), node-specific prompts (P), and deterministic code modules (C). The student never participates in its own optimisation.
- Teacher (T): A high-capacity LLM that, given execution traces, critiques workflow failures (producing a "semantic gradient" $g_{\text{sem}}$) and synthesises updated harness plans through three discrete actions: prompt refinement, capability offloading, and structural consensus insertion.
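The student's compiled workflow can be pictured as a small data structure plus an executor. The following is a minimal sketch, not the authors' implementation: the `Harness` class, the `run` function, and the convention that prompts are format strings over upstream outputs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical types: the paper names the components G, P and C but does not
# prescribe this representation.
SLM = Callable[[str], str]                     # student model: prompt in, text out
CodeModule = Callable[[Dict[str, str]], str]   # deterministic node: upstream outputs in, text out


@dataclass
class Harness:
    graph: Dict[str, List[str]]                                  # G: node -> predecessor nodes
    prompts: Dict[str, str] = field(default_factory=dict)        # P: prompt per LLM node
    code: Dict[str, CodeModule] = field(default_factory=dict)    # C: function per code node


def run(harness: Harness, slm: SLM, task: str) -> Dict[str, str]:
    """Execute the compiled workflow in topological order; the student only follows the plan."""
    outputs: Dict[str, str] = {"task": task}
    for node in TopologicalSorter(harness.graph).static_order():
        if node in harness.code:
            # Deterministic node: no model call at all.
            outputs[node] = harness.code[node](outputs)
        else:
            # LLM node: fill the prompt template with the task and upstream outputs.
            outputs[node] = slm(harness.prompts[node].format(**outputs))
    return outputs
```

In this sketch the executor contains no optimisation logic, which mirrors the statement that the student never participates in its own optimisation.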
Compilation Actions
At each DAG node, T chooses among:
- Prompt refinement: Node remains an LLM call with improved prompt text.
- Capability offloading: Node is replaced by deterministic Python code if the subtask is reliably executable as such.
- Structural consensus: Node is replicated in fan-out/fan-in subgraphs, wrapping multiple prompt-perturbed SLM calls with deterministic voting aggregation to reduce output variance.
These actions are mutually exclusive per-node and exhaustively partition the substrate selection space for each sub-computation.
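To make the three options concrete, the sketch below encodes them as an enumeration and shows one way a structural-consensus node could work. It is an illustration under assumptions: the names `Substrate`, `majority_vote`, and `consensus_node` are hypothetical, and plain majority voting stands in for whatever aggregation rule the paper actually uses.

```python
from collections import Counter
from enum import Enum
from typing import Callable, List


class Substrate(Enum):
    """The three mutually exclusive per-node choices described above."""
    PROMPT = "prompt_refinement"
    CODE = "capability_offloading"
    CONSENSUS = "structural_consensus"


def majority_vote(answers: List[str]) -> str:
    """Deterministic fan-in: return the most frequent answer after trimming whitespace."""
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0][0]


def consensus_node(slm: Callable[[str], str], prompt_variants: List[str], task: str) -> str:
    """Fan-out over prompt-perturbed SLM calls, then aggregate by voting."""
    answers = [slm(p.format(task=task)) for p in prompt_variants]
    return majority_vote(answers)
```

The aggregation step involves no model call, so any remaining variance comes only from the individual SLM calls being voted on.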
Granularity Control
SGDe supports dynamic decomposition and fusion within the workflow graph, adapting subtask boundaries based on observed cognitive failures (e.g., context length limitations or redundant computation).
Optimisation Mechanism
The teacher applies a greedy hill-climbing search: it iteratively critiques and recompiles candidate execution plans, accepting an update only if it reduces the empirical semantic risk $\hat{R}$ on a held-out task batch. The optimisation is discrete rather than continuous: no numeric gradient exists, and candidate plans are generated by LLM-driven mutation of the current plan rather than by a gradient step with projection.
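The accept-only-if-better loop can be sketched as follows. This is a minimal illustration, not the paper's algorithm: `propose` stands in for the teacher's critique-and-recompile step, `execute` for running the student under a plan, and exact-match error rate is an assumed stand-in for the empirical semantic risk.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Plan = TypeVar("Plan")
Example = Tuple[str, str]  # (task, expected answer)


def empirical_risk(execute: Callable[[Plan, str], str], plan: Plan,
                   batch: Sequence[Example]) -> float:
    """Fraction of held-out examples the plan gets wrong (assumed risk measure)."""
    errors = sum(1 for task, expected in batch if execute(plan, task) != expected)
    return errors / len(batch)


def compile_harness(initial_plan: Plan,
                    propose: Callable[[Plan, List[Example]], Plan],
                    execute: Callable[[Plan, str], str],
                    batch: Sequence[Example],
                    max_iters: int = 10) -> Tuple[Plan, float]:
    """Greedy hill climbing: keep a candidate only if it strictly lowers the risk."""
    best, best_risk = initial_plan, empirical_risk(execute, initial_plan, batch)
    for _ in range(max_iters):
        if best_risk == 0.0:
            break
        # Failing traces are what the teacher critiques (hypothetical interface).
        failures = [(t, e) for t, e in batch if execute(best, t) != e]
        candidate = propose(best, failures)
        risk = empirical_risk(execute, candidate, batch)
        if risk < best_risk:
            best, best_risk = candidate, risk
    return best, best_risk
```

Because acceptance requires a strict decrease, the loop never moves to a worse plan, but like any greedy search it can stall on a plateau if the proposals stop improving.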
Theoretical Analysis
PAC Learning Guarantees
SGDe directly formalises workflow compilation in a Probably Approximately Correct (PAC) framework. By bounding the hypothesis class not by the unmanageably large textual/code space, but by the combinatorics of subtask substrate assignment, the authors derive:
$$|\Theta_T| = O(3^k)$$
where $k$ is the number of distinct subtask types. The sample complexity bound thus becomes:
G0
This reflects high data efficiency, as the space of substrate assignments is typically much smaller than the raw parameter or prompt/code space, and the teacher's strong statistical prior rapidly excludes unrealisable or spurious plans.
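As a point of reference, and not necessarily the authors' exact statement, the textbook sample complexity for a finite hypothesis class in the realizable PAC setting can be combined with the class size above. The symbols $m$ (number of training samples), $\epsilon$ (error tolerance), $\delta$ (failure probability), and the constant $c$ are assumptions introduced here for illustration.

```latex
% Standard finite-class PAC bound (realizable case), assumed for illustration:
m \;\ge\; \frac{1}{\epsilon}\left(\ln|\Theta_T| + \ln\frac{1}{\delta}\right)

% With |\Theta_T| \le c \cdot 3^{k} for some constant c, it suffices that
m \;\ge\; \frac{1}{\epsilon}\left(k\ln 3 + \ln c + \ln\frac{1}{\delta}\right)
\;=\; O\!\left(\frac{k + \ln(1/\delta)}{\epsilon}\right)
```

Under these assumptions the required sample count grows linearly in the number of subtask types $k$, not in the size of the prompt or code space.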
Epistemic Asymmetry Elimination
By decoupling optimisation from inference and employing deterministic offloading, SGDe fully eliminates epistemic asymmetry during execution—SLMs are never required to self-diagnose or self-correct.
Empirical Results
The authors validate SGDe on an adversarially synthesised subset of GSM-Hard, focusing on cases where SLMs predictably fail. Using Qwen-2.5-1.5B as the SLM and Kimi 2.5 as the teacher, they report:
- At G2 training samples: SGDe achieves G3 accuracy (range: G4), a +26.3% absolute gain over DSPy and a +43% gain over baseline zero-shot SLM.
- At G5 training samples: Accuracy peaks at G6 (small-sample regime), a +34.3% gain over DSPy.
- Both code offloading and structural consensus are required for top performance. Neither mechanism alone yields full gains.
Importantly, the observed sample efficiency is consistent with the theoretical PAC bound, and the small training-set sizes required make practical, low-cost deployment feasible.
When the training set is drawn from divergent task families (structural heterogeneity), performance degrades due to topological overfitting, consistent with the theoretical limits of the PAC assumption.
Practical and Theoretical Implications
SGDe demonstrates that offline harness compilation—where node-level substrate selection is driven by per-instance empirical failure—enables SLM deployment in cost- and governance-constrained settings without sacrificing correctness. The teacher’s role as a structural prior effects learning-theoretic compression, concentrating the workflow search on a tractable, well-characterised hypothesis space. This is distinct from simply optimising prompt text or static program-aided prompting, as SGDe dynamically partitions agentic tasks per observed capability and variance.
At a theoretical level, the reframing of workflow optimisation as a PAC-bounded search over discrete plans establishes a foundation for agentic harness engineering that admits formal convergence analysis and supports certifying robustness in practice.
Future Directions
The paper identifies topological overfitting as an inherent limitation in structurally heterogeneous task regimes. To address this, the authors propose a mixture-of-topologies approach: an initial routing stage directs each task into a compartmentalised, per-domain harness rather than forcing a monolithic execution plan. Recent developments in mixture-of-experts and routing for agentic systems (Ong et al., 2024; Wang et al., 2024) are directly relevant. Further theoretical refinement of the PAC analysis to support multi-topology, multi-domain deployments is a promising next step.
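The routing idea can be pictured with a very small dispatcher. The paper only proposes the direction, so everything here is a hypothetical sketch: `make_router`, the `classify` function, and the fallback harness are illustrative names, and a real router would likely be learned, not hand-written.

```python
from typing import Callable, Dict

Harness = Callable[[str], str]  # a compiled per-domain workflow: task in, answer out


def make_router(classify: Callable[[str], str],
                harnesses: Dict[str, Harness],
                fallback: Harness) -> Harness:
    """Build a front-end that sends each task to the harness compiled for its domain."""
    def route(task: str) -> str:
        domain = classify(task)
        # Unknown domains go to the fallback instead of a mismatched topology.
        return harnesses.get(domain, fallback)(task)
    return route
```

Keeping one harness per domain means each topology is compiled only against structurally similar tasks, which is the condition under which the PAC argument above is stated to hold.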
Conclusion
"Compiling Deterministic Structure into SLM Harnesses" introduces SGDe, a framework that operationalises agentic workflow compilation as a discrete, teacher-driven optimisation process. By unifying prompt, code, and ensemble selection within a single learning-theoretic framework, the approach yields SLM harnesses that deliver near-frontier accuracy at enterprise-viable costs and data scales. The results establish substrate compilation as a decisive innovation for certifiable agentic systems and provide a concrete foundation for extending SLM deployment into more complex, heterogeneous workflows.