Compiling Deterministic Structure into SLM Harnesses

Published 19 Apr 2026 in cs.AI | (2604.17450v1)

Abstract: Enterprise deployment of small LLMs (SLMs) is constrained by epistemic asymmetry: SLMs cannot self-correct reasoning errors, while frontier LLMs are prohibitively costly and face data sovereignty limits for high-volume use. We propose Semantic Gradient Descent (SGDe), a teacher-student framework that compiles agentic workflows into discrete execution plans comprising DAG topologies, system prompts, and deterministic executable code. The trailing "e" distinguishes SGDe from stochastic gradient descent. SGDe operates in a discrete semantic space where a frontier teacher generates natural-language critiques acting as directional gradients to iteratively refine the SLM's workflow artefacts. We formalise SGDe within a PAC learning framework, establishing sample-complexity bounds that enable convergence with as few as three training examples on targeted synthetic tasks by leveraging the teacher as a statistical prior. On a GSM-Hard-derived test set built via adversarial synthesis, compiled workflows reach 91.3% accuracy at m=5 and 99.3% at m=3 within the small-m regime motivated by Corollary 1, a +26.3% to +34.3% absolute improvement over state-of-the-art prompt optimisers. In the emerging paradigm of harness engineering, SGDe treats placement of deterministic code (which subtasks to delegate to a Python runtime versus retain as LLM calls) as a trace-driven, per-node optimisation target, generalising the whole-problem offloading of PAL and PoT. The teacher compiles two complementary deterministic structures: capability offloading, which delegates subtasks to Python when the SLM cannot execute them reliably, and structural consensus, which wraps variance-limited reasoning steps in fan-out/fan-in subgraphs aggregated by deterministic voting.

Summary

  • The paper introduces SGDe, a teacher-student framework that compiles deterministic SLM harnesses through per-node, trace-driven workflow optimization.
  • It demonstrates significant empirical gains, achieving up to 99.3% accuracy on minimal training samples through prompt refinement, code offloading, and structural consensus.
  • The framework leverages PAC learning guarantees to decouple optimization from inference, effectively eliminating epistemic asymmetry for reliable enterprise deployment.

Semantic Gradient Descent for Agentic Harness Compilation in SLMs

Introduction

"Compiling Deterministic Structure into SLM Harnesses" (2604.17450) addresses fundamental challenges in deploying small LLMs (SLMs) for enterprise tasks where epistemic asymmetry and cost constraints preclude effective use of frontier LLMs. The paper introduces Semantic Gradient Descent (SGDe), a teacher-student framework focused on agentic harness engineering—shifting the locus of optimisation from continuous weight tuning to discrete workflow compilation. Unlike standard distillation or prompt optimisation, SGDe enables per-node selection between LLM calls, deterministic code, and structured voting ensembles, yielding high-fidelity, low-cost agentic workflows with strong theoretical convergence guarantees.

Motivation and Limitations of Prior Work

Enterprise deployment often prioritises narrow, highly predictable domains where frontier LLMs provide no cost-effective advantage over domain-specialised SLMs. However, existing approaches for leveraging SLMs via knowledge distillation or prompt optimisation fail to address the discrete structural orchestration fundamental to agentic reliability. These approaches are constrained by two bottlenecks:

  1. Epistemic asymmetry: SLMs lack the capacity to reliably self-detect and correct their own reasoning failures (Zhang et al., 2024, Huang et al., 2023).
  2. Intra-substrate optimisation: Existing frameworks (e.g., DSPy (Khattab et al., 2023), OPRO (Yang et al., 2023), AFlow [z5uVAKwmjf]) optimise prompt text or DAG wiring but hold the substrate assignment (the division between LLM calls and deterministic code) fixed. Program-aided approaches like PAL (Wang et al., 2023) and PoT (Chen et al., 2023) offload the whole problem to code, but do not adapt this boundary per subtask.

SGDe generalises substrate assignment to a per-node, trace-driven optimisation target, mediating both harness structure and execution partitioning using the teacher LLM as an active, offline compiler.

SGDe Framework: Architecture and Algorithm

SGDe operationalises harness engineering via a two-agent system:

  • Student ($\mathcal{S}$): An SLM constrained to execute an offline-compiled workflow comprising a DAG topology ($\mathcal{G}$), node-specific prompts ($\mathcal{P}$), and deterministic code modules ($\mathcal{C}$). The student never participates in its own optimisation.
  • Teacher ($\mathcal{T}$): A high-capacity LLM that, given execution traces, critiques workflow failures (producing a "semantic gradient" $g_{\text{sem}}$) and synthesises updated harness plans through discrete actions: prompt refinement, capability offloading, and structural consensus insertion.
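As a rough illustration of what the student executes, the compiled artefact (DAG topology, node prompts, deterministic code) can be held in a small data structure and run in dependency order. This is a minimal sketch, not the paper's implementation: the names `Node`, `ExecutionPlan`, and `run_plan`, the `"question"` input key, and the `slm` callable are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One sub-computation of the workflow (hypothetical schema)."""
    name: str
    inputs: List[str] = field(default_factory=list)  # upstream names = DAG edges
    prompt: Optional[str] = None                      # set when the node is an SLM call
    code: Optional[Callable[..., str]] = None         # set when the node is deterministic code


@dataclass
class ExecutionPlan:
    nodes: Dict[str, Node]

    def topological_order(self) -> List[str]:
        """Return node names so that every node comes after its inputs."""
        order: List[str] = []
        done = set()

        def visit(name: str, path: frozenset) -> None:
            if name in done or name not in self.nodes:
                return  # already scheduled, or an external input such as "question"
            if name in path:
                raise ValueError(f"cycle through node {name!r}")
            for dep in self.nodes[name].inputs:
                visit(dep, path | {name})
            done.add(name)
            order.append(name)

        for name in self.nodes:
            visit(name, frozenset())
        return order


def run_plan(plan: ExecutionPlan, slm: Callable[[str], str], question: str) -> Dict[str, str]:
    """Execute the plan; the student only fills in the SLM-call nodes."""
    results = {"question": question}
    for name in plan.topological_order():
        node = plan.nodes[name]
        args = {dep: results[dep] for dep in node.inputs}
        if node.code is not None:
            results[name] = node.code(**args)          # deterministic Python node
        elif node.prompt is not None:
            results[name] = slm(node.prompt.format(**args))  # SLM-call node
        else:
            raise ValueError(f"node {name!r} has neither prompt nor code")
    return results
```

In this sketch the student has no control flow of its own: ordering, data passing, and the code nodes are fixed by the plan, which mirrors the statement that the student never participates in its own optimisation.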

Compilation Actions

At each DAG node, $\mathcal{T}$ chooses among:

  1. Prompt refinement: Node remains an LLM call with improved prompt text.
  2. Capability offloading: Node is replaced by deterministic Python code if the subtask is reliably executable as such.
  3. Structural consensus: Node is replicated in fan-out/fan-in subgraphs, wrapping multiple prompt-perturbed SLM calls with deterministic voting aggregation to reduce output variance.

These actions are mutually exclusive per-node and exhaustively partition the substrate selection space for each sub-computation.
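The three actions can be pictured as an enumeration, with structural consensus as a fan-out over prompt variants followed by a deterministic vote. The sketch below is illustrative only: the `Action` enum, `majority_vote`, `consensus_node`, and the first-appearance tie-break are assumptions, since the summary specifies only that aggregation is deterministic voting.

```python
from collections import Counter
from enum import Enum
from typing import Callable, List


class Action(Enum):
    """Per-node substrate choices described in the text (names are hypothetical)."""
    PROMPT_REFINEMENT = "prompt_refinement"
    CAPABILITY_OFFLOADING = "capability_offloading"
    STRUCTURAL_CONSENSUS = "structural_consensus"


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the earliest answer (assumed rule)."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    counts = Counter(answers)
    best = max(counts.values())
    return next(a for a in answers if counts[a] == best)


def consensus_node(slm: Callable[[str], str], prompt_variants: List[str], task: str) -> str:
    """Fan out one subtask over perturbed prompts, then fan in by voting."""
    answers = [slm(p.format(task=task)).strip() for p in prompt_variants]
    return majority_vote(answers)
```

Because the vote is plain Python rather than another model call, the aggregation step adds no variance of its own; only the fanned-out SLM calls remain stochastic.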

Granularity Control

SGDe supports dynamic decomposition and fusion within the workflow graph, adapting subtask boundaries based on observed cognitive failures (e.g., context length limitations or redundant computation).

Optimisation Mechanism

The teacher applies a greedy hill-climbing search: iteratively critiquing and recompiling candidate execution plans, only accepting updates that reduce empirical semantic risk $\hat{R}$ on a held-out task batch. This process is grounded in discrete, rather than continuous, optimisation; no metric gradient exists, and candidate plan generation mirrors language-model-powered mutation rather than projection.
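The accept-only-if-better loop described above might look like the following sketch. It assumes a 0/1 loss for the empirical risk and treats the teacher as an opaque `propose` callable; `greedy_compile`, `empirical_risk`, `propose`, `run`, and `max_steps` are hypothetical names, not the paper's API.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[object, object]  # (task input, expected output)


def empirical_risk(plan, batch: Sequence[Example], run: Callable) -> float:
    """Fraction of batch examples the plan answers incorrectly (assumed 0/1 loss)."""
    if not batch:
        raise ValueError("batch must be non-empty")
    wrong = sum(1 for x, y in batch if run(plan, x) != y)
    return wrong / len(batch)


def greedy_compile(initial_plan, propose: Callable, run: Callable,
                   batch: Sequence[Example], max_steps: int = 10):
    """Greedy hill climbing over discrete plans.

    propose(plan, batch) stands in for the teacher: critique the traces and
    return a recompiled candidate plan. A candidate replaces the incumbent
    only if it strictly lowers the empirical risk on the batch.
    """
    best = initial_plan
    best_risk = empirical_risk(best, batch, run)
    for _ in range(max_steps):
        if best_risk == 0.0:
            break  # nothing left to fix on this batch
        candidate = propose(best, batch)
        risk = empirical_risk(candidate, batch, run)
        if risk < best_risk:
            best, best_risk = candidate, risk
    return best, best_risk
```

A strict-improvement rule means a poor teacher proposal can never make the deployed plan worse on the evaluation batch, at the cost of possibly stalling in a local optimum.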

Theoretical Analysis

PAC Learning Guarantees

SGDe directly formalises workflow compilation in a Probably Approximately Correct (PAC) framework. By bounding the hypothesis class not by the unmanageably large textual/code space, but by the combinatorics of subtask substrate assignment, the authors derive:

$$|\Theta_\mathcal{T}| = O(3^k)$$

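where $k$ is the number of distinct subtask types. Because PAC bounds for a finite hypothesis class grow with the logarithm of its size, the resulting sample complexity scales with $\ln|\Theta_\mathcal{T}| = O(k)$, that is, linearly in $k$.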

This reflects high data efficiency, as $|\Theta_\mathcal{T}|$ is typically much smaller than the raw parameter or prompt/code space, and the teacher’s strong statistical prior rapidly excludes unrealisable or spurious plans.
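To get a feel for how a hypothesis class of size $3^k$ translates into sample counts, one can plug it into the textbook realisable-case bound for finite classes, $m \ge \frac{1}{\epsilon}\left(\ln|\Theta_\mathcal{T}| + \ln\frac{1}{\delta}\right)$. This is an assumption made for illustration: the paper's own bound, its constants, and the symbols $\epsilon$ (error tolerance) and $\delta$ (failure probability) are not reproduced in this summary, and the function name `pac_sample_bound` is hypothetical.

```python
import math


def pac_sample_bound(k: int, epsilon: float, delta: float, actions_per_node: int = 3) -> int:
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.

    Assumes the standard realisable-case bound for a finite class with
    |H| = actions_per_node ** k; this is not necessarily the paper's bound.
    """
    if k < 0 or actions_per_node < 1:
        raise ValueError("k must be >= 0 and actions_per_node >= 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    ln_h = k * math.log(actions_per_node)  # ln(3^k) = k ln 3
    return math.ceil((ln_h + math.log(1.0 / delta)) / epsilon)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, pac_sample_bound(k, epsilon=0.1, delta=0.05))
```

The bound grows linearly in $k$ rather than with the size of the prompt or code space. This generic worst-case formula does not model the teacher's prior, which is the mechanism the paper credits for convergence at very small $m$.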

Epistemic Asymmetry Elimination

By decoupling optimisation from inference and employing deterministic offloading, SGDe fully eliminates epistemic asymmetry during execution—SLMs are never required to self-diagnose or self-correct.

Empirical Results

The authors validate SGDe on an adversarially synthesised subset of GSM-Hard, focusing on cases where SLMs predictably fail. Using Qwen-2.5-1.5B as the SLM and Kimi 2.5 as the teacher, they report:

  • At $m=5$ training samples: SGDe achieves 91.3% accuracy (range: […]), a +26.3% absolute gain over DSPy and a +43% gain over baseline zero-shot SLM.
  • At $m=3$ training samples: Accuracy peaks at 99.3% (small-$m$ regime), a +34.3% gain over DSPy.
  • Both code offloading and structural consensus are required for top performance. Neither mechanism alone yields full gains.

Importantly, sample efficiency matches the theoretical PAC bound, and the small required $m$ values make practical, low-cost deployment feasible.

When the training set is drawn from divergent task families (structural heterogeneity), performance degrades due to topological overfitting, consistent with the theoretical limits of the PAC assumption.

Practical and Theoretical Implications

SGDe demonstrates that offline harness compilation—where node-level substrate selection is driven by per-instance empirical failure—enables SLM deployment in cost- and governance-constrained settings without sacrificing correctness. The teacher’s role as a structural prior effects learning-theoretic compression, concentrating the workflow search on a tractable, well-characterised hypothesis space. This is distinct from simply optimising prompt text or static program-aided prompting, as SGDe dynamically partitions agentic tasks per observed capability and variance.

At a theoretical level, the reframing of workflow optimisation as a PAC-bounded search over discrete plans establishes a foundation for agentic harness engineering that admits formal convergence analysis and practical robustness certifiability.

Future Directions

The paper identifies topological overfitting as an inherent limitation in structurally heterogeneous task regimes. To address this, the authors propose mixture-of-topologies: orchestrating an initial routing stage to direct tasks into optimally compartmentalised, per-domain harnesses rather than forcing a monolithic execution plan. Recent developments in mixture-of-experts and routing for agentic systems (Ong et al., 2024, Wang et al., 2024) are directly relevant. Further theoretical refinement of the PAC analysis to support multi-topology, multi-domain deployments is a promising next step.
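The mixture-of-topologies idea amounts to a routing stage in front of several separately compiled harnesses. The sketch below is purely illustrative of that shape: `make_router`, the `classify` callable, and the domain labels are assumptions, and the paper does not specify how routing would be implemented.

```python
from typing import Callable, Dict

Harness = Callable[[str], str]  # a compiled per-domain workflow: task text -> answer


def make_router(classify: Callable[[str], str],
                harnesses: Dict[str, Harness],
                default: str) -> Harness:
    """Build a front-end that sends each task to its per-domain harness.

    classify(task) returns a domain label (hypothetical); unknown labels
    fall back to the harness registered under `default`.
    """
    if default not in harnesses:
        raise KeyError(f"default domain {default!r} has no harness")

    def route(task: str) -> str:
        domain = classify(task)
        return harnesses.get(domain, harnesses[default])(task)

    return route
```

Keeping each harness compiled on a structurally homogeneous task family is what would let the per-domain PAC argument apply unchanged; the router itself becomes a new component whose errors would need separate analysis.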

Conclusion

"Compiling Deterministic Structure into SLM Harnesses" introduces SGDe, a framework that operationalises agentic workflow compilation as a discrete, teacher-driven optimisation process. By unifying prompt, code, and ensemble selection within a single learning-theoretic framework, the approach yields SLM harnesses that deliver near-frontier accuracy at enterprise-viable costs and data scales. The results establish substrate compilation as a decisive innovation for certifiable agentic systems and provide a concrete foundation for extending SLM deployment into more complex, heterogeneous workflows.
