- The paper introduces a hybrid compiler-runtime framework that compiles raw skill packages into minimal, boundary-guided contracts, reducing redundant context injection.
- It achieves over 50% reduction in token usage, reasoning iterations, and solve time, with cross-model artifact reuse maintaining or improving task success rates.
- The approach enables modular and scalable agent design by separating offline skill compilation from runtime execution, optimizing resource use and lowering costs.
SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces
Introduction and Motivation
The increasing adoption of skill-centric architectures in LLM-based agents has significantly improved task generalization, modularity, and developer productivity by enabling agent systems to leverage reusable, domain-specific skill packages. However, the canonical execution paradigm (loading entire skill packages into the agent's context and re-interpreting their instructions inside the ReAct-style reasoning loop) generates substantial runtime redundancy. This redundancy manifests as two core inefficiencies: the injection of irrelevant context and repeated, near-identical skill-specific reasoning across task invocations.
Figure 1: Two core sources of redundancy in skills-enabled execution: unnecessary context injection and repeated online interpretation of skill packages.
Analysis of agent execution traces on SkillsBench tasks shows that, on average, over half of the injected skill tokens are irrelevant at runtime, and that reasoning traces for similar uses of the same skill are highly similar. Both observations point to considerable wasted computation and resource usage, motivating a systems-oriented rethinking of how skills are executed.
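The two redundancy signals above can be made concrete with simple proxy metrics. The sketch below is not the paper's measurement method: it assumes a token counts as "relevant" if the execution trace references it, and it uses set-level Jaccard similarity as a stand-in for trace similarity. The function names and toy inputs are hypothetical.

```python
from typing import Iterable, Set


def irrelevant_token_fraction(injected: Iterable[str], used: Set[str]) -> float:
    """Fraction of injected skill tokens that the trace never references.

    Hypothetical relevance proxy: a token is 'used' if it appears in `used`.
    """
    tokens = list(injected)
    if not tokens:
        return 0.0
    unused = sum(1 for tok in tokens if tok not in used)
    return unused / len(tokens)


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Set-level Jaccard similarity between two reasoning traces."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Toy skill text and the tokens a (hypothetical) trace actually touched.
skill_tokens = "read csv then filter rows then plot chart with seaborn style guide".split()
trace_used = {"read", "csv", "filter", "rows"}
print(irrelevant_token_fraction(skill_tokens, trace_used))  # 8 of 12 tokens unused, about 0.67

# Two invocations of the same skill that reason almost identically.
trace_a = "open file parse header filter rows".split()
trace_b = "open file parse header sort rows".split()
print(jaccard(trace_a, trace_b))  # 5 shared of 7 distinct, about 0.71
```

Real measurements would operate on model tokenizer output and richer trace alignment; the point here is only that both inefficiencies are measurable quantities.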
Method: The SkillSmith Boundary-First Compilation Framework
SkillSmith introduces a hybrid compiler-runtime abstraction that moves the bulk of skill interpretation offline by compiling skill packages into minimal, explicit runtime boundary contracts. These contracts formalize the operational interface a skill exposes, including callable operators, input/output schemas, policy guards, validation evidence, and lossless fallback paths. Rather than forcing every skill into a single monolithic intermediate representation (IR), SkillSmith handles heterogeneous skill package structures (ordered workflows, multi-call dispatchers, and reference-centric assets) by classifying each package's source shape as part of the compilation pipeline.
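As an illustration only, a boundary contract can be modeled as a plain record holding the fields listed above, with source-shape classification as a routing step that runs before extraction. The class names, field types, and classification thresholds below are hypothetical; the paper does not specify this schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class SourceShape(Enum):
    ORDERED_WORKFLOW = "ordered_workflow"
    MULTI_CALL_DISPATCHER = "multi_call_dispatcher"
    REFERENCE_CENTRIC = "reference_centric"


@dataclass
class Operator:
    name: str
    input_schema: Dict[str, str]   # field name -> type name
    output_schema: Dict[str, str]


@dataclass
class BoundaryContract:
    skill_id: str
    shape: SourceShape
    operators: List[Operator]
    policy_guards: List[str] = field(default_factory=list)        # e.g. allowed write paths
    validation_evidence: List[str] = field(default_factory=list)  # e.g. ids of passing checks
    fallback_path: Optional[str] = None  # pointer back to the original skill asset


def classify_shape(num_entrypoints: int, has_numbered_steps: bool) -> SourceShape:
    """Toy heuristic for source-shape classification (rules are assumptions)."""
    if num_entrypoints == 0:
        return SourceShape.REFERENCE_CENTRIC      # only docs/assets, nothing callable
    if num_entrypoints > 1 and not has_numbered_steps:
        return SourceShape.MULTI_CALL_DISPATCHER  # several independent calls
    return SourceShape.ORDERED_WORKFLOW           # a fixed sequence of steps


contract = BoundaryContract(
    skill_id="csv-report",  # hypothetical skill
    shape=classify_shape(num_entrypoints=1, has_numbered_steps=True),
    operators=[Operator("render_report", {"csv_path": "str"}, {"pdf_path": "str"})],
    policy_guards=["write_only_under_outputs"],
    fallback_path="skills/csv-report/SKILL.md",  # hypothetical path
)
print(contract.shape, [op.name for op in contract.operators])
```

Classifying first lets each shape use its own extraction strategy while still emitting the same contract type, which is one way to avoid a single monolithic IR.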
Figure 2: End-to-end SkillSmith pipeline, transforming raw skill packages into boundary contracts consumed by runtime agents for guarded and partial execution.
At runtime, the agent is presented only with compact, boundary-restricted skill handles; further operational detail and dispatch are gated by agent queries, policy constraints, and dynamic selection. This substantially reduces context size and deliberation overhead compared to naive skill injection.
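A minimal sketch of handle-based exposure, assuming the compiled contracts are available as a nested dictionary: the initial context lists only skill and operator names, and full signatures and details are returned on demand when policy allows. The skill names, operators, and the `allowed` policy set are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical compiled view: skill id -> operator name -> (signature, detail).
Compiled = Dict[str, Dict[str, Tuple[str, str]]]

COMPILED: Compiled = {
    "xlsx": {
        "read_sheet": ("(path: str, sheet: str) -> Table", "Reads one sheet; formulas are evaluated."),
        "write_sheet": ("(path: str, table: Table) -> None", "Overwrites the sheet; needs write policy."),
    },
    "pdf": {
        "extract_text": ("(path: str) -> str", "Extracts text page by page."),
    },
}


def render_handles(compiled: Compiled) -> str:
    """Initial context: one line per skill, operator names only."""
    return "\n".join(
        f"{skill}: {', '.join(sorted(ops))}" for skill, ops in sorted(compiled.items())
    )


def expand(compiled: Compiled, skill: str, op: str, allowed: Set[str]) -> str:
    """Return full detail for one operator, only if the policy exposes it."""
    if f"{skill}.{op}" not in allowed:
        raise PermissionError(f"{skill}.{op} is not exposed by policy")
    signature, detail = compiled[skill][op]
    return f"{skill}.{op}{signature}\n    {detail}"


print(render_handles(COMPILED))
print(expand(COMPILED, "xlsx", "read_sheet", allowed={"xlsx.read_sheet"}))
```

The context cost of `render_handles` grows with the number of operators rather than with the size of each skill's documentation, which is the source of the savings this paragraph describes.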
The runtime interprets boundary contracts as guarded state machines in which each agent invocation results in one of three outcomes: blocked, guidance, or execute. This mechanism treats partial execution as a first-class outcome, records step-level provenance, and defers to the original skill assets only when neither the policy nor the contract provides a direct execution path.
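The following sketch shows one way such a guarded dispatcher could look. It assumes policy-denied operators yield `blocked`, executable operators yield `execute`, and everything else yields `guidance`, where the guidance either comes from the contract or, as the fallback, points back to the original skill asset. `GuardedSkill`, `Step`, and the asset path are hypothetical names, not SkillSmith's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Set


class Outcome(Enum):
    BLOCKED = "blocked"
    GUIDANCE = "guidance"
    EXECUTE = "execute"


@dataclass
class Step:
    op: str
    outcome: Outcome
    detail: str  # provenance note recorded for this step


@dataclass
class GuardedSkill:
    operators: Dict[str, Callable[..., object]]  # directly executable operators
    guidance: Dict[str, str]                     # operators with textual guidance only
    denied: Set[str]                             # operators blocked by policy guards
    fallback_asset: str                          # original skill file (hypothetical)
    trace: List[Step] = field(default_factory=list)

    def invoke(self, op: str, **kwargs: object) -> object:
        if op in self.denied:
            self.trace.append(Step(op, Outcome.BLOCKED, "policy guard"))
            return None
        if op in self.operators:
            result = self.operators[op](**kwargs)
            self.trace.append(Step(op, Outcome.EXECUTE, repr(result)))
            return result
        # No direct execution path: contract guidance, else defer to the original asset.
        text = self.guidance.get(op, f"see {self.fallback_asset}")
        self.trace.append(Step(op, Outcome.GUIDANCE, text))
        return text


report = GuardedSkill(
    operators={"sum_column": lambda values: sum(values)},
    guidance={"make_chart": "Use a bar chart; label both axes."},
    denied={"delete_file"},
    fallback_asset="skills/report/SKILL.md",
)
total = report.invoke("sum_column", values=[1, 2, 3])
report.invoke("delete_file")
report.invoke("make_chart")
hint = report.invoke("unknown_op")
print(total, hint, [s.outcome.value for s in report.trace])
```

Keeping the trace inside the dispatcher means every step, including blocked ones, leaves a provenance record without any cooperation from the agent.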
Figure 4: Runtime interpretation of boundary contracts as guarded state machines, supporting selective operator execution, policy blocking, and fallback.
Experimental Evaluation
Quantitative Results
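SkillSmith is evaluated on seven skill-centric tasks from SkillsBench using production- and research-grade harnesses and multiple LLMs (including GPT-5.5, Claude Opus 4.7, DeepSeek V4 Flash, and Qwen3.6 35B). Across these configurations, SkillSmith reduces token usage, reasoning iterations, and solve time by over 50%, while maintaining or improving task success rates.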
Stability Across Models and Harnesses
SkillSmith delivers stable benefits regardless of model scale and agent harness. When compiled artifacts are produced with a strong LLM and reused by medium or small models, task correctness is preserved or improved while substantial runtime efficiency gains are retained. The results generalize across proprietary, open-source, and internal agent harnesses, with token, time, and iteration reductions holding in every configuration. Notably, cross-LLM artifact reuse lets downstream agents benefit from the skill boundaries distilled by stronger models.
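Cross-model reuse works naturally if artifacts are keyed by the skill's content rather than by the model that will consume them. The store below is a hypothetical sketch of that idea: the compiling model is recorded for provenance only, and any edit to the skill source changes the key and forces recompilation. The model names and JSON payload are placeholders.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Artifact:
    skill_hash: str
    compiled_by: str    # model that produced the artifact (provenance only)
    contract_json: str  # serialized boundary contract


class ArtifactStore:
    """Artifacts keyed by skill content, independent of the runtime model."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, Artifact] = {}

    @staticmethod
    def key(skill_source: str) -> str:
        return hashlib.sha256(skill_source.encode("utf-8")).hexdigest()

    def put(self, skill_source: str, compiled_by: str, contract_json: str) -> Artifact:
        art = Artifact(self.key(skill_source), compiled_by, contract_json)
        self._by_hash[art.skill_hash] = art
        return art

    def get(self, skill_source: str) -> Optional[Artifact]:
        return self._by_hash.get(self.key(skill_source))


store = ArtifactStore()
src = "# Report skill\n1. Load CSV\n2. Render PDF\n"
store.put(src, compiled_by="strong-model", contract_json='{"operators": ["render"]}')

# A smaller runtime model loads the same artifact without recompiling.
reused = store.get(src)
print(reused.compiled_by if reused else "miss")
print(store.get(src + "3. Email PDF\n"))  # edited skill -> cache miss
```

Content addressing also gives a cheap staleness check: an artifact is only reused while the skill it was compiled from is byte-identical.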
Figure 5: Cross-model runtime benefit and correctness preservation: SkillSmith artifacts compiled with Claude Opus 4.7 yield both efficiency and improved task success on weaker models.
Compilation Overhead and Limitations
SkillSmith incurs a one-time compilation cost (on average 3,104 tokens and 13.22 s per reusable artifact), which is amortized over repeated uses: empirically, runtime savings outpace the compile cost after the first few invocations of each skill.
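The amortization argument reduces to a break-even count: with a one-time compile cost C and per-invocation savings s, cumulative savings first cover the cost after n = ceil(C / s) invocations. The sketch below uses the reported average compile cost (3,104 tokens, 13.22 s); the per-run baseline and compiled figures are hypothetical placeholders, not measurements from the paper.

```python
import math


def break_even_invocations(compile_cost: float, baseline_per_run: float,
                           compiled_per_run: float) -> int:
    """Smallest n with n * (baseline - compiled) >= compile_cost."""
    savings = baseline_per_run - compiled_per_run
    if savings <= 0:
        raise ValueError("compiled runs must be cheaper than the baseline")
    return math.ceil(compile_cost / savings)


# Reported compile cost; hypothetical per-run costs (a 50% reduction).
tokens_n = break_even_invocations(3104, baseline_per_run=4000, compiled_per_run=2000)
seconds_n = break_even_invocations(13.22, baseline_per_run=60.0, compiled_per_run=30.0)
print(tokens_n, seconds_n)
```

Under these assumed per-run costs, the token cost is recovered by the second invocation and the time cost within the first; smaller per-run savings push the break-even point later.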
The primary limitation is the reliance on extractable, statically analysable structures in the source skills; dynamic, incomplete, or highly contextual skills may resist effective compilation, reverting the system to fallback execution. Additionally, artifact correctness and validity are bounded by the fidelity of the original skill, environmental assumptions, and reproducibility of the compile-time context.
Implications and Future Directions
SkillSmith defines a new abstraction boundary for skill execution in LLM-agent architectures, moving from free-form, fully online skill interpretation to a hybrid model where static skill contracts mediate runtime context and execution. This provides practical benefits in agent efficiency, cost, and modularity, with broader systems implications for provenance, auditability, and artifact reusability. The ability to transfer skill artifacts across model and infrastructure boundaries introduces a decoupling between skill authoring/curation and agent execution, suggesting a path toward scalable, model-agnostic agent programming.
Future developments could target formalizing contract validation, integrating dynamic skill adaptation to accommodate runtime environment drift, and extending the compiler to richer skill package formats (e.g., parameterized or compositional skills). The paradigm may be further generalized for broader classes of agent programs, supporting safe partial execution and stateful composition beyond skills.
Conclusion
SkillSmith offers a compiler-runtime framework that aggressively reduces the inefficiencies of context-heavy, repeated skill interpretation in agent systems. By compiling selectable portions of skill packages into minimal runtime interfaces and enforcing guarded, partial execution, the framework achieves substantial reductions in token usage, latency, and operational cost, while preserving or improving task correctness across models and harnesses. The introduction of formal boundary contracts opens new avenues for systems-level optimizations and agent programming methodology, with direct implications for the efficiency, reliability, and scalability of next-generation LLM-based agents (2605.15215).