SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces

Published 12 May 2026 in cs.AI and cs.SE | (2605.15215v1)

Abstract: Recently, skills have been widely adopted in LLM-based agent systems across various domains. In existing frameworks, skills are typically injected into the agent reasoning loop as contextual guidance once matched to a runtime task, enabling specialized task-solving capabilities. We find that this execution paradigm introduces two major sources of redundancy: irrelevant context injection and repeated skill-specific reasoning and planning. To address this, we propose SkillSmith, a boundary-first compiler-runtime framework that compiles skill packages offline into minimal executable interfaces. By extracting fine-grained operational boundaries from skills, SkillSmith enables agents to dynamically access and execute only the relevant components at runtime, thereby minimizing unnecessary context injection and redundant reasoning overhead. In an evaluation on the SkillsBench benchmark, SkillSmith reduces solve-stage token usage by 57.44%, thinking iterations by 42.99%, solve time by 50.57% (2.02x faster), and token-proportional monetary cost by 57.44% compared with using raw skills. Moreover, compiled artifacts produced by a stronger model can be reused by a smaller or more efficient runtime model, improving task accuracy in cases where raw skill interpretation fails. The source code and data are available at https://github.com/AetherHeart-AI/Aeloon.

Authors (7)

Summary

The paper introduces a hybrid compiler-runtime framework that compiles raw skill packages into minimal, boundary-guided contracts, reducing redundant context injection.
It reduces solve-stage token usage by 57.44%, reasoning iterations by 42.99%, and solve time by 50.57%, and cross-model artifact reuse maintains or improves task success rates.
The approach enables modular and scalable agent design by separating offline skill compilation from runtime execution, optimizing resource use and lowering costs.

Introduction and Motivation

The increasing adoption of skill-centric architectures in LLM-based agents has significantly improved task generalization, modularity, and developer productivity by enabling agent systems to leverage reusable, domain-specific skill packages. However, the canonical execution paradigm loads entire skill packages into the agent's context and re-interprets their instructions inside a ReAct-style reasoning loop. This generates substantial runtime redundancy in two forms: the injection of irrelevant context, and repeated, near-identical skill-specific reasoning across task invocations.

Figure 1: Two core sources of redundancy in skills-enabled execution: unnecessary context injection and repeated online interpretation of skill packages.

Analysis of agent execution traces on SkillsBench tasks shows that, on average, over half of the injected skill tokens are irrelevant at runtime, and that reasoning traces for similar uses of the same skill are highly similar to one another. Both findings indicate considerable wasted computation and motivate a systems-oriented rethinking of how skills are executed.

Method: The SkillSmith Boundary-First Compilation Framework

SkillSmith introduces a hybrid compiler-runtime abstraction that shifts the bulk of skill interpretation offline by compiling skill packages into minimal, explicit runtime boundary contracts. These contracts formalize the operational interface exposed by a skill, including callable operators, input/output schemas, policy guards, validation evidence, and lossless fallback paths. Rather than forcing every skill into a single monolithic intermediate representation (IR), SkillSmith classifies each package by source shape as part of the compilation pipeline, which lets it target heterogeneous package structures covering ordered workflows, multi-call dispatchers, and reference-centric assets.
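
To make the contract components listed above concrete, the following is a minimal sketch of how such a contract could be represented. It is not the SkillSmith artifact format: the class names, field names, the `pdf-tools` skill, its `extract_text` operator, and the fallback path are all hypothetical, and schemas are simplified to a mapping from field name to Python type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class SourceShape(Enum):
    # The three package structures named in the text.
    ORDERED_WORKFLOW = "ordered_workflow"
    MULTI_CALL_DISPATCHER = "multi_call_dispatcher"
    REFERENCE_CENTRIC = "reference_centric"


@dataclass(frozen=True)
class Operator:
    name: str
    input_schema: dict[str, type]    # field name -> expected Python type
    output_schema: dict[str, type]
    guards: tuple[str, ...] = ()     # names of policy guards that must pass


@dataclass(frozen=True)
class BoundaryContract:
    skill_id: str
    shape: SourceShape
    operators: dict[str, Operator]
    validation_evidence: tuple[str, ...] = ()  # e.g. IDs of compile-time checks
    fallback_path: str = ""                    # where the original skill asset lives

    def validate_input(self, op_name: str, args: dict[str, Any]) -> list[str]:
        """Return schema violations; an empty list means the call is well-formed."""
        schema = self.operators[op_name].input_schema
        errors = [f"missing field: {k}" for k in schema if k not in args]
        errors += [
            f"wrong type for {k}: expected {t.__name__}"
            for k, t in schema.items()
            if k in args and not isinstance(args[k], t)
        ]
        errors += [f"unexpected field: {k}" for k in args if k not in schema]
        return errors


# Hypothetical contract for an illustrative skill (not taken from SkillsBench).
contract = BoundaryContract(
    skill_id="pdf-tools",
    shape=SourceShape.MULTI_CALL_DISPATCHER,
    operators={
        "extract_text": Operator(
            name="extract_text",
            input_schema={"path": str, "page": int},
            output_schema={"text": str},
            guards=("path_in_workspace",),
        )
    },
    fallback_path="skills/pdf-tools/SKILL.md",
)

print(contract.validate_input("extract_text", {"path": "a.pdf"}))
```

Keeping the schema check on the contract itself means a malformed call can be rejected before any operator code runs, which is one plausible reading of "explicit" boundaries.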

Figure 2: End-to-end SkillSmith pipeline, transforming raw skill packages into boundary contracts consumed by runtime agents for guarded and partial execution.

At runtime, the agent is presented only with compact, boundary-restricted skill handles. Further operational detail and dispatch are gated by agent queries, policy constraints, and dynamic selection, which substantially reduces context size and deliberation overhead compared to naive skill injection.
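
The handle-first access pattern described above can be illustrated with a small registry that injects only one summary line per skill and returns operator detail on request. This is a sketch under assumptions: `SkillHandle`, `HandleRegistry`, and the example skill are hypothetical names, and the paper's actual handle format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillHandle:
    skill_id: str
    summary: str                      # one-line description shown up front
    operator_names: tuple[str, ...]


class HandleRegistry:
    """Stores full operator detail but exposes it only on an explicit query."""

    def __init__(self) -> None:
        self._handles: dict[str, SkillHandle] = {}
        self._details: dict[tuple[str, str], str] = {}

    def register(self, handle: SkillHandle, details: dict[str, str]) -> None:
        self._handles[handle.skill_id] = handle
        for op_name, text in details.items():
            self._details[(handle.skill_id, op_name)] = text

    def initial_context(self) -> str:
        """The only skill text placed in context before the agent asks for more."""
        return "\n".join(
            f"{h.skill_id}: {h.summary} [ops: {', '.join(h.operator_names)}]"
            for h in self._handles.values()
        )

    def describe(self, skill_id: str, op_name: str) -> str:
        """Detail for a single operator, fetched when the agent queries it."""
        return self._details[(skill_id, op_name)]


# Hypothetical skill used only for illustration.
registry = HandleRegistry()
registry.register(
    SkillHandle("pdf-tools", "Read and split PDF files", ("extract_text", "split")),
    {
        "extract_text": "extract_text(path: str, page: int) -> str. Returns the text of one page.",
        "split": "split(path: str, every: int) -> list[str]. Writes one file per chunk of pages.",
    },
)

print(registry.initial_context())
print(registry.describe("pdf-tools", "split"))
```

The agent pays for the `split` description only if it actually asks about `split`; the unused `extract_text` detail never enters the context.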

The runtime interprets boundary contracts as guarded state machines in which each agent invocation yields one of three outcomes: blocked, guidance, or execute. This mechanism treats partial execution as a first-class operation, keeps provenance traceable at the level of individual steps, and falls back to the original skill assets only when the policy or contract cannot support direct execution.
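
The three outcomes can be sketched as a small dispatcher. The mapping below is an assumption made for illustration, not the paper's runtime: a failed policy guard yields blocked, missing arguments or an operator outside the contract yield guidance (the latter returning the original skill text as fallback), and a well-formed, permitted call yields execute. All class and operator names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class Outcome(Enum):
    BLOCKED = "blocked"
    GUIDANCE = "guidance"
    EXECUTE = "execute"


@dataclass(frozen=True)
class Result:
    outcome: Outcome
    payload: Any
    provenance: str  # which contract step (or fallback) produced this result


@dataclass(frozen=True)
class GuardedOperator:
    name: str
    required_args: tuple[str, ...]
    guard: Callable[[dict[str, Any]], bool]
    run: Callable[[dict[str, Any]], Any]


class ContractRuntime:
    def __init__(self, skill_id: str, operators: list[GuardedOperator], fallback_text: str) -> None:
        self._skill_id = skill_id
        self._ops = {op.name: op for op in operators}
        self._fallback_text = fallback_text

    def invoke(self, op_name: str, args: dict[str, Any]) -> Result:
        op = self._ops.get(op_name)
        if op is None:
            # Assumed behaviour: the contract cannot serve this request,
            # so hand back the original skill text as guidance.
            return Result(Outcome.GUIDANCE, self._fallback_text, f"{self._skill_id}:fallback")
        step = f"{self._skill_id}:{op_name}"
        missing = [a for a in op.required_args if a not in args]
        if missing:
            return Result(Outcome.GUIDANCE, f"missing arguments: {', '.join(missing)}", step)
        if not op.guard(args):
            return Result(Outcome.BLOCKED, "policy guard rejected the call", step)
        return Result(Outcome.EXECUTE, op.run(args), step)


# Hypothetical operator: count words, but only for paths inside "workspace/".
word_count = GuardedOperator(
    name="word_count",
    required_args=("path", "text"),
    guard=lambda a: a["path"].startswith("workspace/"),
    run=lambda a: len(a["text"].split()),
)
runtime = ContractRuntime("text-tools", [word_count], "See the original skill instructions.")

for call in (
    ("word_count", {"path": "workspace/a.txt", "text": "a b c"}),
    ("word_count", {"path": "/etc/passwd", "text": "x"}),
    ("word_count", {"path": "workspace/a.txt"}),
    ("summarize", {}),
):
    result = runtime.invoke(*call)
    print(result.outcome.value, result.provenance)
```

Arguments are checked before the guard runs so that a guard never sees an incomplete call, and every result carries a provenance string naming the step that produced it.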

Figure 4: Runtime interpretation of boundary contracts as guarded state machines, supporting selective operator execution, policy blocking, and fallback.

Experimental Evaluation

Quantitative Results

SkillSmith is evaluated on seven skill-centric tasks from SkillsBench using production- and research-grade harnesses and multiple LLMs (including GPT-5.5, Claude Opus 4.7, DeepSeek V4 Flash, and Qwen3.6 35B). The main findings are:

SkillSmith achieves a 57.44% reduction in solve-stage token usage, a 42.99% reduction in reasoning iterations, a 50.57% reduction in solve time (2.02x faster), and a 57.44% reduction in token-proportional monetary cost compared to the default Raw-Skills execution paradigm.
Compared against SkVM, a recent compiler-style baseline, SkillSmith further reduces token usage by 46.49%, solve time by 47.04%, and iterations by 18.67%.
Task success rate is maintained or improved even when runtime models are weaker than those used at compile time: reusing artifacts compiled by stronger models improves accuracy and pass rates for more efficient models that cannot faithfully execute raw skills.

Figure 3: Runtime cost reductions for SkillSmith across seven SkillsBench tasks compared to Raw-Skills and SkVM.
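
The "2.02x faster" figure follows directly from the 50.57% solve-time reduction, since a fractional reduction r corresponds to a speedup of 1 / (1 - r). The short check below reproduces that conversion; the function name is ours, and the only inputs are the percentages reported above.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction into the equivalent 'x times' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Reported solve-time reduction versus Raw-Skills.
print(round(speedup_from_reduction(50.57), 2))

# The same conversion applied to the reported token reduction.
print(round(speedup_from_reduction(57.44), 2))
```

Because monetary cost is stated to be token-proportional, its percentage reduction is necessarily identical to the token reduction (57.44%), whatever the per-token price.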

Stability Across Models and Harnesses

SkillSmith's benefits are stable across model sizes and agent harnesses. When compiled artifacts are produced using a strong LLM and reused by medium or small models, task correctness is preserved or improved while substantial runtime efficiency gains are retained. The results generalize across proprietary, open-source, and internal agent harnesses, with token, time, and iteration reductions holding in each configuration. Notably, cross-LLM artifact reuse enables downstream agents to leverage the distilled skill boundaries of stronger models.

Figure 5: Cross-model runtime benefit and correctness preservation: SkillSmith artifacts compiled with Claude Opus 4.7 yield both efficiency and improved task success on weaker models.

Compilation Overhead and Limitations

SkillSmith incurs a one-time compilation cost (on average 3,104 tokens and 13.22 s per reusable artifact), which is amortized over repeated uses. Empirically, runtime savings exceed the compile cost after the first few invocations per skill.
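
The amortization argument can be made concrete with a break-even calculation. The sketch below uses the reported averages (3,104 compile tokens, 57.44% token reduction), but the raw per-invocation token counts are hypothetical values chosen for illustration, and it assumes compile-time and solve-time tokens are priced alike. It is not a reproduction of the paper's own analysis.

```python
import math

COMPILE_TOKENS = 3104      # reported average one-time compile cost
TOKEN_REDUCTION = 0.5744   # reported average solve-stage token reduction


def break_even_invocations(compile_tokens: float, raw_tokens_per_call: float, reduction: float) -> int:
    """Smallest number of invocations whose cumulative savings cover the compile cost."""
    saved_per_call = raw_tokens_per_call * reduction
    if saved_per_call <= 0:
        raise ValueError("no per-call savings, so the compile cost is never recovered")
    return math.ceil(compile_tokens / saved_per_call)


# Hypothetical raw solve-stage costs per invocation (assumptions, not paper data).
for raw_tokens in (500, 2_000, 10_000):
    n = break_even_invocations(COMPILE_TOKENS, raw_tokens, TOKEN_REDUCTION)
    print(f"raw cost {raw_tokens:>6} tokens/call -> break-even after {n} invocation(s)")
```

Under these assumptions the break-even point moves from the first call for expensive skills to around a dozen calls for very cheap ones, which matches the qualitative claim that savings overtake the compile cost within a few invocations.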

The primary limitation is the reliance on extractable, statically analysable structures in the source skills; dynamic, incomplete, or highly contextual skills may resist effective compilation, reverting the system to fallback execution. Additionally, artifact correctness and validity are bounded by the fidelity of the original skill, environmental assumptions, and reproducibility of the compile-time context.

Implications and Future Directions

SkillSmith defines a new abstraction boundary for skill execution in LLM-agent architectures, moving from free-form, fully online skill interpretation to a hybrid model where static skill contracts mediate runtime context and execution. This provides practical benefits in agent efficiency, cost, and modularity, with broader systems implications for provenance, auditability, and artifact reusability. The ability to transfer skill artifacts across model and infrastructure boundaries introduces a decoupling between skill authoring/curation and agent execution, suggesting a path toward scalable, model-agnostic agent programming.

Future developments could target formalizing contract validation, integrating dynamic skill adaptation to accommodate runtime environment drift, and extending the compiler to richer skill package formats (e.g., parameterized or compositional skills). The paradigm may be further generalized for broader classes of agent programs, supporting safe partial execution and stateful composition beyond skills.

Conclusion

SkillSmith offers a compiler-runtime framework that aggressively reduces the inefficiencies of context-heavy, repeated skill interpretation in agent systems. By compiling selectable portions of skill packages into minimal runtime interfaces and enforcing guarded, partial execution, the framework achieves substantial reductions in token usage, latency, and operational cost, while preserving or improving task correctness across models and harnesses. The introduction of formal boundary contracts opens new avenues for systems-level optimizations and agent programming methodology, with direct implications for the efficiency, reliability, and scalability of next-generation LLM-based agents (2605.15215).
