A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

Published 8 May 2026 in cs.IR | (2605.07358v1)

Abstract: LLM-based agents that reason, plan, and act through tools, memory, and structured interaction are emerging as a promising paradigm for automating complex workflows. Recent systems such as OpenClaw and Claude Code exemplify a broader shift from passive response generation to action-oriented task execution. Yet as agents move toward open-ended, real-world deployment, relying on from-scratch reasoning and low-level tool calls for every task becomes increasingly inefficient, error-prone, and hard to maintain. This survey examines this challenge through the lens of agent skills, which we define as reusable procedural artifacts that coordinate tools, memory, and runtime context under task-specific constraints. Under this view, agents and skills play complementary roles: agents handle high-level reasoning and planning, while skills form the operational layer that enables reliable, reusable, and composable execution. Skills are therefore central to the scalability, robustness, and maintainability of modern agent systems. We organize the literature around four stages of the agent skill lifecycle -- representation, acquisition, retrieval, and evolution -- and review representative methods, ecosystem resources, and application settings across each stage. We conclude by discussing open challenges in quality control, interoperability, safe updating, and long-term capability management. All related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/JayLZhou/Awesome-Agent-Skills.


Authors (6)

Summary

The paper introduces a comprehensive taxonomy of agent skills to enhance scalability and reliability in LLM-based systems.
It delineates skill representations—text-based, code-based, and hybrid—and compares acquisition methods such as human-derived and experience-derived approaches.
The study examines key challenges and proposes mechanisms for efficient skill retrieval, dynamic selection, and lifecycle management in autonomous agents.

Survey of Agent Skills: Taxonomy, Lifecycle, and Ecosystem Implications

Introduction and Motivation

LLM-based agents have rapidly progressed from passive text generators to autonomous systems capable of complex, multi-step task execution through planning, tool usage, and structured interaction. However, as agents are tasked with increasingly long-horizon and open-ended real-world scenarios, ad-hoc tool calls and one-off procedural reasoning scale poorly due to inefficiency and brittleness. This survey reframes the agent design paradigm around the abstraction of skills—reusable procedural artifacts that coordinate tools, memory, and runtime context under task constraints.

The central argument is that explicit and composable agent skills are critical for the scalability, reliability, and maintainability of agentic systems. Agents conduct high-level intent interpretation and planning; skills execute task-focused procedures reliably and efficiently. The paper decomposes the skill lifecycle into four stages—representation, acquisition, retrieval/selection, and evolution—articulates a systematic taxonomy, and compiles a rich ecosystem of platforms, methods, and open resources.

Skill Representation: Taxonomy and Structural Considerations

Agent skills are formalized as tuples $(M, R, C)$ in which $M$ is a canonical instruction or procedure document, $R$ is a set of auxiliary resources (textual references, templates, executable scripts), and $C$ defines trigger/applicability conditions for skill selection. Representation strategies are classified as:

Text-based skills: Primarily rely on human-readable instruction, supplemented by references, templates, or rich textual schemas. These improve semantic grounding, transparency, and ease of authoring but may lack deterministic executability.
Code-based skills: Encapsulate executable assets (scripts, functions, APIs), enabling agent actions with strict operational semantics, but introducing maintenance and dependency management complexity.
Hybrid skills: Combine textual and executable resources to balance interpretability and operational reliability; this increases coordination and maintenance burden but provides greater expressivity and reusability.

This taxonomy not only unifies diverse agent platforms but also shapes downstream retrieval, orchestration, and governance mechanisms, because each representation exposes different interfaces and signals for selection and control.
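The $(M, R, C)$ tuple can be made concrete with a small data structure. The following is a minimal sketch, not the survey's own schema: the class names `Skill` and `SkillKind`, the field names, and the rule that a callable resource counts as "code" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class SkillKind(Enum):
    TEXT = "text"
    CODE = "code"
    HYBRID = "hybrid"


def always_applicable(context: dict) -> bool:
    """Default trigger: the skill is applicable in every context."""
    return True


@dataclass
class Skill:
    """Hypothetical container for the (M, R, C) tuple."""
    name: str
    instructions: str                                          # M: procedure document
    resources: Dict[str, object] = field(default_factory=dict)  # R: templates, scripts
    applies: Callable[[dict], bool] = always_applicable        # C: trigger condition

    def kind(self) -> SkillKind:
        # Assumption: any callable resource is treated as an executable asset.
        has_code = any(callable(r) for r in self.resources.values())
        has_text = bool(self.instructions.strip())
        if has_code and has_text:
            return SkillKind.HYBRID
        if has_code:
            return SkillKind.CODE
        return SkillKind.TEXT


def slugify(title: str) -> str:
    """Executable resource: turn a title into a lowercase, dash-separated slug."""
    return "-".join(title.lower().split())


release_notes = Skill(
    name="release-notes",
    instructions="Summarize merged changes, then derive a slug for the file name.",
    resources={"template": "## {version}\n{summary}", "slugify": slugify},
    applies=lambda ctx: ctx.get("task") == "release",
)

style_guide = Skill(name="style-guide", instructions="Prefer short sentences.")
```

Here `release_notes` carries both instructions and a callable, so it is classified as hybrid, while `style_guide` is text-only. Keeping the trigger as a predicate over a context dictionary is one simple choice; a real system might use declarative conditions instead.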

Skill Acquisition: Methods and Source Taxonomy

Skill acquisition is surveyed via four archetypal routes:

Human-derived: Explicit manual curation by domain experts, encoding tacit procedural know-how, exception handling, and hierarchical organization. It delivers high trust and semantic richness but is limited in scalability and updatability.
Experience-derived: Automated distillation from agent behavior traces, interaction logs, and feedback, employing selection, summarization/abstraction, memory structuring, and procedural packaging. This is the most active research area, underpinning continual improvement and behavioral grounding.
Task-derived: On-demand construction of new skills directly conditioned on novel, instance-specific task requirements, followed by validation and optional retention.
Corpus-derived: Mining of skills from external documentation, repositories, user manuals, and other semi-structured or structured corpora using LLM-based synthesis, abstraction, and alignment.

The paper reports that combining these routes yields robust, high-coverage skill libraries, as LLMs increasingly automate both the transformation and validation steps.
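To illustrate the selection step of the experience-derived route, the sketch below mines tool-call subsequences that recur across successful traces and proposes them as skill candidates. This is a deliberately simple frequency-based stand-in for the LLM-based summarization and abstraction the survey describes; the function name `mine_candidate_skills`, its parameters, and the trace format (a list of tool names per episode) are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def mine_candidate_skills(
    traces: Sequence[Sequence[str]],
    min_len: int = 2,
    max_len: int = 3,
    min_support: int = 2,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return contiguous tool-call subsequences seen in at least `min_support` traces."""
    support: Counter = Counter()
    for trace in traces:
        seen = set()  # count each subsequence at most once per trace
        for n in range(min_len, max_len + 1):
            for i in range(len(trace) - n + 1):
                seen.add(tuple(trace[i:i + n]))
        support.update(seen)
    frequent = [(seq, c) for seq, c in support.items() if c >= min_support]
    # Highest support first, then longer sequences, then alphabetical for stability.
    return sorted(frequent, key=lambda p: (-p[1], -len(p[0]), p[0]))


# Hypothetical traces from successful debugging episodes.
traces = [
    ["open_repo", "run_tests", "read_failure", "patch_file", "run_tests"],
    ["open_repo", "search_code", "run_tests", "read_failure", "patch_file"],
    ["open_repo", "run_tests", "read_failure", "ask_user"],
]

candidates = mine_candidate_skills(traces)
for sequence, count in candidates:
    print(count, " -> ".join(sequence))
```

Each surviving subsequence would still need abstraction (naming, parameterization, trigger conditions) and validation before being packaged as a skill.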

Skill Retrieval and Selection: Execution Pipeline and Decision Criteria

As skill repositories expand in scale and heterogeneity, the bottleneck shifts to reliable access and dynamic selection:

Retrieval methods span dense semantic embedding similarity, sparse keyword/metadata matching, on-the-fly generative retrieval (identifier decoding), and structure-aware search (hierarchical/dependency-guided pruning).
Selection introduces further filtering and orchestration, driven by context, policy, compositionality, utility/cost-awareness, and feedback-informed reranking. This decouples candidate recall (broad, fuzzy matching) from the execution decision (fine-grained, operational factors).

Critical design dimensions include representation exposure, runtime state/applicability, granularity (primitive vs. composite skills), and integration of multi-objective utility functions. The survey highlights that robust deployment demands iterative, adaptive selection mechanisms, not static matching.
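The split between candidate recall and the execution decision can be sketched as a two-stage pipeline. The example below uses sparse token overlap (Jaccard similarity) for recall and a utility-minus-cost score for selection; `SkillEntry`, `recall`, `select`, and the numeric utility and cost values are hypothetical, chosen only to show the decoupling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SkillEntry:
    name: str
    description: str
    applies: Callable[[dict], bool]  # runtime applicability check
    expected_utility: float          # assumed prior estimate in [0, 1]
    cost: float                      # assumed normalized execution cost


def recall(query: str, library: List[SkillEntry], k: int = 3) -> List[SkillEntry]:
    """Stage 1: broad, fuzzy matching by Jaccard overlap of lowercase tokens."""
    q = set(query.lower().split())
    scored = []
    for skill in library:
        d = set(skill.description.lower().split())
        union = q | d
        overlap = len(q & d) / len(union) if union else 0.0
        if overlap > 0:
            scored.append((overlap, skill))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in scored[:k]]


def select(candidates: List[SkillEntry], context: dict,
           cost_weight: float = 1.0) -> Optional[SkillEntry]:
    """Stage 2: drop inapplicable skills, then maximize utility minus weighted cost."""
    feasible = [s for s in candidates if s.applies(context)]
    if not feasible:
        return None
    return max(feasible, key=lambda s: s.expected_utility - cost_weight * s.cost)


library = [
    SkillEntry("fix-failing-test", "repair a failing unit test in python",
               lambda ctx: ctx.get("has_repo", False), 0.9, 0.3),
    SkillEntry("rerun-test", "rerun a failing test",
               lambda ctx: True, 0.4, 0.05),
    SkillEntry("book-flight", "book a flight ticket",
               lambda ctx: True, 0.8, 0.2),
]

candidates = recall("fix the failing python test", library)
chosen = select(candidates, {"has_repo": True})
print(chosen.name if chosen else None)
```

With repository access the higher-utility repair skill wins; without it, the applicability filter removes that skill and the cheaper fallback is chosen. A dense or generative retriever could replace `recall` without touching `select`.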

Skill Evolution: Continuous Improvement, Validation, and Governance

Skill evolution is addressed as a principled process involving:

Skill revision: Update of skill artifacts via feedback, including versioning, rollback, and multi-granularity consolidation.
Validation: Checks that a skill must pass before promotion, such as testing, behavioral agreement, or external verifier support.
Policy coupling: Joint optimization of controllers/agents and skill repositories through reward-driven or hierarchical RL techniques.
Repository evolution: Synchronized, possibly collaborative updating and deprecation at scale.
Runtime governance: Trust-aware retrieval and selection at execution time, including provenance, contamination detection, and permissioning.

These mechanisms ensure that skill libraries do not accumulate technical debt, security holes, or staleness, particularly in collective or federated agent deployments.
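Revision, validation, and rollback can be combined in a small versioned record. The sketch below is one possible arrangement, under the assumption that validators are plain predicates over a candidate version; `SkillRecord`, `SkillVersion`, the validator functions, and the set of trusted provenance labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SkillVersion:
    version: int
    body: str
    provenance: str  # where the revision came from, e.g. "human"


Validator = Callable[[SkillVersion], bool]


class SkillRecord:
    """Keeps the full version history of one skill plus a pointer to the active version."""

    def __init__(self, name: str, body: str, provenance: str) -> None:
        self.name = name
        self.history: List[SkillVersion] = [SkillVersion(1, body, provenance)]
        self.active_index = 0

    @property
    def active(self) -> SkillVersion:
        return self.history[self.active_index]

    def propose(self, body: str, provenance: str, validators: List[Validator]) -> bool:
        """Promote a revision only if every validator accepts it."""
        candidate = SkillVersion(len(self.history) + 1, body, provenance)
        if not all(check(candidate) for check in validators):
            return False  # rejected candidates are never stored
        self.history.append(candidate)
        self.active_index = len(self.history) - 1
        return True

    def rollback(self) -> SkillVersion:
        """Reactivate the previous version in the history."""
        if self.active_index == 0:
            raise ValueError("no earlier version to roll back to")
        self.active_index -= 1
        return self.active


def non_empty(v: SkillVersion) -> bool:
    return bool(v.body.strip())


def trusted_source(v: SkillVersion) -> bool:
    # Assumption: only these provenance labels are permitted.
    return v.provenance in {"human", "verified-trace"}


record = SkillRecord("fix-failing-test", "Run tests, read failure, patch.", "human")
checks = [non_empty, trusted_source]
accepted = record.propose("Run tests, read failure, patch, rerun.", "verified-trace", checks)
rejected = record.propose("Ignore failures.", "unknown-upload", checks)
```

The provenance check stands in for the trust-aware governance described above; real validators would more likely execute tests or compare behavior against a reference.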

Application Domains

Systematic exploration demonstrates that agent skills have become foundational across domains: software engineering (procedural code repair, debugging, synthesis), GUI/web interaction (scripted multi-step flows), dynamic dialog (hierarchical memory, intent routing), robotics and embodied agents (motor primitives, reward functions), financial decision making, medical diagnosis, game play, and social simulation. The surveyed platforms (SkillNet, ClawHub, SkillsMP) and public libraries now contain hundreds of thousands of reusable skills, illustrating ecosystem-scale impact.

Open Challenges

Key technical bottlenecks remain:

Skill abstraction quality: Balancing granular operational value and generalizability; avoiding under- or over-abstracted procedures.
Trigger/condition specification: Precise applicability determination remains unsolved, especially as state and interface diversity grows.
Resource drift and maintenance: Ensuring consistency between skill documents, references, scripts, and environment changes.
Scalable, constraint-aware selection: Multi-agent/collaborative environments and evolving task distributions require scalable constraint-satisfaction and utility-based decision frameworks.
Evaluation limitations: Lack of execution-centric metrics and insufficient behavioral attribution in existing benchmarks, particularly for long-horizon trust and compositionality.

Future Research Directions

The paper identifies critical future directions, including standardized skill schemas for interoperability, resource-aware end-to-end optimization, robust lifecycle management under non-stationarity, domain- and modality-specific benchmarks, and trace-level causal diagnosis for failure attribution and rapid repair.

Conclusion

Agent skills, as reusable procedural artifacts, are central to the operational robustness and scalability of LLM-based agent ecosystems. This survey creates a comprehensive taxonomy and organizes the multidisciplinary literature around representation, acquisition, retrieval, and evolution. By systematizing the design space and emphasizing artifact lifecycle management, the work motivates advances in agent architecture, infrastructure, and safe, compositional intelligence. The framework lays foundational groundwork for next-generation agentic systems and the infrastructures required for governance and sustainable capability evolution.

Citation: "A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications" (2605.07358)
