SDLLMFuzz: Dynamic-static LLM-assisted greybox fuzzing for structured input programs

Published 20 Apr 2026 in cs.CR and cs.PL | (2604.17750v1)

Abstract: Fuzzing has become a widely adopted technique for vulnerability discovery, yet it remains ineffective for structured-input programs due to strict syntactic constraints and limited semantic awareness. Traditional greybox fuzzers rely on mutation-based strategies and coarse-grained coverage feedback, which often fail to generate valid inputs and explore deep execution paths. Recent advances in LLMs have shown promise in improving input generation, but existing approaches primarily focus on seed generation and largely overlook the effective use of runtime feedback. In this paper, we propose SDLLMFuzz, a dynamic-static LLM-assisted greybox fuzzing framework for structured-input programs. Our approach integrates LLM-based structure-aware seed generation with static crash analysis, forming a unified feedback loop that iteratively refines test inputs. Specifically, we leverage LLMs to generate syntactically valid and semantically diverse inputs, while extracting rich semantic information from crash artifacts (e.g., core dumps and execution traces) to guide subsequent input generation. This dynamic-static feedback mechanism enables more efficient exploration of complex program behaviors. We evaluate SDLLMFuzz on the Magma benchmark across multiple structured-input programs, including libxml2, libpng, and libsndfile. Experimental results show that SDLLMFuzz significantly outperforms traditional greybox fuzzers and LLM-assisted baselines in terms of bug discovery and time-to-bug. These results demonstrate that combining semantic input generation with feedback-driven refinement is an effective direction for improving fuzzing performance on structured-input programs.

Abstract PDF Upgrade to Chat

Authors (4)

Summary

The paper introduces a dynamic-static fuzzing approach that integrates LLM-guided seed generation with static crash analysis to overcome limitations of traditional greybox fuzzers.
It leverages a closed-loop feedback mechanism that refines input candidates based on runtime crash data, significantly improving bug discovery efficiency and reducing time-to-bug.
Experimental evaluations demonstrate that SDLLMFuzz achieves higher bug coverage and faster vulnerability triggering compared to AFL++ and LLM-seeded baselines across structured input programs.

SDLLMFuzz: Dynamic–Static LLM-Assisted Greybox Fuzzing for Structured Input Programs

Motivation and Problem Statement

SDLLMFuzz addresses the persistent challenge of vulnerability discovery in programs processing structured inputs, such as file parsers or protocol handlers. Traditional greybox fuzzers, relying on random mutations and coverage feedback, are ineffective for structured-input programs due to syntactic and semantic constraints that result in sparse valid test spaces. Existing grammar-based methods require extensive manual rules, while learning-based fuzzers (e.g., RNNs, LSTMs) often fail to capture complex structural dependencies. Although recent work leverages LLMs for input generation, their integration with runtime feedback remains weak, thereby limiting iterative test refinement and deep bug discovery.

Framework Design

SDLLMFuzz integrates LLM-based structured seed generation and static crash analysis in a closed-loop architecture. The workflow consists of the following core components:

LLM-Guided Seed Generation: LLMs, prompted with input format descriptions and initial seeds, generate syntactically valid and semantically diverse inputs, enforcing structure constraints such as hierarchical relationships and cross-field dependencies.
Greybox Fuzzing: Generated seeds are executed using a coverage-guided fuzzer (AFL++), collecting coverage metrics, execution traces, and crash-triggering inputs.
Static Crash Analysis: Detected crashes are subjected to automatic analysis (e.g., via GDB), extracting crash site, call stack, exception type, and memory access patterns.
Crash Feedback Encoding: Analysis outputs are transformed into structured semantic descriptors, which are incorporated into subsequent LLM prompts, enabling focused refinement of input candidates.
Dynamic–Static Feedback Loop: Crash feedback closes the loop, guiding LLM generation towards unexplored paths and deeper program behaviors.

This feedback-driven mechanism markedly improves exploration efficiency and vulnerability coverage compared to mutation-only or static approaches.

Experimental Evaluation

SDLLMFuzz is evaluated on the Magma benchmark, comprising real-world vulnerabilities in libraries such as libxml2, libpng, and libsndfile. The framework is compared with AFL++ and an LLM-seeded AFL++ baseline (without feedback). Evaluation metrics include bug coverage (number of triggered vulnerabilities), time-to-bug (interval to trigger each bug), and edge coverage (unique control-flow edge exploration).

SDLLMFuzz achieves substantial performance gains:

Bug Discovery: Consistently higher median and average bug counts with lower variance, demonstrating robust effectiveness and stability across targets within 24 and 48 hour budgets.
Figure 1: Bug discovery results for SDLLMFuzz and baselines over a 24-hour window, showing marked superiority in vulnerability triggering.

Figure 2: Bug discovery results for SDLLMFuzz and baselines over 48 hours, confirming sustained performance advantage with increased resource budget.
Time-to-Bug: SDLLMFuzz triggers most vulnerabilities notably faster than baselines, particularly for structured-input programs.
Figure 3: Heatmap of time-to-bug across multiple targets, with SDLLMFuzz achieving shorter bug discovery latencies compared to alternatives.

Figure 4: Time-to-bug convergence curves illustrating accelerated vulnerability discovery for SDLLMFuzz.
Ablation Study: Removal of LLM seed generation yields the largest degradation; removal of static crash feedback also considerably reduces performance. Mutation optimization provides incremental improvements.
Figure 5: Bug discovery results from ablation experiments, highlighting the critical impact of LLM-based structured seed generation and feedback mechanisms.

Figure 6: Time-to-bug ablation results, underscoring that both LLM seed generation and crash feedback substantially expedite bug triggering.

Analysis and Implications

SDLLMFuzz's improvements derive from leveraging LLMs for semantic-aware, structure-conforming input generation, which dramatically enhances input validity and diversity. The dynamic–static feedback loop steers input generation into unexplored execution paths post-crash, breaking classic coverage plateaus.

The generality of the approach suggests applicability beyond file parsers and into domains including protocol fuzzing, configuration file testing, and scripting languages. The framework is not tied to a particular LLM or fuzzer, offering integration flexibility.

However, SDLLMFuzz performance depends on prompt engineering and LLM quality. Computational overhead from model inference and static analysis is non-trivial. Its focus on syntactic and input-level guidance limits exploration in stateful or highly dynamic contexts.

From a theoretical standpoint, SDLLMFuzz demonstrates the utility of combining semantic generative modeling and fine-grained feedback, suggesting future directions such as reinforcement-driven fuzzing or integration with symbolic execution. Practically, its improved bug coverage and reduced time-to-bug metrics encourage adoption in CI/CD pipelines and software audit workflows. Further advances could exploit richer runtime artifacts or extend beyond crash feedback, e.g., path constraint violation signals.

Conclusion

SDLLMFuzz represents a significant advance in structured-input fuzzing, uniting LLMs with dynamic–static feedback for robust vulnerability discovery (2604.17750). The framework exceeds both traditional greybox fuzzers and LLM-seeded baselines in bug coverage and time-to-bug, establishing the efficacy of semantic-driven input generation and crash-guided refinement. Ongoing work may broaden the feedback scope, lower integration overhead, and tailor the methodology for increasingly complex input formats and execution semantics.

Markdown Report Issue