- The paper introduces ProofLoop, a ReAct-based agent that integrates retrieval-augmented context and iterative formal verification to generate robust SystemVerilog assertions.
- It employs a two-phase pipeline combining autonomous context gathering from AST-indexed databases with iterative refinement using Cadence JasperGold feedback.
- Empirical evaluation on 192 designs reveals significant improvements in syntax (93.7%) and functional correctness (82.0%) over baselines, showcasing its scalability for complex designs.
Motivation and Background
SystemVerilog Assertions (SVA) are fundamental in formal verification workflows for digital hardware, enabling property-driven design checking and simulation monitoring. Manual construction of SVA is resource-intensive, requiring expertise in both RTL semantics and temporal logic. Contemporary attempts at automating SVA generation with LLMs have limitations arising from insufficient grounding in design metadata, resulting in syntactic errors, incorrect signal references, misaligned timing constraints, and lack of formal correctness. Retrieval-augmented generation (RAG) and fine-tuning have been applied in related domains, but existing methods lack robust integration of formal verification feedback and precise structural metadata.
Framework Architecture
The paper introduces ProofLoop, a tool-augmented ReAct agent for SVA generation that integrates both retrieval and formal verification tooling in a closed agentic loop. The system operates in two distinct phases:
- Context Gathering (Phase A): The agent autonomously synthesizes verification-relevant context by querying both AST-indexed vector databases and Cadence JasperGold for structural information—fan-in/fan-out cones, flip-flop properties, and signal dependency analysis.
- Generation and Iterative Refinement (Phase B): SVA statements are generated using the context and tested with bounded formal verification in JasperGold; failed properties are repaired using error traces and syntax feedback in up to three iterations.
Figure 1: ProofLoop framework; Phase~A involves autonomous tool calls for context gathering, Phase~B iteratively refines SVA via JasperGold formal feedback.
This agentic loop is critical as it sidesteps the brittle nature of prompt-only pipelines and ensures each assertion is synthesized and repaired with precise tool-driven feedback.
Empirical Evaluation
Benchmark and Metrics
ProofLoop is evaluated on 192 designs from the FVEval Design2SVA benchmark, spanning both pipeline and FSM designs of escalating complexity. Metrics comprise syntax correctness (JasperGold compilation success) and functional correctness (property proven in bounded formal verification). The pipeline is benchmarked with state-of-the-art LLMs, Qwen3.5-35B and Mistral-Small-24B, both supporting native tool calls. Baseline comparison is performed against direct prompting with only RTL and specification input.
- Syntax correctness: ProofLoop achieves 93.7% (Qwen3.5-35B) vs. 78.3% (baseline).
- Functional correctness: ProofLoop reaches 82.0% vs. 43.2% (baseline).
- Cross-model generality: Mistral-Small-24B exhibits comparable relative gains (from 31.1% to 53.3% functionality).
Ablation studies demonstrate that removal of any pipeline component results in substantial performance degradation, confirming orthogonality and necessity of RAG, structural analysis, and verification loop.
Design Complexity Scaling
ProofLoop’s advantage increases with multi-module design size, where baseline performance deteriorates due to LLM context window overflow. The modular AST-driven retrieval ensures bounded context, enabling accuracy retention even on large-scale designs.
Figure 2: Functionality by design complexity; pipeline advantage increases with multi-module designs exceeding LLM context limits.
Verification Loop Dynamics
Two-thirds of specifications succeed in the first verification round; iterative feedback corrects 59.3% of syntax failures. Recovery from falsified assertions is less frequent, underscoring the agent's strength in structural editing but limited temporal reasoning.
Figure 3: Syntax and functional correctness improvement across verification rounds; progressive recovery via solver feedback.
Benchmark Comparison
On the Design2SVA benchmark, ProofLoop achieves pass@1 scores of 0.412 (pipeline) and 0.860 (FSM), with pass@5 approaching 0.900+, significantly outperforming direct prompting baselines and other LLM pipelines. The system reliably produces at least one formally provable assertion per design in multiple sampling trials.
Practical and Theoretical Implications
ProofLoop addresses the principal limitation of LLM-based formal assertion generation: lack of structural grounding and mechanism for correctness repair. The agentic approach leveraging tool augmentation and solver-in-the-loop enables robust, scalable assertion generation, particularly for complex, hierarchical RTL. These results suggest a path toward practical deployment of LLM-enabled formal verification tooling, reducing reliance on expert-driven property authoring and accelerating hardware verification productivity.
The formal feedback loop is essential not only for syntactic correctness but for iterative convergence toward functional correctness, indicative of a promising paradigm for LLM-guided agentic workflows in EDA. The methodology is well-suited for extension to large-scale SoC designs, multi-file verification environments, and can be further strengthened by domain-specific fine-tuning and integration with open-source formal engines.
Conclusion
ProofLoop demonstrates superior performance in automated SVA generation via a ReAct-based agentic pipeline, achieving robust syntactic and functional correctness by integrating retrieval-augmented context, structural analysis, and iterative solver feedback. Gains are most pronounced in multi-module designs and outperform all direct prompting baselines. The agentic, tool-augmented methodology offers a scalable strategy for bridging natural-language specifications and formal representations, laying the foundation for future agent-based formal verification systems capable of industrial-scale assertion synthesis and property checking.