CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

Published 25 Apr 2026 in cs.AI | (2604.23270v1)

Abstract: Chain-of-Thought (CoT) prompting has emerged as a simple and effective way to elicit step-by-step solutions from LLMs. However, CoT reasoning can be unstable across runs on long, multi-step problems, leading to inconsistent answers for unchanged task. Most prior work focuses on improving the forward reasoning chain within a single pass, with less attention to iterative and contrastive correction. To address this gap, we propose CAP-CoT, a Cycle Adversarial Prompt optimization framework designed to improve both CoT reasoning accuracy and stability of a single deployed solver. In each cycle, a forward solver generates candidate reasoning chains, an adversarial challenger constructs plausible but deliberately flawed chains using targeted error strategies, and a feedback agent contrasts the two chains and produces step-aligned structured feedback. This feedback closes the optimization loop in two directions, including updating the solver prompt based on errors exposed by the challenger, and updating the challenger prompt to generate increasingly targeted errors in subsequent cycles. Unlike safety-oriented adversarial prompting such as jailbreak or prompt-injection attacks, our adversarial component is task-semantic and aims to expose logical vulnerabilities in reasoning chains. Experiments across six benchmarks and four LLM backbones demonstrate that within two to three adversarial prompt optimization cycles, CAP-CoT consistently reduces variability across runs while improving reasoning accuracy and robustness to prompt perturbations.

Abstract PDF Upgrade to Chat

Authors (8)

Summary

The paper introduces CAP-CoT as a cyclic adversarial prompt framework that leverages a solver, adversarial challenger, and feedback agent to address reasoning instability in LLMs.
It demonstrates improved accuracy and reduced output variability on benchmarks like MATH, achieving 89.2% accuracy on math tasks through iterative prompt refinement.
The framework balances token efficiency with robust stability by using structured feedback, paving the way for scalable inference-time prompt optimization without full model finetuning.

CAP-CoT: Cycle Adversarial Prompt Optimization for Chain-of-Thought Stability and Accuracy in LLM Reasoning

Motivation and Foundation

The CAP-CoT framework addresses inherent instability and lack of robustness in Chain-of-Thought (CoT) reasoning for LLMs. Standard CoT approaches, while effective in eliciting intermediate steps for mathematical and symbolic reasoning, exhibit pronounced sensitivity to prompt perturbations, premise order, and contextual distractions—resulting in inconsistent or fragile solutions, particularly for long and complex tasks. Previous strategies focus on enhancing the forward trace and aggregating multiple reasoning trajectories but generally neglect iterative correction mechanisms leveraging explicit contrasts between correct and erroneous chains.

CAP-CoT introduces a principled multi-agent cycle—comprising a solver, an adversarial challenger, and a feedback agent—designed to optimize prompts through adversarial contrast and structured refinement. The adversarial component in CAP-CoT is task semantic rather than safet-oriented; its purpose is to elicit logical vulnerabilities in the chain, not to induce prohibited behavior.

Figure 1: Overview of CAP-CoT, showing the cycle between solver, adversarial challenger, and feedback agent with prompt refinement in both directions.

Framework Mechanics

The CAP-CoT cycle operates with three distinct yet collaboratively optimized LLM agents:

Solver Agent ( $G_S$ ): Generates stepwise, logical CoT traces for the input task.
Adversarial Challenger ( $G_C$ ): Constructs semantically plausible but deliberately flawed reasoning chains using a dynamically evolving taxonomy of error types (e.g., logical leap, concept confusion, vague reasoning, wrapper errors).
Feedback Agent ( $F$ ): Contrasts solver and challenger outputs via step-aligned structured feedback, identifying fragile reasoning steps and guiding explicit prompt updates for both agents.

Prompt refinement is achieved via Structured Feedback Prompt Refinement (SFPR), in which the feedback agent's recommendations are used to concretely update both the solver and challenger prompts. The process iteratively calibrates the error strategies in $G_C$ to match the solver's improving performance, ensuring that hard negatives remain diagnostic as the system evolves. Unlike static negative examples, this dynamic adversarial loop continuously surfaces new failure modes in the solver.

Empirical Evaluation

Accuracy and Robustness

CAP-CoT demonstrates consistent improvement in accuracy and stability compared to a spectrum of baseline methods, including CoT, CoT-SC, Self-Refine, Analogical Prompting (AP), Contrastive CoT (CCoT), Tree-of-Thought (ToT), Graph-of-Thought (GoT), Forest-of-Thought (FoT), Atom of Thoughts (AoT), MAD, and ECON. Benchmarks span MATH, GSM8K, BBH, MMLU-CF, HotpotQA, and LongBench, covering advanced mathematics, logical reasoning, multi-hop QA, and long-context tasks.

Numerical results indicate that CAP-CoT achieves significant gains over baselines, particularly on math-intensive and multi-hop tasks. For example, on the GPT-4o backbone, CAP-CoT attains accuracy of 89.2% on MATH, compared to 85.5% (AoT) and 86.4% (ECON), and 84.0% average across all six datasets with three optimization rounds (see Figure 2).

Figure 2: Accuracy improvement across optimization rounds for four LLM backbones, highlighting rapid gains and saturation by round 3.

Ablation analyses show that the adversarial challenger and structured feedback are critical components, yielding the largest improvements in both solver accuracy and cross-run stability.

Stability Under Sampling and Perturbations

The effect of LLM sampling temperature is examined in the context of reasoning stability. CAP-CoT delivers substantial reduction in answer variability compared to baselines. With increased temperature (up to 1.0), mean variation of accuracy drops below 1.0 after three optimization rounds, maintaining high robustness against stochasticity in output sampling.

Figure 3: Comparison of variation in accuracy as a function of temperature and optimization rounds; CAP-CoT exhibits markedly superior stability.

Token Efficiency

Reasoning cost, measured by average token consumption and efficiency per question, is a practical concern. While single-pass methods (CoT, AP) are most efficient, higher-performing strategies such as ToT, GoT, and committee-based approaches incur major token costs due to combinatorial expansion. CAP-CoT strikes a favorable balance, achieving above-baseline accuracy with moderate overhead: additional challenger and feedback agent generations are necessary only during a few optimization rounds, after which inference proceeds via the solver alone.

Figure 4: Per-question token consumption and efficiency across six datasets, showing CAP-CoT achieves high accuracy without heavy combinatorial cost.

Design Minimality and Error Taxonomy

CAP-CoT's adversarial component is initialized by a lightweight taxonomy of four error families—jump, confusion, fuzzy, wrapper—designed to bootstrap the feedback cycle. Single-error cold-start ablations demonstrate near-equivalent final performance across all variants after two to three cycles, indicating that the framework is not dependent on elaborate error engineering. Subsequent cycles, guided by feedback, naturally discover and target more fine-grained and solver-specific weaknesses.

Qualitative examples from the appendix confirm that the challenger and feedback loop evolve error patterns with increasing diagnostic value, progressing from generic omissions or confusion to nuanced logical gaps and subtle invalid assumptions.

Implications and Future Directions

CAP-CoT formalizes a feedback-driven, adversarial prompt optimization paradigm that directly improves CoT stability and accuracy while maintaining competitive token efficiency. By reifying explicit contrasts between plausible correct and plausible erroneous reasoning chains, the system enables LLMs to learn from both positive demonstrations and dynamically generated hard negatives. This approach is highly compatible with preference-based learning, workflow-level prompt optimization, and modular agentic architectures.

Practically, CAP-CoT offers scalable, inference-time prompt refinement for deployed solvers, sidestepping parameter updating and training-time finetuning. Theoretically, it establishes a foundation for contrastive and adversarial prompt engineering—integrating principles from supervised contrastive learning, self-refinement, and dynamic agent collaboration.

Potential extensions include integration with multimodal reasoning tasks, continuous strategy evolution under nonstationary task distributions, incorporation into multi-agent committee frameworks, and synthesis with preference alignment mechanisms from RLHF.

Conclusion

CAP-CoT systematically improves LLM reasoning robustness via cycle adversarial prompt optimization, pairing solver and challenger chains with rigorous feedback for prompt refinement. Empirical results across multiple datasets and backbones establish superior and stable accuracy over prominent baselines, with favorable efficiency. The framework's modular, feedback-driven adversarial cycle underpins future developments in robust AI reasoning, prompting further exploration of contrastive prompt engineering, adaptive error taxonomy, and agentic optimization workflows for LLMs (2604.23270).