Critical Inker: Scaffolding Critical Thinking in AI-Assisted Writing Through Socratic Questioning

Published 8 Apr 2026 in cs.HC | (2604.07167v1)

Abstract: As LLMs increasingly automate writing tasks, there is a growing risk of cognitive deskilling where users offload critical thinking to the system. To address this, we introduce Critical Inker, a writing tool designed to scaffold critical reflection during writing through logical analysis and Socratic feedback. We present two methods: (1) a Socratic chatbot that uses questions to help users recognize and fix logical errors in their writing, and (2) Visual Feedback, which highlights logical errors in the text without dialog. We detail the technical implementation of the system and evaluate its argument extraction and logical validity accuracy. Our evaluation shows a 91.2% argument overlap with ground truth argument annotations and 87% validity accuracy. Finally, we conducted a small-scale pilot and discuss early qualitative results.


Authors (3)

Summary

The paper demonstrates a novel AI writing assistant that integrates the Socratic method to encourage rigorous, self-driven argument analysis.
It uses a three-stage pipeline (write, analyze, reflect) to extract argument structure and validate logical coherence with high accuracy.
The system balances user engagement against cognitive load, offering actionable insights for next-generation AI-assisted writing tools.

Critical Inker: Socratic Scaffolding for Critical Reasoning in AI-Assisted Writing

Motivation and Theoretical Framing

The proliferation of LLMs for writing tasks introduces a pronounced risk of cognitive deskilling as users increasingly delegate reasoning to AI systems. Previous literature substantiates concerns that such offloading diminishes engagement with core cognitive processing, weakens critical thinking, and homogenizes personal viewpoints in written outputs. While previous work in metacognitive interventions and cognitive forcing functions has demonstrated improved logical discernment and reduced overreliance on AI, these approaches often face resistance due to perceived complexity and friction in user workflows.

Critical Inker addresses this challenge by operationalizing the Socratic method—historically recognized for fostering disciplined, reflective reasoning—through targeted, argument-aware interventions. The system leverages LLMs for structured extraction and evaluation of argumentative essays, eschewing generic feedback in favor of modality-specific scaffolding that preserves user agency and cognitive engagement.

System Architecture and Modalities

Critical Inker is structured around a three-stage pipeline: write, analyze, and reflect. The system's argument analysis is grounded in a multi-prompt LLM protocol that splits the work into separate prompts for structure extraction, logical evaluation, and Socratic intervention. Visual Feedback provides immediate highlighting of logical flaws in the text, with interactive navigation of the argumentative structure. The Socratic Chatbot asks users argument-grounded questions and requires them to verbalize each issue before it is turned into an actionable revision marker.
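The multi-prompt idea can be illustrated with a minimal sketch in which each task gets its own prompt and each result feeds the next. This is not the paper's implementation: the `Analysis` type, the prompt wording, and the `fake_llm` stand-in are all hypothetical, and a real system would call an actual model API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for an LLM call: takes a prompt string, returns text.
LLM = Callable[[str], str]


@dataclass
class Analysis:
    structure: str   # output of the structure-extraction prompt
    evaluation: str  # output of the logical-evaluation prompt
    question: str    # output of the Socratic-intervention prompt


def analyze(essay: str, llm: LLM) -> Analysis:
    """Run three separate prompts, one per task, feeding each result forward."""
    structure = llm(f"EXTRACT the argument structure:\n{essay}")
    evaluation = llm(f"EVALUATE each relation in:\n{structure}")
    question = llm(f"ASK one Socratic question about:\n{evaluation}")
    return Analysis(structure, evaluation, question)


def fake_llm(prompt: str) -> str:
    # Echo the task keyword so the flow can be inspected without a real model.
    return prompt.split()[0].lower() + "-result"


result = analyze("Cats are better than dogs because they are quiet.", fake_llm)
print(result)
```

Keeping the prompts separate means each stage can be tested or swapped on its own, which is one plausible reason for segmenting the tasks.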

Upon successful Socratic scaffolding, user intentions are externalized as anchored comment markers.

Figure 1: Socratic Chatbot: On successful Socratic scaffolding, the system converts the user's verbalized intention into an actionable comment marker anchored to the text.
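One simple way to represent a comment marker anchored to the text is as a character range plus the user's own note. The sketch below assumes this representation; `CommentMarker` and `anchor_comment` are hypothetical names, and the paper does not specify how anchoring is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommentMarker:
    start: int  # character offset where the anchored quote begins
    end: int    # offset one past the end of the quote
    note: str   # the user's own verbalized revision intention


def anchor_comment(essay: str, quote: str, note: str) -> CommentMarker:
    """Attach the user's note to the first occurrence of `quote` in the essay."""
    start = essay.find(quote)
    if start == -1:
        raise ValueError("quote not found in essay")
    return CommentMarker(start, start + len(quote), note)


essay = "Remote work is better. Everyone I know prefers it."
quote = "Everyone I know prefers it."
marker = anchor_comment(essay, quote, "Add evidence beyond my own circle.")
print(essay[marker.start:marker.end], "->", marker.note)
```

Offsets like these go stale when the essay is edited, so a real editor would likely need anchors that track document changes.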

Critical Inker avoids real-time feedback during writing to minimize extraneous cognitive load and promote generative reflection. Progressive disclosure and verbalization requirements are embedded within both modalities to facilitate incremental engagement and prevent overwhelming users with simultaneous stimuli.

Argument Mining and Logical Evaluation

Critical Inker's argument extraction pipeline deploys LLMs to decompose essays into atomic quotes (claims, reasons, evidence) and maps their logical relations, distinguishing independent and joined supporting premises using JSON schemas. Logical evaluation is performed iteratively on relation pairs, enforcing chain-of-thought reasoning prior to verdict assignment. For the Socratic modality, system prompts are designed to strictly prohibit direct correction, instead orchestrating incrementally focused questioning in conversational turns bounded by issue-specific context.
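To make the extraction output and the per-relation evaluation loop concrete, here is a minimal sketch. The JSON field names (`quotes`, `relations`, `kind`) are illustrative assumptions rather than the paper's schema, and `toy_judge` is a placeholder for the LLM call that would produce reasoning followed by a verdict.

```python
import json

# Hypothetical extraction output; field names are illustrative only.
extraction = json.loads("""
{
  "quotes": [
    {"id": "c1", "role": "claim", "text": "Cities should ban cars downtown."},
    {"id": "r1", "role": "reason", "text": "Cars cause downtown air pollution."},
    {"id": "e1", "role": "evidence", "text": "Air quality sensors show this."},
    {"id": "r2", "role": "reason", "text": "Walking is healthy."}
  ],
  "relations": [
    {"premises": ["r1", "e1"], "conclusion": "c1", "kind": "joined"},
    {"premises": ["r2"], "conclusion": "c1", "kind": "independent"}
  ]
}
""")


def evaluate_relations(extraction, judge):
    """Evaluate one relation at a time, recording reasoning before the verdict."""
    texts = {q["id"]: q["text"] for q in extraction["quotes"]}
    verdicts = []
    for rel in extraction["relations"]:
        premises = [texts[p] for p in rel["premises"]]
        conclusion = texts[rel["conclusion"]]
        reasoning, verdict = judge(premises, conclusion)
        verdicts.append({"kind": rel["kind"], "reasoning": reasoning,
                         "verdict": verdict})
    return verdicts


def toy_judge(premises, conclusion):
    # Placeholder for an LLM: treats relations with 2+ premises as valid.
    reasoning = f"{len(premises)} premise(s) offered for: {conclusion}"
    return reasoning, "valid" if len(premises) > 1 else "weak"


verdicts = evaluate_relations(extraction, toy_judge)
for v in verdicts:
    print(v["kind"], v["verdict"])
```

Joined premises are passed to the judge together because they only support the conclusion in combination, while an independent premise is evaluated on its own.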

Technical validation shows that LLMs (Claude Sonnet 4.5, GPT-4.1, Gemini Flash) achieve robust structure extraction (≈90% main claim accuracy, 91.2% relation overlap) and logical validity checking (87% validity accuracy with Claude Sonnet 4.5, 93% with GPT-4.1) on established benchmarks (AAE v2 [Stab2017], SNLI [Bowman2015]). In the latency evaluation, Claude Sonnet 4.5 had lower variance and faster mean execution than GPT-4.1 and Gemini Flash.
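The two headline metrics can be sketched as follows. The exact definitions used in the paper are not given here, so this assumes that relation overlap is the share of ground-truth relations that also appear in the prediction, and that validity accuracy is plain label agreement; the function names and toy data are hypothetical.

```python
def relation_overlap(predicted, gold):
    """Share of gold (premise, conclusion) pairs that were also predicted."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold annotations must not be empty")
    return len(set(predicted) & gold) / len(gold)


def accuracy(predicted_labels, gold_labels):
    """Fraction of positions where the predicted label matches the gold label."""
    if len(predicted_labels) != len(gold_labels) or not gold_labels:
        raise ValueError("label lists must be non-empty and equal in length")
    hits = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return hits / len(gold_labels)


# Toy data: three of the four gold relations are recovered.
gold = [("r1", "c1"), ("r2", "c1"), ("e1", "r1"), ("r3", "c1")]
pred = [("r1", "c1"), ("r2", "c1"), ("e1", "r1"), ("r4", "c1")]
overlap = relation_overlap(pred, gold)

# Toy data: three of the four validity labels agree.
acc = accuracy(["valid", "invalid", "valid", "valid"],
               ["valid", "invalid", "invalid", "valid"])
print(overlap, acc)
```

Under this assumed definition, spurious predicted relations do not lower the overlap score, so a precision-style measure would be needed to penalize them.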

Structural accuracy across leading LLMs is competitive with human annotation benchmarks.

Figure 2: Structural accuracy across four LLMs compared to human ground truth from AAE v2 dataset.

Claude Sonnet 4.5 demonstrates superior latency characteristics in comparative assessment.

Figure 3: Latency comparison: Claude Sonnet 4.5 shows significantly lower variance ($\sigma = 0.93\,\mathrm{s}$) compared to GPT-4.1.
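A latency comparison of this kind comes down to timing repeated calls and summarizing the mean and spread. The sketch below assumes the sample standard deviation is used for $\sigma$, which the source does not state; `time_calls` and `summarize` are hypothetical helpers, and the numbers are made up for illustration.

```python
import statistics
import time


def time_calls(fn, runs=5):
    """Call `fn` repeatedly and return the wall-clock duration of each call."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) in the unit of the samples."""
    return statistics.mean(samples), statistics.stdev(samples)


# Made-up latencies in seconds, used only to show the arithmetic.
mean_s, sigma_s = summarize([2.0, 3.0, 4.0])
print(f"mean={mean_s:.2f}s sigma={sigma_s:.2f}s")

# Timing a no-op stands in for timing a real model request.
durations = time_calls(lambda: None, runs=3)
```

`statistics.stdev` divides by n - 1; `statistics.pstdev` would be the choice if the runs were treated as the whole population.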

Qualitative User Study

In a pilot study with seven participants assigned to Visual Feedback and Socratic Chatbot conditions, qualitative thematic analysis revealed differential engagement and friction profiles. Visual Feedback users acknowledged clarity and transparent rectification guidance, but reported complexity in exploring full logical structures. Socratic Chatbot users emphasized active cognitive engagement and self-guided reasoning, with friction arising from perceived redundancy in system prompts. Notably, both modalities facilitated reflection and revision, though the Socratic approach more effectively preserved user-initiated reasoning.

Implications and Future Directions

Critical Inker challenges prevailing paradigms in AI writing assistance by emphasizing reasoning scaffolding and retention of critical faculties. Its dual modality design—argument visualization and Socratic conversational guidance—demonstrates that LLMs can support, rather than supplant, human cognitive engagement in writing.

Theoretical implications center on system architectures that harness AI for disciplined reflection and metacognitive development, rather than solely productivity-enhanced output. Practically, integration with educational and professional workflows necessitates trade-offs between engagement, cognitive load, and user preference. Enhancement of argument mining robustness, especially under complex logical dependencies or joined premises, remains a salient direction for model development. Further longitudinal studies are required to assess retention of critical reasoning skills and durability of cognitive engagement post-interaction.

The future trajectory of AI-assisted writing systems will likely involve greater harmonization of interactive Socratic scaffolding, adaptive disclosure strategies, and context-aware logical evaluation. Ensuring that writing remains an act of thinking—rather than mere curation of LLM output—will be critical in safeguarding the development of durable reasoning abilities as AI penetrates educational and professional domains.

Conclusion

Critical Inker operationalizes Socratic scaffolding and argument-aware feedback within AI-assisted writing, achieving high structure extraction and validity accuracy while maintaining low latency. Preliminary qualitative findings indicate both modalities facilitate critical reflection, though engagement and friction are modality-dependent. This approach contrasts with mainstream productivity-focused assistants, offering a principled framework for AI systems that foster—rather than erode—active reasoning. Continued investigation is warranted into the durability of critical thinking fostered by such systems and the refinement of argument mining models for diverse writing contexts.
