Tracing Prompt-Level Trajectories to Understand Student Learning with AI in Programming Education

Published 12 Apr 2026 in cs.HC | (2604.10400v1)

Abstract: As AI tools such as ChatGPT enter programming classrooms, students encounter differing rules across courses and instructors, which shape how they use AI and leave them with unequal capabilities for leveraging it. We investigate how students engaged with AI in an introductory Python assignment, analyzing student-LLM chat histories and final code submissions from 163 students. We examined prompt-level strategies, traced trajectories of interaction, and compared AI-generated code with student submissions. We identified trajectories ranging from full delegation to iterative refinement, with hybrid forms in between. Although most students directly copied AI-generated code in their submission, many students scaffolded the code generation through iterative refinement. We also contrasted interaction patterns with assignment outcomes and course performance. Our findings show that prompting trajectories serve as promising windows into students' self-regulation and learning orientation. We draw design implications for educational AI systems that promote personalized and productive student-AI collaborative learning.


Authors (7)

Summary

The paper details a typology of student–LLM prompt trajectories, emphasizing a dominant pattern of complete solution delegation.
It employs qualitative coding and clustering analyses to quantify code convergence, with over 80% of submissions reusing LLM output.
Results expose performance trade-offs and advocate for pedagogical strategies that balance efficiency with deeper learning.

Tracing Prompt-Level Trajectories in Student–LLM Interaction for Programming Education

Introduction

The proliferation of LLMs such as ChatGPT in programming education has changed how students engage with computational problem-solving. This paper presents a prompt-level, trajectory-oriented analysis of student–LLM interactions during a timed Python programming assignment. Drawing on 662 student–LLM prompt exchanges and 146 paired code submissions, the work characterizes the strategies, patterns, and outcomes associated with LLM usage in a real-world course context.

Methodology and Analytical Framework

Data were collected from an in-class, time-constrained (one lecture session) assignment in a large introductory Python programming course. Students could optionally use ChatGPT (GPT-3.5-turbo) or another LLM and were required to submit both their final code and their entire chat transcript, providing a rich, session-level perspective on interaction dynamics.

Qualitative coding and sequence analysis produced a high-reliability codebook covering both prompt-level engagement and code similarity. Clustering analyses (semantic, structural, and rubric-based) were used to quantify convergence and diversity among final programs. Because students received no structured guidance about LLM use, the design has high ecological validity and captures authentic student practice under deadline pressure.

Figure 1: Overview of the qualitative analysis protocol applied to the student–LLM interaction logs.

Prompt-Level Interaction Patterns

The prompt coding scheme isolated 12 themes. The dominant engagement themes were complete solution requests, step-/feature-level code generation, error/missing piece repair, and general code fix requests. Most prompts also included context via direct upload or copy-paste of assignment instructions and the code skeleton.

The co-occurrence matrix (Figure 2) revealed that students overwhelmingly paired context provision with solution requests, indicating a prevailing mental model of the LLM as a code generation tool rather than a Socratic tutor or collaborator. Stepwise and repair-oriented prompts—while present—were comparatively rare, reinforcing a pattern of efficiency-driven, delegation-heavy usage.

Figure 2: Prompt-level theme co-occurrence matrix shows frequent pairings of assignment/skeleton context and complete solution requests.
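A co-occurrence matrix of this kind can be built by counting how often two theme codes are attached to the same prompt. The following is a minimal sketch, not the authors' analysis code; the theme labels (`context`, `complete_solution`, and so on) and the input layout (one set of codes per prompt) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def theme_cooccurrence(coded_prompts):
    """Count how often each pair of themes is coded on the same prompt.

    coded_prompts: iterable of collections of theme labels, one per prompt.
    Returns a Counter keyed by alphabetically ordered (theme_a, theme_b) pairs.
    """
    counts = Counter()
    for themes in coded_prompts:
        # Sort so that each unordered pair always maps to the same key.
        for pair in combinations(sorted(set(themes)), 2):
            counts[pair] += 1
    return counts


# Hypothetical coded prompts, not data from the study.
example_prompts = [
    {"context", "complete_solution"},
    {"context", "complete_solution"},
    {"error_repair"},
    {"context", "stepwise"},
]
example_counts = theme_cooccurrence(example_prompts)
```

Each count corresponds to one off-diagonal cell of the matrix; a prompt with a single theme contributes nothing because it forms no pair.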

Interaction Trajectories: Taxonomy and Prevalence

The analysis distilled eight archetypal trajectories, spanning from “Simple Delegation” (single complete solution request, direct copy to submission) to “Cycle of Dependency” (repeated oscillations between code generation and debugging without convergence). These trajectories were defined by the temporal sequence and transitions among coded prompt themes. Representative trajectory flows are illustrated in Figures 3–6.
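Since trajectories are described in terms of the order of coded themes and the transitions between them, one simple way to summarize a session is to count consecutive theme pairs. This is an illustrative sketch only; the theme labels are hypothetical and the paper does not specify how transitions were tallied.

```python
from collections import Counter


def transition_counts(themes):
    """Count transitions between consecutive prompt themes in one session.

    themes: ordered list of theme labels, one per prompt.
    Returns a Counter keyed by (previous_theme, next_theme).
    """
    # zip pairs each theme with its successor; the last theme has none.
    return Counter(zip(themes, themes[1:]))


# Hypothetical session: a solution request, a repair, then the same again.
session = ["solution", "repair", "solution", "repair"]
session_transitions = transition_counts(session)
```

Frequent `("repair", "solution")` transitions, for instance, would indicate that a student returns to full-solution requests after debugging attempts.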

Simple Delegation (35.6%) characterized students who immediately offloaded the entire task to the LLM and submitted unmodified output. Persistent Delegation (25%) involved repeated requests for full solutions, reworded only slightly, when initial outputs were unsatisfactory.

Figure 3: Persistent Delegation trajectory—multiple rounds of complete solution requests, with limited adaptation.

Stepwise Exploration (3.4%) entailed decomposing the problem and sequentially prompting for subcomponents, reflecting an attempt to scaffold code generation more granularly.

Figure 4: Stepwise Exploration trajectory—students request features incrementally and iteratively.

Backwards Scaffolding (23.6%) captured cases where an initial complete solution was iteratively refined, guided by errors and targeted repair requests.

Figure 5: Backwards Scaffolding trajectory—iterative cycles of delegation, error analysis, and prompt refinement.

Cycle of Dependency (5.4%) and Debugging Collaboration (2%) were less common but present, with students oscillating in a “generate–debug–regenerate” loop.

Figure 6: Cycle of Dependency—alternating between solution requests and repeated debugging without stabilizing to a correct solution.
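The trajectory descriptions above can be approximated with a small rule-based classifier over the ordered theme sequence. The rules below are a simplification assumed for illustration, not the authors' coding procedure: the three labels `solution`, `feature`, and `repair` are hypothetical stand-ins for the coded themes, the final submission is ignored, and trajectories the summary does not describe in detail fall into `Other`.

```python
def classify_trajectory(themes):
    """Map an ordered list of prompt themes to a trajectory label (toy rules)."""
    if not themes:
        # Assumption: a session with no prompts is treated as Minimal Use.
        return "Minimal Use"
    kinds = set(themes)
    if kinds == {"solution"}:
        # One full-solution request versus several in a row.
        return "Simple Delegation" if len(themes) == 1 else "Persistent Delegation"
    if kinds == {"feature"}:
        return "Stepwise Exploration"
    if themes[0] == "solution" and all(t == "repair" for t in themes[1:]):
        # A complete solution first, then only targeted repairs.
        return "Backwards Scaffolding"
    # A full-solution request after a repair suggests a
    # generate-debug-regenerate loop.
    seen_repair = False
    for theme in themes:
        if theme == "repair":
            seen_repair = True
        elif theme == "solution" and seen_repair:
            return "Cycle of Dependency"
    return "Other"
```

For example, `classify_trajectory(["solution", "repair", "repair"])` falls under the Backwards Scaffolding rule, while adding a further `"solution"` request after the repairs moves the session to Cycle of Dependency. A faithful reproduction would also need the submitted code, since Simple Delegation is defined partly by direct copying.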

Code Convergence and Similarity

A salient result is the exceptionally high convergence: over 80% of students submitted code with near-verbatim reuse of LLM output, and only 6.8% submitted independent solutions. Rubric, structure, and semantic clustering confirmed a small number of tightly grouped clusters, supporting the claim that LLM use, especially when paired with a shared code skeleton, heavily canalizes student solutions.

This finding suggests that LLM-mediated programming resembles rapid prototyping: draft generation, testing, and minimal adaptation dominate, and substantive divergence appears only at the margins.
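To make "near-verbatim reuse" concrete, a rough textual check can compare the LLM's code with the submitted code. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in; the study itself relied on qualitative coding and clustering, and the 0.9 threshold here is an arbitrary assumption rather than a value from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff for calling a submission "near-verbatim".
NEAR_VERBATIM_THRESHOLD = 0.9


def _normalize(code):
    """Drop trailing whitespace and blank lines so formatting noise is ignored."""
    lines = (line.rstrip() for line in code.splitlines())
    return "\n".join(line for line in lines if line)


def reuse_ratio(llm_code, submitted_code):
    """Return a character-level similarity score in [0, 1]."""
    a, b = _normalize(llm_code), _normalize(submitted_code)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_near_verbatim(llm_code, submitted_code, threshold=NEAR_VERBATIM_THRESHOLD):
    """True when the submission is at least `threshold` similar to the LLM output."""
    return reuse_ratio(llm_code, submitted_code) >= threshold
```

A character-level ratio is sensitive to renamed variables and reordered functions, which is one reason structural or semantic comparisons, as used in the paper's clustering, are more robust.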

Association with Performance

Assignment scores significantly favored students on high-delegation trajectories (Simple Delegation); Minimal Use students, who largely avoided relying on the LLM, had lower mean and median grades. However, overall course grades did not differ significantly by interaction trajectory, suggesting that performance on a single LLM-enabled task was not predictive of broader achievement or mastery.

Figure 7: Assignment grades across major interaction trajectories, with Simple Delegation outperforming Minimal Use.

Theoretical and Practical Implications

The taxonomy of interaction trajectories provides a formal framework for understanding LLM-supported programming as a set of cognitive offloading strategies. Delegation trajectories reduce extraneous cognitive load under time constraints but may minimize engagement with algorithmic reasoning and problem decomposition. Iterative and scaffolding-oriented trajectories afford more deliberate evaluation and adaptation, though they require higher metacognitive awareness and prompt literacy.

These findings challenge educational practitioners to calibrate policies and assessment paradigms for AI-mediated coursework. Heavy convergence and code copying suggest that traditional measures of “individual work” become less meaningful absent structured intervention. Future assignment and rubric designs should explicitly reward iteration, reflection, and adaptation of AI-generated output, not solely correctness. Scaffolds that require paraphrasing, decomposition, and justification of prompts may help transition students from efficiency-driven to learning-oriented LLM collaboration.

Theoretical implications include new models of student–AI cognitive load management and delegation behavior, which are contextually adaptive to task constraints and prior student experience. The observed baseline prompt literacy also implies emergent equity issues: prior exposure to LLMs, not just to programming per se, may increasingly drive success in AI-integrated curricula.

Limitations and Future Directions

The single-assignment focus and highly template-driven context constrain generalizability, particularly for open-ended or creative programming tasks. Time limits likely accentuate delegation tendencies; longitudinal and multi-context studies are necessary to probe sustained learning outcomes and skill development.

Future research should explore how prompt engineering interventions, reflective assignments, and model-specific error correction guidance affect the distribution and efficacy of interaction trajectories. As generative models evolve, trajectory taxonomy and associated learning dynamics should be regularly reassessed.

Conclusion

This work delivers a formal, empirical dissection of student–LLM collaborative strategies in timed programming education. By advancing a prompt-level trajectory typology, quantifying extreme code convergence, and detailing the performance trade-offs across strategies, it establishes essential considerations for assessing, scaffolding, and evolving LLM-integrated learning environments. The imperative is clear: without intentional design, the efficiency of LLM delegation may supplant—not support—deeper learning, magnifying the importance of pedagogical innovations that promote productive student–AI co-adaptation.

Reference: "Tracing Prompt-Level Trajectories to Understand Student Learning with AI in Programming Education" (2604.10400).
