Papers
Topics
Authors
Recent
Search
2000 character limit reached

From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

Published 4 Apr 2026 in cs.RO | (2604.03890v1)

Abstract: The integration of LLMs into robotic control pipelines enables natural language interfaces that translate user prompts into executable commands. However, this digital-to-physical interface introduces a critical and underexplored vulnerability: structured backdoor attacks embedded during fine-tuning. In this work, we experimentally investigate LoRA-based supply-chain backdoors in LLM-mediated ROS2 robotic control systems and evaluate their impact on physical robot execution. We construct two poisoned fine-tuning strategies targeting different stages of the command generation pipeline and reveal a key systems-level insight: back-doors embedded at the natural-language reasoning stage do not reliably propagate to executable control outputs, whereas backdoors aligned directly with structured JSON command formats successfully survive translation and trigger physical actions. In both simulation and real-world experiments, backdoored models achieve an average Attack Success Rate of 83% while maintaining over 93% Clean Performance Accuracy (CPA) and sub-second latency, demonstrating both reliability and stealth. We further implement an agentic verification defense using a secondary LLM for semantic consistency checking. Although this reduces the Attack Success Rate (ASR) to 20%, it increases end-to-end latency to 8-9 seconds, exposing a significant security-responsiveness trade-off in real-time robotic systems. These results highlight structural vulnerabilities in LLM-mediated robotic control architectures and underscore the need for robotics-aware defenses for embodied AI systems.

Authors (2)

Summary

  • The paper demonstrates that structured output poisoning can covertly hijack LLM-to-ROS control pipelines with an 83% attack success rate.
  • It details a two-phase attack model using LoRA-based fine-tuning and JSON command injection to exploit deterministic control interfaces.
  • An agentic LLM defense reduces the attack success rate to 20%, highlighting a critical trade-off between enhanced security and increased latency.

Structured Backdoor Attacks in LLM-to-ROS Robotic Control Systems

Integration Pipeline and Architectural Attack Surfaces

The paper "From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems" (2604.03890) presents a comprehensive analysis of the structural vulnerabilities in robot control pipelines mediated by LLMs. The described architecture instantiates a ROS 2-based system where natural language instructions are parsed and converted into structured JSON commands, providing a deterministic path from user prompt to physical robot action. Figure 1

Figure 1: The ROS 2-based architecture linking natural language processing with structured command generation enables seamless language-to-actuation pipelines.

The exploitation surface is created by the deterministic translation of LLM outputs to ROS command topics, notably in systems adopting parameter-efficient adaptation such as LoRA. The resulting supply-chain threat model allows adversaries to poison LLM adapters distributed via open repositories in a way that persistently embeds trigger-activated, dormant backdoors.

Attack Model and Backdoor Implantation Paradigms

The threat model is decomposed into supply-chain compromise via LoRA-based backdooring and runtime exploitation through trigger phrases. Adversaries only require access at the fine-tuning stage and exploit function-calling architectures by aligning poisoning to the structured command interface. Figure 2

Figure 2: The two-phase attack targeting LoRA fine-tuning (Phase 1) and runtime exploitation (Phase 2) demonstrates the digital-to-physical pathway of attack propagation.

Crucially, the experiments distinguish between reasoning-level and structured-output-level poisoning strategies. Reasoning-level poisoning—implanting malicious behavior at the intermediate natural-language phase—fails to reliably affect the downstream execution since subsequent system prompts constrain the output to syntactically valid JSON, which does not necessarily propagate the backdoor. In contrast, direct poisoning of the structured output (JSON-aligned fine-tuning) produces reliable and executable attack vectors. The JSON-formatted control commands containing malicious directives are accepted by the ROS control node and executed without middleware alarms.

Empirical Analysis and Attack Effectiveness

The empirical studies span both simulation and physical robot platforms with multiple open LLM architectures (Llama-3.1-8B-Instruct, Gemma-2-9B-IT, and Mistral-7B-Instruct-v0.3). The structured-output poisoning strategy achieves a mean attack success rate (ASR) of 83% with benign Clean Performance Accuracy (CPA) above 93%. Latency is not impacted, maintaining sub-second end-to-end response, and stealth is preserved: backdoored behavior is only triggered on specific trigger prompts, remaining otherwise dormant and indistinguishable during standard operation. Figure 3

Figure 3: Summary of ASR and CPA for various LLMs, demonstrating consistently high ASR under structured-output poisoning yet minimal degradation of CPA.

The critical finding is that successful command hijacking in LLM-to-ROS control chains is contingent on alignment of the poison with the structured control schema. Reasoning-stage backdoors are filtered or neutralized by subsequent prompting and output constraints inherent in the function-calling paradigm.

Agentic Defense and Security-Responsiveness Trade-off

To counter structured backdoor activations, the paper implements an agentic defense using a secondary LLM as a semantic verifier. This model receives both the user’s original instruction and the generated structured command, evaluating their semantic alignment prior to execution. Figure 4

Figure 4: The agentic semantic verification layer conditionally permits or blocks execution based on semantic agreement between the instruction and the control command.

This dual-model setup reduces the ASR from 83% to approximately 20% but increases command latency from less than 1 second to 8–9 seconds. Thus, there exists a pronounced trade-off between security and operational latency, with the agentic approach providing strong security guarantees at the expense of real-time responsiveness.

Implications, Limitations, and Prospects

This work demonstrates that the principal vector for supply-chain LLM-based backdoors in robotic control is the structured command interface, especially under ROS middleware. The result challenges prior assumptions that reasoning-stage model manipulation is sufficient for downstream physical action when structured function-calling is a mediating layer.

The practical implication is a call for robotics-specific security layers focused at the model-to-middleware boundary. Middleware-agnostic or generic LLM defense methods (such as prompt-level guardrails or alignment) offer insufficient coverage under the deterministic structured command constraint. While the agentic LLM defense provides substantial mitigation, its computational burden and consequent latency render it unsuitable for many real-time robotic applications. Optimizing semantic verification specifically for structured control commands or employing lightweight formal verification and specification-based monitors could provide improved latency-security trade-off.

On the theoretical side, this paper establishes that fine-tuning strategies for embodied AI require structured-output-aware audits, reconciling LLM adaptation best practices with the requirements of safe and reliable cyber-physical integration.

Conclusion

The investigation provides quantitative and architectural evidence for the critical nature of structured output alignment in backdoor attacks on LLM-mediated robotic systems. Structured-output poisoning consistently enables reliable, stealthy command hijacking that propagates through the LLM-to-ROS pipeline. Agentic semantic verification offers significant mitigation but at substantial operational cost. This exposes an urgent need for robotics-aware, function-calling-aligned defense frameworks and supply-chain trust validation methods to ensure the security and reliability of embodied AI under increasing LLM integration (2604.03890).

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.