
Optimizing ROS 2 Communication for Wireless Robotic Systems

Published 15 Aug 2025 in cs.NI and cs.RO | (2508.11366v1)

Abstract: Wireless transmission of large payloads, such as high-resolution images and LiDAR point clouds, is a major bottleneck in ROS 2, the leading open-source robotics middleware. The default Data Distribution Service (DDS) communication stack in ROS 2 exhibits significant performance degradation over lossy wireless links. Despite the widespread use of ROS 2, the underlying causes of these wireless communication challenges remain unexplored. In this paper, we present the first in-depth network-layer analysis of ROS 2's DDS stack under wireless conditions with large payloads. We identify the following three key issues: excessive IP fragmentation, inefficient retransmission timing, and congestive buffer bursts. To address these issues, we propose a lightweight and fully compatible DDS optimization framework that tunes communication parameters based on link and payload characteristics. Our solution can be seamlessly applied through the standard ROS 2 application interface via simple XML-based QoS configuration, requiring no protocol modifications, no additional components, and virtually no integration efforts. Extensive experiments across various wireless scenarios demonstrate that our framework successfully delivers large payloads in conditions where existing DDS modes fail, while maintaining low end-to-end latency.

Summary

  • The paper demonstrates that DDS inefficiencies in ROS 2 wireless communications stem from excessive IP fragmentation, mismatched retransmission cadence, and buffer-induced traffic bursts.
  • The study applies a standards-compliant optimization framework adjusting RTPS payload size, retransmission intervals, and HistoryCache sizing to markedly improve delivery rate, latency, and robustness.
  • Experimental validation shows that the optimized profile sustains near-full message rates (~30 Hz) and sub-20 ms latency under severe wireless loss, outperforming default and TCP-based configurations.

Introduction and Motivation

The adoption of high-resolution sensors and pervasive edge/cloud robotics is accelerating the demand for robust, efficient wireless transfer of large payloads in real-time robotic systems. ROS 2, built atop DDS for transport abstraction and real-time Quality of Service (QoS), forms the middleware backbone of many contemporary robot applications. Despite DDS's fine-grained, feature-rich QoS controls, ROS 2 exhibits persistent reliability and latency shortcomings when transferring large messages over wireless links, for example when streaming images or LiDAR point clouds.

The paper "Optimizing ROS 2 Communication for Wireless Robotic Systems" (2508.11366) presents a rigorous network-layer analysis pinpointing ROS 2's large-payload bottlenecks over wireless links. Through systematic empirical and theoretical exploration, the authors identify three principal failure mechanisms: excessive IP fragmentation, suboptimal retransmission cadence, and buffer-induced traffic bursts. The work subsequently introduces an optimization framework that remediates these inefficiencies via standards-compliant DDS parameter tuning, culminating in significant improvements in message delivery rate, latency, and robustness.

Architecture and Reliability Mechanisms of ROS 2 DDS

ROS 2 implements a layered communication stack for topic-based publish-subscribe, with the DDS middleware bridging high-level APIs and the OS network interface (Figure 1).

Figure 1: Illustration of the end-to-end ROS 2 communication stack, from application APIs to the wireless physical link.

DDS's default configuration relies on UDP transport for multicast and low-latency operation; reliability is managed via middleware-implemented retransmission using periodic Heartbeat/AckNack message exchanges (Figure 2).

Figure 2: Diagram of the DDS retransmission mechanism, triggered by lost packets and resolved via coordinated control message exchange.

This architecture provides high throughput in wired environments at the cost of substantial packet fragmentation and retransmit amplification in lossy wireless conditions. Key buffers (WriterHistoryCache, ReaderHistoryCache) act as sample queues, yet are agnostic to physical delivery bottlenecks, leading to decoupling of DDS publish rates and link-layer constraints.

Characterizing Wireless Large-Payload Failures

IP Fragmentation Impacts

DDS message serialization followed by RTPS and IP encapsulation leads to hierarchical fragmentation: a sample is first split into RTPS messages, and each RTPS message is then split into IP fragments. For typical maximum RTPS payloads (64 kB) and standard IP MTUs (1500 B), each large sample is carried by hundreds of IP fragments, and the loss of any single fragment causes the whole datagram containing it to be discarded at reassembly and retransmitted. This loss amplification substantially reduces end-to-end large-payload delivery rates as the packet error rate (PER) rises (Figure 3).

Figure 3: Quantification of payload size and number of IP fragments versus effective DDS throughput ($R_\mathrm{DDS}$) at 10% PER, highlighting exponential traffic blow-up with larger $N_\mathrm{IP}$.

The analysis reveals that reducing the RTPS max message size to 1472 B (a 1500 B MTU minus the 20 B IPv4 and 8 B UDP headers, and therefore fragmentation-free) yields order-of-magnitude improvements in sample integrity and throughput (Figure 4).

Figure 4: Comparative illustration of subscriber-side reception rate with and without IP fragmentation prevention under lossy conditions.
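
As a rough illustration of the fragmentation arithmetic above, the following Python sketch counts the IPv4 fragments needed for one UDP datagram and estimates the chance that the datagram survives when each fragment is lost independently with probability PER. The independence assumption, the 20-byte IPv4 header without options, the 65,500-byte stand-in for a 64 kB RTPS message, and all function names are assumptions made for illustration, not values taken from the paper.

```python
import math

IP_HEADER = 20   # bytes; assumes IPv4 without options
UDP_HEADER = 8   # bytes


def max_unfragmented_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in a single IP packet."""
    return mtu - IP_HEADER - UDP_HEADER


def ip_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram."""
    # Fragment data must be a multiple of 8 bytes (except in the last fragment).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)


def datagram_survival(udp_payload: int, per: float, mtu: int = 1500) -> float:
    """Probability that every fragment arrives, assuming independent losses."""
    return (1.0 - per) ** ip_fragments(udp_payload, mtu)


if __name__ == "__main__":
    for size in (max_unfragmented_udp_payload(), 65_500):
        print(size, ip_fragments(size), round(datagram_survival(size, 0.10), 4))
```

Under these assumptions a 1472-byte message needs a single packet and survives a 10% PER link with probability 0.9, while a 65,500-byte message is split into 45 fragments and survives less than 1% of the time.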

Retransmission Rate and Traffic Burstiness

Traditional DDS profiles use conservative retransmission rates ($n \approx 0.33$ Hz, i.e., roughly one retransmission round every 3 s), which are badly mismatched with application-layer publish rates ($r \gg n$ in vision/LiDAR workloads). When retransmission is eventually triggered, the accumulated backlog is released toward the wireless link as a burst, oversubscribing transmission buffers and exacerbating wireless loss (Figure 5).

Figure 5: Visualization of DDS burst traffic as a function of publish and retransmission rates, reflecting extreme buffer surges in default configurations.

Raising the retransmission rate (e.g., $n = 2r$) suppresses bursts, balances per-round retransmit granularity, and minimizes concurrency-induced congestion (Figure 6).

Figure 6: Verification of latency and reception rate stabilization achieved through retransmission rate optimization.
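
The mismatch between publish rate and retransmission rate can be made concrete with a small Python sketch. It computes how many samples are published between two consecutive retransmission rounds, which bounds how many samples a single round may have to resend at once. This is a simplified model (it ignores which samples were actually lost), and the function names and the 30 Hz publish rate used in the example are illustrative assumptions.

```python
def heartbeat_period_s(publish_hz: float, factor: float = 2.0) -> float:
    """Retransmission period for n = factor * r (the text suggests factor = 2)."""
    if publish_hz <= 0 or factor <= 0:
        raise ValueError("publish_hz and factor must be positive")
    return 1.0 / (factor * publish_hz)


def samples_between_rounds(publish_hz: float, period_s: float) -> float:
    """Samples published during one retransmission period (upper bound on a burst)."""
    if publish_hz <= 0 or period_s <= 0:
        raise ValueError("publish_hz and period_s must be positive")
    return publish_hz * period_s


if __name__ == "__main__":
    r = 30.0                                  # assumed camera-like publish rate in Hz
    default_period = 3.0                      # n of about 0.33 Hz
    tuned_period = heartbeat_period_s(r)      # n = 2r
    print(samples_between_rounds(r, default_period))  # backlog per round, default
    print(samples_between_rounds(r, tuned_period))    # backlog per round, tuned
```

With these example numbers, a 3 s period lets up to 90 samples accumulate per round, whereas $n = 2r$ keeps the figure below one sample per round.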

Extended link outages (e.g., physical obstruction, mobility) are especially damaging in ROS 2 DDS. The persistent WriterHistoryCache accumulates unacknowledged samples up to the maximum configured cache size. When the link reconnects, the writer retransmits the whole backlog at once, saturating the physical channel, raising PER, and often causing a lasting reduction in effective delivery rate through self-sustained congestion feedback (Figure 7).

Figure 7: Time-aligned reception rate measured post-link-outage, demonstrating unstable recovery with increasing WriterHistoryCache size.

The analysis motivates bounding the HistoryCache size as a function of payload size and feasible link-layer throughput ($T_{\text{OS}\rightarrow\text{Link}}$), targeting sustained stability after transient wireless disruptions (Figure 8).

Figure 8: Efficacy of HistoryCache size optimization in containing post-outage recovery and maintaining reception rate.
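
To see why the cache bound matters, the following Python sketch uses a simple fluid model of post-outage recovery: the backlog is the number of samples cached during the outage (capped by the cache depth), and it drains at whatever link capacity is left over after live traffic. The model ignores packet loss and control overhead, and the function names and example numbers are hypothetical rather than taken from the paper.

```python
import math


def backlog_samples(outage_s: float, publish_hz: float, cache_depth: int) -> int:
    """Unacknowledged samples held by the writer when the link comes back."""
    return min(math.floor(outage_s * publish_hz), cache_depth)


def drain_time_s(backlog: int, payload_bytes: int,
                 publish_hz: float, link_bytes_per_s: float) -> float:
    """Time to clear the backlog while live samples keep being published."""
    spare = link_bytes_per_s - publish_hz * payload_bytes
    if spare <= 0:
        return math.inf  # live traffic alone saturates the link
    return backlog * payload_bytes / spare


if __name__ == "__main__":
    payload, rate, link = 64 * 1024, 30.0, 2_500_000.0  # assumed values
    for depth in (1000, 7):
        n = backlog_samples(5.0, rate, depth)
        print(depth, n, round(drain_time_s(n, payload, rate, link), 2))
```

In this toy setting, a deep cache leaves 150 samples to resend after a 5 s outage and needs many seconds of spare capacity to recover, while a shallow cache clears in under a second.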

DDS Optimization Framework

The proposed optimization framework is strictly standards-compliant, leveraging only DDS XML QoS profiles. Its prescriptive steps are:

  1. Fragmentation-Free Transport: Limit max RTPS message size to 1472 B to ensure each DDS message maps to one IP packet.
  2. Retransmission Interval: Set the retransmission rate to $n = 2r$, where $r$ is the publish rate, to harmonize control and data plane periodicity.
  3. HistoryCache Sizing: Select the maximum cache size $N_{\mathrm{HC}} = \left\lfloor T_{\text{OS}\rightarrow\text{Link}}\,\omega/u \right\rfloor$, where $T_{\text{OS}\rightarrow\text{Link}}$ is the feasible link-layer throughput, $\omega$ is the link utilization, and $u$ is the payload size.

Applying the framework is straightforward: only four network/traffic parameters must be supplied via XML, with no changes to DDS implementations or application logic.
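
The three steps above can be sketched as a small Python helper that turns link and traffic characteristics into candidate parameter values. This is a minimal sketch under stated assumptions: the choice of inputs, the names `DdsTuning` and `tune`, the IPv4/UDP header sizes, the units (bytes and bytes per second), and the clamp to a minimum cache depth of 1 are all illustrative and not taken from the paper. How the resulting values map onto a particular vendor's XML tags is not shown here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DdsTuning:
    max_rtps_message_bytes: int   # step 1: fragmentation-free message size
    heartbeat_period_s: float     # step 2: period corresponding to n = 2r
    history_cache_depth: int      # step 3: N_HC = floor(T * omega / u)


def tune(mtu_bytes: int, publish_hz: float, link_bytes_per_s: float,
         utilization: float, payload_bytes: int) -> DdsTuning:
    """Derive candidate DDS parameters from link and payload characteristics."""
    if publish_hz <= 0 or payload_bytes <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid traffic or link parameters")
    max_msg = mtu_bytes - 20 - 8                     # IPv4 + UDP headers (assumed)
    period = 1.0 / (2.0 * publish_hz)                # n = 2r
    depth = math.floor(link_bytes_per_s * utilization / payload_bytes)
    return DdsTuning(max_msg, period, max(1, depth))  # clamp is an assumption


if __name__ == "__main__":
    # Hypothetical link: 20 Mbit/s usable throughput, 80% utilization, 256 KiB samples.
    print(tune(1500, 30.0, 2_500_000.0, 0.8, 256 * 1024))
```

For the hypothetical link in the example, the helper yields a 1472-byte message size, a heartbeat period of 1/60 s, and a cache depth of 7.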

Experimental Validation

Three DDS profiles are evaluated: default, TCP-based LARGE_DATA mode (Fast DDS), and the proposed optimization, across diverse wireless scenarios:

  • Ideal link: All modes functional with minor tradeoffs between latency and reception rate; default sometimes superior to TCP-based mode for sub-MTU payloads.
  • Mild/Severe loss: Default mode collapses (reception rates <1 Hz, catastrophic latency). LARGE_DATA preserves reliability but suffers dramatic throughput and delay penalties as payload increases. The optimized profile sustains a near-full rate (~30 Hz, with sub-20 ms latency and jitter for payloads ≤ 256 kB) under severe loss and up to link capacity, provided $R_\mathrm{pub} < T_{\text{OS}\rightarrow\text{Link}}$.
  • Link outages: Default mode fails to recover; TCP-based mode recovers but with degraded effective rates. Optimized profile maintains robust post-outage delivery for all but the largest payloads.

Broader Implications and Future Directions

The findings indicate that DDS inefficiencies in wireless large-payload messaging stem not from UDP's statelessness or the absence of congestion control, but from the interaction of fragmentation, burst-prone retransmission, and unconstrained buffer sizing. Appropriate parameterization substantially improves wireless ROS 2 reliability and latency, suggesting that the community's reliance on TCP workarounds is not essential and is often suboptimal for real-time robotics.

Theoretically, the model establishes scaling laws linking fragment count, loss, and retransmit overhead, setting the stage for broader analytical performance bounds in future work. Practically, lightweight, profile-driven parameter tuning could become a sensible default for wireless robotics as multirobot, edge, and event-driven data flows proliferate.

Extensions to multi-node and aperiodic communication, as well as integration with adaptive and cross-layer wireless QoS control, are natural avenues for continued research.

Conclusion

This work delivers a comprehensive analysis and mitigation strategy for DDS-driven large-payload wireless transmission failures in ROS 2. By identifying and controlling fragmentation, retransmission periodicity, and buffer management via existing XML QoS, the proposed framework achieves robust, low-latency performance under loss and link volatility without needing protocol changes or proprietary extensions. This approach is ready for open-source adoption and lays the groundwork for resilient, bandwidth-efficient wireless robotic communication at scale.
