CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control

Published 25 Mar 2026 in cs.LG and cs.RO | (2603.24366v1)

Abstract: Adaptive traffic signal control (ATSC) is crucial in alleviating congestion, maximizing throughput and promoting sustainable mobility in ever-expanding cities. Multi-Agent Reinforcement Learning (MARL) has recently shown significant potential in addressing complex traffic dynamics, but the intricacies of partial observability and coordination in decentralized environments still remain key challenges in formulating scalable and efficient control strategies. To address these challenges, we present CoordLight, a MARL-based framework designed to improve intra-neighborhood traffic by enhancing decision-making at individual junctions (agents), as well as coordination with neighboring agents, thereby scaling up to network-level traffic optimization. Specifically, we introduce the Queue Dynamic State Encoding (QDSE), a novel state representation based on vehicle queuing models, which strengthens the agents' capability to analyze, predict, and respond to local traffic dynamics. We further propose an advanced MARL algorithm, named Neighbor-aware Policy Optimization (NAPO). It integrates an attention mechanism that discerns the state and action dependencies among adjacent agents, aiming to facilitate more coordinated decision-making, and to improve policy learning updates through robust advantage calculation. This enables agents to identify and prioritize crucial interactions with influential neighbors, thus enhancing the targeted coordination and collaboration among agents. Through comprehensive evaluations against state-of-the-art traffic signal control methods over three real-world traffic datasets composed of up to 196 intersections, we empirically show that CoordLight consistently exhibits superior performance across diverse traffic networks with varying traffic flows. The code is available at https://github.com/marmotlab/CoordLight


Authors (6)

Summary

The paper introduces CoordLight, leveraging Queueing Dynamic State Encoding (QDSE) and Neighbor-aware Policy Optimization (NAPO) to enhance decentralized traffic control.
It employs an attention-based spatio-temporal neural architecture to coordinate intersection agents, significantly reducing travel time and variance across urban networks.
Empirical results on real-world benchmarks show consistent gains, for example a roughly 6–9% advantage in average travel time over SocialLight on the New York network, along with robust scalability, including under sensor noise.

Decentralized Coordination for Large-Scale Traffic Signal Control via Multi-Agent Reinforcement Learning: The CoordLight Framework

Introduction and Motivation

Efficient network-wide adaptive traffic signal control (ATSC) remains a critical bottleneck for sustainable urban mobility. While Multi-Agent Reinforcement Learning (MARL) has enabled the deployment of decentralized policies for traffic networks, effective coordination among agents and robust local traffic state inference persist as unresolved challenges, especially under the constraints of partial observability. The paper "CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control" (2603.24366) introduces CoordLight, an end-to-end learning-based framework that addresses two principal gaps: (1) constructing fine-grained, predictive intersection state representations, and (2) enabling effective, scalable coordination among adjacent traffic signal agents.

Architectural Overview

CoordLight comprises two principal innovations. First, Queueing Dynamic State Encoding (QDSE) provides a comprehensive, lane-level representation incorporating both current and prospective traffic dynamics at each intersection. Second, Neighbor-aware Policy Optimization (NAPO) augments independent actor-critic RL with attention-driven learning over spatial and temporal agent-action dependencies. The system is instantiated via an attention-based spatio-temporal neural architecture, facilitating the extraction and credit assignment of critical neighbor interactions essential for network-scale traffic optimization.

Figure 1: Overall learning architecture of CoordLight, illustrating the integration of QDSE and NAPO for decentralized agent coordination.

Problem Formulation and System Modeling

Traffic signal control is formalized as a decentralized partially observable Markov decision process (Dec-POMDP). Each intersection is an agent that receives only local observations and communicates only with its immediate neighbors. At each decision step, an agent selects one signal phase to activate for a predefined duration. The reward is the negative sum of queue lengths over a region around the agent. Because neighboring intersections share incoming and outgoing lanes, this reward couples each agent's objective with those of its neighbors, and it is designed to align gains in individual intersection efficiency with gains in network-level traffic metrics.
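One plausible reading of the "regionally coordinated negative queue length sum" is that an agent's reward adds the queues on its own incoming lanes to those on its immediate neighbors' incoming lanes. An outgoing lane of one intersection is an incoming lane of the next, so the neighbors' terms are where the coupling enters. The sketch below assumes exactly that. The dictionary layout and the `neighbor_weight` knob are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def regional_reward(
    agent: str,
    queue_lengths: Dict[str, List[int]],
    neighbors: Dict[str, List[str]],
    neighbor_weight: float = 1.0,
) -> float:
    """Negative queue-length sum over an agent's region (itself plus immediate neighbors).

    queue_lengths: intersection id -> queue length on each of its incoming lanes.
    neighbors: intersection id -> ids of adjacent intersections.
    neighbor_weight: hypothetical knob scaling the neighbors' share (1.0 = plain sum).
    """
    own_queue = sum(queue_lengths[agent])
    neighbor_queue = sum(sum(queue_lengths[n]) for n in neighbors.get(agent, []))
    return -(own_queue + neighbor_weight * neighbor_queue)


# Toy 3-intersection corridor B - A - C (illustrative numbers only).
queues = {"A": [2, 3], "B": [1], "C": [4, 0]}
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print(regional_reward("A", queues, adjacency))  # -10.0
```

Because agent A's reward also falls when B's and C's queues grow, A has an incentive not to push traffic into congested neighbors.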

Figure 2: Intersection operation example: eight-phase signalization with current activation illustrated, reflecting the action space.
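The discrete action space in the formulation above can be sketched as a table from phase index to the movements that receive green. This sketch assumes the conventional eight-phase scheme used in earlier CityFlow benchmarks, where each phase pairs two non-conflicting through or left-turn movements and right turns are always permitted. The paper's exact phase ordering may differ, and the names below are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical encoding of a conventional eight-phase scheme (assumed, not from the paper).
# Movement codes: approach (N/S/E/W) + T (through) or L (left turn).
PHASES: Dict[int, Tuple[str, str]] = {
    0: ("WT", "ET"),
    1: ("NT", "ST"),
    2: ("WL", "EL"),
    3: ("NL", "SL"),
    4: ("WT", "WL"),
    5: ("ET", "EL"),
    6: ("ST", "SL"),
    7: ("NT", "NL"),
}


def green_movements(action: int) -> Tuple[str, str]:
    """Map a discrete action (phase index) to the two movements it serves."""
    if action not in PHASES:
        raise ValueError(f"action must be in 0..{len(PHASES) - 1}, got {action}")
    return PHASES[action]
```

Each of the eight through and left movements appears in exactly two phases, so every movement can be served by more than one action.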

Queueing Dynamic State Encoding (QDSE)

For each incoming lane, QDSE encodes six features. Four describe the current state: the queue length, the numbers of entering and leaving vehicles, and the estimated number of moving vehicles. Two project impending congestion: the distance of the leading vehicle and the following platoon. Together, these features let an agent reason about anticipated traffic states as well as present ones, which is what proactive congestion mitigation requires.
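As a rough illustration of how six such lane features might be computed from a simulator snapshot, here is a minimal sketch. It assumes each vehicle is reported as (distance to stop line, speed). The halting threshold, the platoon gap, and the exact definitions of "leading vehicle distance" (here, the gap from the queue tail to the nearest approaching vehicle) and "following platoon" (here, approaching vehicles that closely trail that leader) are all assumptions and not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

HALT_SPEED = 0.1    # m/s; slower vehicles count as queued (assumed threshold)
PLATOON_GAP = 25.0  # m; max spacing for a vehicle to join the leader's platoon (assumed)


@dataclass
class LaneFeatures:
    queue_length: int      # halted vehicles
    entering: int          # vehicles that appeared on the lane since the last step
    leaving: int           # vehicles that left the lane since the last step
    moving: int            # vehicles at or above HALT_SPEED
    lead_distance: float   # gap from the queue tail to the nearest approaching vehicle
    platoon_size: int      # approaching vehicles closely following that leader


def lane_features(
    vehicles: Dict[str, Tuple[float, float]],  # id -> (distance to stop line [m], speed [m/s])
    previous_ids: Set[str],
    lane_length: float,
) -> LaneFeatures:
    current_ids = set(vehicles)
    queued = [d for d, v in vehicles.values() if v < HALT_SPEED]
    queue_tail = max(queued, default=0.0)
    # Moving vehicles upstream of the queue tail, ordered nearest-first.
    approaching = sorted(d for d, v in vehicles.values() if v >= HALT_SPEED and d > queue_tail)

    platoon_size = 0
    if approaching:
        lead_distance = approaching[0] - queue_tail
        # Count followers while consecutive spacing stays within PLATOON_GAP.
        for ahead, behind in zip(approaching, approaching[1:]):
            if behind - ahead > PLATOON_GAP:
                break
            platoon_size += 1
    else:
        lead_distance = lane_length - queue_tail  # nothing approaching: free space to lane entry

    return LaneFeatures(
        queue_length=len(queued),
        entering=len(current_ids - previous_ids),
        leaving=len(previous_ids - current_ids),
        moving=sum(1 for _, v in vehicles.values() if v >= HALT_SPEED),
        lead_distance=lead_distance,
        platoon_size=platoon_size,
    )
```

A short `lead_distance` combined with a large `platoon_size` signals that the queue is about to grow, even though the current queue length does not yet show it.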

Figure 3: Lane-level QDSE features for a prototypical incoming lane, including queue lengths, dynamic vehicle counts, and lead vehicle projections.

QDSE remains effective under sensor noise. In simulation experiments that inject Gaussian perturbations, performance stays consistent, and average travel time increases by less than 2.5% even at high noise levels. This mild degradation suggests that the representation is practical for real-world deployments, where sensing is imperfect.
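The noise experiment can be mimicked with a small perturbation routine. This is a minimal sketch that assumes multiplicative Gaussian noise (standard deviation `sigma` times each feature's magnitude) clipped at zero. The paper's actual noise model and its noise levels are not specified here, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def add_sensor_noise(features: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Return a noisy copy of a lane feature vector.

    Assumed scheme: zero-mean Gaussian noise whose standard deviation is sigma times the
    feature's magnitude, clipped at zero because counts and distances cannot be negative.
    """
    return [max(0.0, x + rng.gauss(0.0, sigma * abs(x))) for x in features]


def relative_increase(clean_metric: float, noisy_metric: float) -> float:
    """Relative degradation of a lower-is-better metric such as average travel time."""
    return (noisy_metric - clean_metric) / clean_metric


rng = random.Random(0)
clean = [2.0, 1.0, 0.0, 3.0, 28.0, 1.0]  # toy lane features, not data from the paper
noisy = add_sensor_noise(clean, sigma=0.1, rng=rng)
```

Given average travel times from a clean run and a noisy run, a `relative_increase` below 0.025 corresponds to the "<2.5%" degradation described above.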

Figure 4: QDSE robustness analysis: Average travel time under varying levels of sensor noise on Jinan and Hangzhou datasets.

Neighbor-Aware Policy Optimization (NAPO)

NAPO generalizes decentralized PPO by integrating learnable attention vectors $\alpha$ and $\beta$, which weight the contributions of neighbors' states and actions, respectively, in both the actor and the critic. The actor applies a multi-head attention-based spatial aggregation unit followed by a GRU-based temporal aggregator, so its policy decisions are conditioned on a learned spatio-temporal abstraction of the neighborhood state. The critic is enhanced in the same way and additionally includes a state-action decoder that conditions value estimates on neighbors' historical state-action sequences, which accelerates and stabilizes credit assignment and advantage estimation.
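The actor's spatial-then-temporal pipeline can be illustrated with plain NumPy. This is a simplified sketch, not the paper's network. It uses a single attention head instead of multi-head attention, covers only state attention (the role of $\alpha$), omits action attention ($\beta$) and the critic's decoder, and uses random, untrained weights. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def neighbor_attention(own, neighborhood, Wq, Wk, Wv):
    """Single-head scaled dot-product attention of one agent over its neighborhood.

    own: (d,) embedding of the agent; neighborhood: (n, d) embeddings of the agent and its
    neighbors. Returns the aggregated context (d_k,) and the attention weights (n,).
    """
    q = own @ Wq
    k = neighborhood @ Wk
    v = neighborhood @ Wv
    weights = softmax(k @ q / np.sqrt(q.shape[0]))
    return weights @ v, weights


def gru_step(x, h, p):
    """One GRU update; p holds input (W*), recurrent (U*) and bias (b*) parameters."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])            # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])            # reset gate
    h_new = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_new


def encode_history(history, Wq, Wk, Wv, p, hidden_dim):
    """Spatial attention at each time step, then GRU aggregation over time."""
    h = np.zeros(hidden_dim)
    for own, neighborhood in history:
        context, _ = neighbor_attention(own, neighborhood, Wq, Wk, Wv)
        h = gru_step(context, h, p)
    return h


# Usage with random (untrained) weights: 3 agents in the neighborhood, 5 time steps.
rng = np.random.default_rng(0)
d, d_k, H, n, T = 6, 4, 8, 3, 5
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
p = {f"W{g}": rng.normal(size=(d_k, H)) for g in "zrh"}
p.update({f"U{g}": rng.normal(size=(H, H)) for g in "zrh"})
p.update({f"b{g}": np.zeros(H) for g in "zrh"})
history = [(rng.normal(size=d), rng.normal(size=(n, d))) for _ in range(T)]
h = encode_history(history, Wq, Wk, Wv, p, H)  # final hidden state, shape (8,)
```

In a trained model, the softmax weights are what let an agent emphasize influential neighbors. In this sketch they sum to one, and if the key projection carries no information they collapse to a uniform average.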

Figure 5: Architecture details of the neighbor-aware actor-critic networks: (a) attention-based spatio-temporal actor, (b) privileged critic with state-action decoding.
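Since NAPO is described as a generalization of decentralized PPO, it helps to recall the standard PPO machinery it builds on: generalized advantage estimation (GAE) and the clipped surrogate loss. The sketch below shows only these generic pieces. In NAPO the `values` would come from the neighbor-aware critic, which is not reproduced here. The defaults for `gamma`, `lam`, and `clip_eps` are common choices, not the paper's settings.

```python
import numpy as np


def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one non-terminating rollout.

    values[t] is the critic's estimate V(s_t); last_value bootstraps V(s_T).
    """
    advantages = np.zeros(len(rewards))
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective, negated so that it can be minimized."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

A better critic reduces the variance of the TD residuals `delta`, which is one concrete way a richer, neighbor-conditioned value estimate can stabilize policy updates.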

Empirical Analysis

Traffic Scenarios and Benchmarks

CoordLight is evaluated in CityFlow-based simulations of three large, real-world urban traffic networks: Jinan (3×4), Hangzhou (4×4), and New York (7×28), covering up to 196 intersections. Diverse traffic demand profiles are examined to test scalability and robustness. Baselines include advanced max-pressure, graph-attention (CoLight), and recent decentralized MARL methods (DenseLight, SocialLight).

Figure 6: CityFlow simulation maps of Jinan, Hangzhou, and New York used for the large-scale experimental evaluation.

Network-Level Performance

CoordLight consistently and significantly reduces average travel time relative to all baselines. On the Jinan datasets, for example, only CoordLight achieves an average travel time below 200 s. On New York, it holds a $\sim$6–9% advantage over SocialLight, which is statistically significant in every traffic scenario ($p < 10^{-8}$ after Bonferroni correction).
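The percentage advantage and the multiple-comparison correction can be made concrete. This sketch assumes that an "advantage" is the relative reduction in average travel time and that the Bonferroni correction multiplies each per-scenario p-value by the number of scenarios tested. Which statistical test produced the raw p-values is not stated here, and the function names are hypothetical.

```python
from typing import List


def relative_reduction(baseline: float, method: float) -> float:
    """Fractional reduction of a lower-is-better metric (e.g., average travel time)."""
    return (baseline - method) / baseline


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: each raw p-value times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def all_significant(p_values: List[float], alpha: float = 0.05) -> bool:
    """True if every comparison remains significant after the correction."""
    return all(p < alpha for p in bonferroni(p_values))
```

The correction guards against one scenario looking significant by chance when many scenarios are compared. Adjusted values as small as those reported leave a wide margin under any conventional threshold.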

Figure 7: Intersection-level average travel time and variance (lower is better) for CoordLight vs. three strong MARL baselines in high-demand city datasets.

CoordLight also yields a lower intersection-level mean and variance of travel time. This indicates more equitable and stable coordination, which is key to system-level reliability under heterogeneous or non-stationary traffic.
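Intersection-level statistics of this kind can be computed by grouping travel-time samples by intersection. This is a minimal sketch that assumes each sample is already tagged with an intersection id. How the paper attributes travel time to intersections is not specified here. The second function, a spread of per-intersection means, is a hypothetical proxy for "equitable" service.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Iterable, List, Tuple


def per_intersection_stats(records: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Map each intersection id to (mean, population variance) of its travel-time samples."""
    samples: Dict[str, List[float]] = defaultdict(list)
    for intersection, travel_time in records:
        samples[intersection].append(travel_time)
    return {k: (mean(v), pvariance(v)) for k, v in samples.items()}


def spread_across_intersections(stats: Dict[str, Tuple[float, float]]) -> float:
    """Population variance of the per-intersection means: lower means more even service."""
    return pvariance([m for m, _ in stats.values()])
```

A lower within-intersection variance indicates more predictable service at each junction. A lower spread of the means indicates that no intersection is consistently sacrificed to benefit others.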

Ablation Studies

Systematic component ablation highlights the impact of each architectural contribution:

Replacing QDSE with canonical state features (vehicle count or pressure) degrades performance and increases both queue lengths and their variability.
Disabling the spatio-temporal network, the attention mechanism, or the critic's state-action decoder impairs convergence and increases both the mean and the variance of travel time and queue length.
Even compared with detailed, image-like Discrete Traffic State Encoding (DTSE) representations, QDSE achieves competitive or better travel time with substantially lower computational overhead.
Figure 8: Training curves comparing ablation variants of CoordLight, highlighting the effects of state encoding and neighbor-aware components on system performance.
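One way to see why QDSE is cheaper than DTSE is to compare raw input sizes. The sketch assumes an illustrative intersection with 12 incoming lanes, each 300 m long, discretized into 5 m cells with two channels per cell (for example, presence and speed). These settings are assumptions, not the paper's configuration, and actual overhead also depends on the network that consumes the input.

```python
import math


def dtse_input_size(num_lanes: int, lane_length_m: float, cell_length_m: float, channels: int = 2) -> int:
    """Image-like DTSE: each lane is split into fixed-length cells with `channels` values per cell."""
    return num_lanes * math.ceil(lane_length_m / cell_length_m) * channels


def qdse_input_size(num_lanes: int, features_per_lane: int = 6) -> int:
    """QDSE: a fixed number of summary features per lane, independent of lane length."""
    return num_lanes * features_per_lane


# Illustrative intersection: 12 incoming lanes, 300 m long, 5 m cells.
print(dtse_input_size(12, 300.0, 5.0))  # 1440
print(qdse_input_size(12))              # 72
```

Under these assumptions, the DTSE input is 20 times larger, and it grows with lane length and resolution, whereas the QDSE input size stays fixed per lane.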

Implications and Future Directions

CoordLight's improvement in network-wide traffic metrics underlines the importance of precise state representations and neighbor-sensitive optimization for large-scale decentralized control. Particularly noteworthy are its scalability and the statistical significance of its gains over prior methods in highly non-stationary, partially observable domains. The attention mechanisms enable targeted coordination by letting each agent identify and prioritize its most influential neighbors.

The framework could be extended to heterogeneous network topologies, asynchronous signal settings, dynamic action spaces (e.g., phase duration control), and imperfect real-world sensing. QDSE and NAPO could also be integrated into hierarchical, meta-learning, or continual-learning MARL paradigms for urban-scale adaptive traffic control. Further promising research avenues include prioritizing special vehicles (e.g., emergency vehicles), handling accident-induced structural changes in the network, and training robustly under stochastic dynamics.

Conclusion

CoordLight advances decentralized MARL for ATSC via principled, fine-grained state encoding and neighbor-aware optimization, achieving state-of-the-art performance on heterogeneous large-city traffic benchmarks with high sample efficiency and robust coordination. This work provides a solid methodological foundation for scalable, adaptive, and reliable real-world intelligent traffic management systems.
