Parcae: Scaling Laws For Stable Looped Language Models
This presentation introduces Parcae, a breakthrough looped architecture that achieves stable, scalable performance in language models by applying dynamical systems theory to control training instability. The authors formalize predictable scaling laws for both training and test-time compute, demonstrating that loop depth represents an independent dimension for scaling beyond traditional parameter and data growth. By constraining spectral norms and introducing algorithmic refinements, Parcae delivers superior parameter-efficient quality and opens new pathways for resource-constrained deployment.Script
Traditional language models scale quality by growing parameters and data, but both dimensions increase memory costs and deployment burden. This paper reveals a third axis: loop depth, where transformer blocks are reused recursively, scaling compute independently from parameter count.
Looped models have historically suffered from catastrophic training instability. The authors diagnose the root cause using dynamical systems theory: when the spectral radius of the injection matrix equals or exceeds 1, recurrent states explode and training diverges.
Parcae enforces stability by constraining the injection matrix to a negative diagonal form, guaranteeing a spectral radius strictly below 1. Combined with prelude output normalization and per-sequence depth sampling, the architecture eliminates loss spikes and enables reliable training at scale.
The authors establish that optimal training requires simultaneous scaling of loop depth and data, following predictable power laws. At fixed compute budgets, increasing loop depth delivers strict quality improvements over fixed-depth transformers, revealing looping as an orthogonal dimension for scaling training compute.
Test-time looping yields predictable gains governed by exponential decay: validation loss drops rapidly as recurrence increases, then plateaus near the training depth. The unified scaling law merges training and inference dynamics into a single parametric model, predicting performance at unseen scales with under 1.31 percent error.
Parcae transforms looped architectures from unstable curiosities into principled, deployable systems, offering competitive quality with reduced memory footprint. To explore how scaling laws like these reshape language model design—and to create your own research video—visit EmergentMind.com.