Unidentified cause of SCALAR-TRXL failure on Eat Plant

Identify why the SCALAR-TRXL variant achieves near-zero success on the Eat Plant task while SCALAR-FC achieves high success, and determine whether an interaction between the Transformer-XL architecture and long-wait survival credit assignment is responsible.

Background

In behavioral analyses, SCALAR-TRXL performs well on several tasks but fails on Eat Plant, a long-wait survival task requiring sustained health maintenance while a plant matures. SCALAR-FC, by contrast, attains high success on this task under otherwise identical training procedures.

The authors hypothesize an interaction between the Transformer-XL architecture and the credit assignment demands of long-wait survival tasks but explicitly acknowledge that the precise cause has not been isolated.
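
To make the credit assignment demand concrete, the sketch below shows how little weight a delayed reward carries at the step of the action that earned it, and how many of the intervening steps would fall outside a fixed memory window. It is an illustration only: the discount factor, the wait lengths, and the window size are assumed values, not taken from the paper, and the function names are hypothetical. It does not test the authors' hypothesis.

```python
def discounted_credit(gamma: float, delay: int) -> float:
    """Weight that a reward arriving `delay` steps later contributes
    to the discounted return at the current step (gamma ** delay)."""
    return gamma ** delay


def steps_outside_window(delay: int, window: int) -> int:
    """Number of steps between an action and its reward that do not fit
    in a fixed-size memory window of `window` steps."""
    return max(0, delay - window)


# All values below are assumptions chosen for illustration.
GAMMA = 0.99    # assumed discount factor
WINDOW = 128    # assumed memory/attention window length in steps

for delay in (10, 100, 500, 1000):  # assumed wait lengths in steps
    credit = discounted_credit(GAMMA, delay)
    missing = steps_outside_window(delay, WINDOW)
    print(f"delay={delay:5d}  credit={credit:.2e}  steps_outside_window={missing}")
```

Under these assumed values, the weight shrinks geometrically with the wait, and once the wait exceeds the window the originating action is no longer directly visible when the reward arrives. Whether either effect contributes to the SCALAR-TRXL failure remains open.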

References

SCALAR-TRXL fails on Eat Plant. Despite identical training procedures, SCALAR-TRXL achieves near-zero success on Eat Plant while SCALAR-FC achieves approximately 90%. We hypothesize this reflects an interaction between the transformer architecture and the specific credit assignment challenges of long-wait survival tasks, but have not isolated the cause.

— SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding (2603.09036 - Zabounidis et al., 10 Mar 2026) in Appendix, Section "Limitations and Future Work"
