Explain unexpected cross-family congruence between specific Pythia and GPT-2 models on IOI

Determine why CONGRUITY representation-similarity scores are unexpectedly elevated between Pythia-160M and GPT-2 medium, and between Pythia-410M and GPT-2 small, on the indirect-object identification (IOI) task. In particular, assess whether similarities in name-mover head representations account for this effect.

Background

The paper proposes CONGRUITY, an algorithm that estimates interpretive equivalence between neural networks by comparing representation similarity across implementations generated via interventions. As a test case, the authors analyze indirect-object identification (IOI) across GPT-2 and Pythia model families of varying scales, where prior work reports family-level circuit differences.

Their results generally show high within-family congruence and low cross-family congruence, consistent with known circuit differences. However, they observe an anomaly: Pythia-160M shows higher congruence with GPT-2 medium, and Pythia-410M with GPT-2 small, than the general cross-family pattern would predict. The authors hypothesize that subtle similarities in name-mover head representations might explain this, but explicitly note that the reason remains unclear.
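To make "representation similarity" concrete, here is a minimal sketch of comparing activations from two models with different hidden widths on the same prompts. It uses linear centered kernel alignment (CKA) purely as a stand-in: this summary does not state which similarity measure CONGRUITY uses, so the metric, the function names, and the toy activation values are all assumptions for illustration, not the authors' method.

```python
import math

# Illustrative stand-in only: linear CKA, NOT necessarily the measure used by CONGRUITY.
# Rows are examples (e.g. IOI prompts); columns are hidden features of one model.

def gram(X):
    """Linear-kernel Gram matrix: pairwise dot products between example rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]

def center(K):
    """Double-center a square matrix (subtract row/column means, add grand mean)."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]

def frob_inner(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    """Linear CKA between two activation matrices with the same number of rows."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same examples (same row count)")
    Kc, Lc = center(gram(X)), center(gram(Y))
    denom = math.sqrt(frob_inner(Kc, Kc) * frob_inner(Lc, Lc))
    if denom == 0:
        raise ValueError("activations are constant across examples")
    return frob_inner(Kc, Lc) / denom

# Hypothetical toy activations: 4 prompts, widths 2 and 3 (made-up values).
acts_model_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
acts_model_b = [[0.5, 0.1, 0.0], [0.0, 0.9, 0.2], [0.6, 1.0, 0.1], [1.1, 0.0, 0.3]]
score = linear_cka(acts_model_a, acts_model_b)
```

Because the score depends only on the example-by-example Gram matrices, the two models may have different hidden sizes, and a model compared with itself scores 1 up to floating-point rounding. Testing the name-mover hypothesis would mean restricting such a comparison to activations from candidate name-mover heads, which this sketch does not attempt.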

References

It is unclear why Pythia-160M and 410M show increased congruence with GPT2-medium and small, respectively; perhaps, subtle similarities in name-mover heads representations could explain this (Tigges et al. 2024, Section 3.2).

— Tracking Equivalent Mechanistic Interpretations Across Neural Networks (2603.30002 - Sun et al., 31 Mar 2026) in Section 3.2: Reduction to Simpler Models
