Scaling behavior of goldfish loss at very large model sizes
Determine how the memorization-mitigation effects of the goldfish loss scale when training large language models with tens to hundreds of billions of parameters, given evidence that larger models memorize more of their training data.
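For readers unfamiliar with the mechanism being scaled, the following is a minimal sketch of a goldfish-style loss, in which a subset of token positions is excluded from the next-token loss so that the model never receives a training signal for them. It assumes the simplest static variant (drop every k-th position); the function names, the default k = 4, and the pure-Python form are illustrative assumptions, not the authors' implementation.

```python
import math

def goldfish_mask(seq_len, k=4):
    """Static mask: 1.0 keeps a position in the loss, 0.0 drops every k-th one.

    Assumption: static every-k-th dropping; k is a hypothetical default.
    """
    if k < 2:
        raise ValueError("k must be at least 2, otherwise every token is dropped")
    return [0.0 if (i + 1) % k == 0 else 1.0 for i in range(seq_len)]

def token_nll(logits_row, target):
    """Negative log-likelihood of `target` under softmax(logits_row)."""
    m = max(logits_row)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_z - logits_row[target]

def goldfish_loss(logits, targets, k=4):
    """Mean next-token cross-entropy over the kept positions only."""
    mask = goldfish_mask(len(targets), k)
    total = sum(w * token_nll(row, t) for row, t, w in zip(logits, targets, mask))
    return total / sum(mask)
```

Because dropped positions contribute nothing to the loss, changing the logits at those positions leaves the value unchanged. In this sketch, a scaling study would hold k fixed and vary only the parameter count.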
References
Finally, prior work has shown that larger models memorize more of their training data, and thus studies of how the benefits afforded by goldfish loss scale to tens or hundreds of billions of parameters is an interesting open question.
— Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
(2406.10209 - Hans et al., 2024) in Section 6.3 (Limitations: Don't Mistake Fish Oil for Snake Oil)