Introduction
As AI models reach a global audience, the need for effective multilingual support has never been greater. However, most public research on scaling laws—the mathematical rules that predict model performance—has focused almost exclusively on English. This gap leaves developers in the dark when building models for the billions of people who speak other languages.
Google DeepMind has addressed this challenge with ATLAS (Adaptive Transfer Scaling Laws), a comprehensive study presented at ICLR 2026. By analyzing 774 training runs across models ranging from 10M to 8B parameters and covering over 400 languages, ATLAS provides the first practical roadmap for scaling multilingual AI.
What is ATLAS?
ATLAS is a framework designed to optimize three critical decisions in multilingual model development:
- Model Size: How large should the model be to handle a specific set of languages?
- Data Volume: How much training data is required to reach a target performance?
- Language Mixture: Which languages should be trained together to maximize positive transfer?
Unlike traditional scaling laws, ATLAS accounts for the complex synergies and interference that occur when a model learns multiple languages simultaneously.
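To make that concrete, here is a minimal sketch of the kind of quantity such a law parameterizes: per-language loss as a function of model size, per-language data, and pairwise transfer weights. The functional form, constants, and the `multilingual_loss` helper below are illustrative assumptions, not the fitted ATLAS equations.

```python
# Hypothetical sketch of what a multilingual scaling law parameterizes.
# The functional form and constants are illustrative, NOT the fitted ATLAS equations.

def multilingual_loss(params: float, tokens_per_lang: dict[str, float],
                      transfer: dict[tuple[str, str], float],
                      a: float = 400.0, b: float = 2000.0,
                      alpha: float = 0.34, beta: float = 0.28,
                      irreducible: float = 1.7) -> dict[str, float]:
    """Predict per-language loss from model size, data, and transfer weights."""
    losses = {}
    for lang, d in tokens_per_lang.items():
        # Effective data = a language's own tokens plus a weighted share of every
        # other language's tokens (positive weight = synergy, negative = interference).
        d_eff = d + sum(transfer.get((other, lang), 0.0) * d_other
                        for other, d_other in tokens_per_lang.items()
                        if other != lang)
        losses[lang] = irreducible + a / params**alpha + b / max(d_eff, 1.0)**beta
    return losses

# Example: Swahili benefits from English data via a positive transfer weight.
print(multilingual_loss(
    params=2e9,
    tokens_per_lang={"en": 100e9, "sw": 5e9},
    transfer={("en", "sw"): 0.08, ("sw", "en"): 0.01},
))
```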
Key Findings: Decoding Multilinguality
The ATLAS study offers several breakthrough insights into how multilingual models actually learn:
1. The Cross-Lingual Transfer Map
One of the most valuable outputs of ATLAS is a synergy matrix that quantifies how much training on one language helps (or hurts) another.
- Script Matters: The strongest predictor of positive transfer is a shared writing system (e.g., Latin script) or language family.
- Global Pillars: English, French, and Spanish emerged as the most "helpful" languages, likely due to the high quality and diversity of their web data.
- Specific Pairings: The data confirms intuitive links, such as Norwegian being helped by Swedish and Malay by Indonesian, and supplies the precise transfer weights needed for optimization.
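In practice, a transfer matrix like this can be queried directly when choosing which languages to train together. The sketch below uses made-up weights purely for illustration; the real ATLAS matrix would supply the fitted values.

```python
# Illustrative use of a cross-lingual transfer matrix to pick "helper" languages
# for a target language. The weights here are invented for the example.
TRANSFER = {
    # (source_lang, target_lang): transfer weight (positive helps, negative hurts)
    ("sv", "no"): 0.21, ("da", "no"): 0.18, ("en", "no"): 0.09,
    ("id", "ms"): 0.24, ("en", "ms"): 0.07, ("zh", "ms"): -0.02,
}

def top_helpers(target: str, k: int = 3) -> list[tuple[str, float]]:
    """Return the k source languages with the highest transfer into `target`."""
    scores = [(src, w) for (src, tgt), w in TRANSFER.items() if tgt == target]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:k]

print(top_helpers("no"))  # [('sv', 0.21), ('da', 0.18), ('en', 0.09)]
print(top_helpers("ms"))  # [('id', 0.24), ('en', 0.07), ('zh', -0.02)]
```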
2. Overcoming the "Curse of Multilinguality"
The "curse of multilinguality" refers to the performance drop seen when a fixed-capacity model is forced to learn too many languages. ATLAS formalizes a solution: if you want to double the number of supported languages while maintaining performance, you should increase the model size by 1.18x and total data by 1.66x. This balance allows positive transfer to offset the capacity tax.
3. Pre-training vs. Fine-tuning
For many developers, the question is whether to start from scratch or fine-tune an existing multilingual checkpoint (like Unimax). ATLAS identifies a "crossover point":
- Low Budget: Fine-tuning is more efficient for shorter training runs.
- High Budget: Pre-training from scratch pulls ahead once the training budget exceeds a crossover threshold (roughly 144B-283B tokens for 2B-parameter models).
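A minimal decision helper built on this crossover idea is sketched below. The 144B-283B token range is quoted only for roughly 2B-parameter models; thresholds for other model sizes are not given here, so they are left as inputs rather than hard-coded.

```python
# Sketch of a fine-tune vs. pre-train decision based on the crossover point.
# Default thresholds apply to ~2B-parameter models; adjust for other sizes.
def recommend_strategy(token_budget: float,
                       crossover_low: float = 144e9,
                       crossover_high: float = 283e9) -> str:
    """Suggest fine-tuning vs. pre-training from scratch for a given token budget."""
    if token_budget < crossover_low:
        return "fine-tune an existing multilingual checkpoint"
    if token_budget > crossover_high:
        return "pre-train from scratch"
    return "borderline: either strategy may win; run a pilot comparison"

print(recommend_strategy(80e9))   # fine-tune an existing multilingual checkpoint
print(recommend_strategy(400e9))  # pre-train from scratch
```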
Practical Tips for Developers
Based on the ATLAS findings, developers can now make data-driven decisions:
- Plan Your Scale: Use the ATLAS formulas to adjust your compute budget when expanding language support.
- Select Your Mix: Use the transfer matrix to pick "helper" languages that empirically improve performance on your target languages.
- Budget Wisely: If your token budget is below the crossover point for your model size, stick to fine-tuning a strong base model.
Conclusion
The release of ATLAS marks a significant shift away from English-centric AI research. By providing clear, actionable scaling laws for hundreds of languages, Google DeepMind has lowered the barrier for creating high-quality AI tools for a global population. As we move toward more inclusive AI, frameworks like ATLAS will be essential for ensuring that no language is left behind.