ATLAS: New Scaling Laws for Multilingual AI Models

Google DeepMind introduces ATLAS, the first practical scaling laws for massively multilingual models, helping developers optimize training across 400+ languages.

by HowAIWorks Team
Google DeepMind, Multilingual AI, Scaling Laws, NLP, Machine Learning, ATLAS, LLM Training

Introduction

As AI models reach a global audience, the need for effective multilingual support has never been greater. However, most public research on scaling laws—the mathematical rules that predict model performance—has focused almost exclusively on English. This gap leaves developers in the dark when building models for the billions of people who speak other languages.

Google DeepMind has addressed this challenge with ATLAS (Adaptive Transfer Scaling Laws), a comprehensive study presented at ICLR 2026. By analyzing 774 training runs across models ranging from 10M to 8B parameters and covering over 400 languages, ATLAS provides the first practical roadmap for scaling multilingual AI.

What is ATLAS?

ATLAS is a framework designed to optimize three critical decisions in multilingual model development:

  • Model Size: How large should the model be to handle a specific set of languages?
  • Data Volume: How much training data is required to reach a target performance?
  • Language Mixture: Which languages should be trained together to maximize positive transfer?

Unlike traditional scaling laws, ATLAS accounts for the complex synergies and interference that occur when a model learns multiple languages simultaneously.
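
To make those three decisions concrete, here is a minimal sketch of how a multilingual training plan could be represented in code. The class and field names (MultilingualTrainingPlan, model_params, training_tokens, language_mixture) are hypothetical, for illustration only, and are not part of ATLAS or any DeepMind tooling.

```python
# Hypothetical container for the three planning decisions discussed above.
from dataclasses import dataclass, field


@dataclass
class MultilingualTrainingPlan:
    """Illustrative bundle of model size, data volume, and language mixture."""
    model_params: float     # model size in parameters, e.g. 2e9
    training_tokens: float  # total training tokens, e.g. 200e9
    language_mixture: dict[str, float] = field(default_factory=dict)  # language -> sampling weight


# Example: a plan that leans on a few high-resource "helper" languages.
plan = MultilingualTrainingPlan(
    model_params=2e9,
    training_tokens=200e9,
    language_mixture={"en": 0.40, "fr": 0.15, "es": 0.15, "sw": 0.15, "yo": 0.15},
)
assert abs(sum(plan.language_mixture.values()) - 1.0) < 1e-9  # mixture weights should sum to 1
```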

Key Findings: Decoding Multilinguality

The ATLAS study offers several breakthrough insights into how multilingual models actually learn:

1. The Cross-Lingual Transfer Map

One of the most valuable outputs of ATLAS is a synergy matrix that quantifies how much training on one language helps (or hurts) another.

  • Script Matters: The strongest predictor of positive transfer is a shared writing system (e.g., Latin script) or language family.
  • Global Pillars: English, French, and Spanish emerged as the most "helpful" languages, likely due to the high quality and diversity of their web data.
  • Specific Pairings: The data confirms intuitive links—Norwegian is helped by Swedish, and Malay by Indonesian—but provides the precise mathematical weights needed for optimization (see the sketch after this list).
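
To show how such a synergy matrix might be used in practice, here is a small sketch that ranks candidate helper languages for a given target. The matrix values and the rank_helpers function below are invented for illustration; the actual weights would come from the fitted transfer estimates in the paper.

```python
# Illustrative only: the transfer weights below are made-up placeholders,
# not numbers from the ATLAS paper.
transfer = {
    # transfer[source][target] ~ how much training on `source` helps `target`
    "en": {"no": 0.30, "ms": 0.25, "sw": 0.20},
    "sv": {"no": 0.55, "ms": 0.05, "sw": 0.02},
    "id": {"no": 0.03, "ms": 0.60, "sw": 0.04},
}


def rank_helpers(target: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k source languages with the highest estimated transfer to `target`."""
    scores = [(src, weights.get(target, 0.0)) for src, weights in transfer.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print(rank_helpers("no"))  # [('sv', 0.55), ('en', 0.30)] - Swedish helps Norwegian most
print(rank_helpers("ms"))  # [('id', 0.60), ('en', 0.25)] - Indonesian helps Malay most
```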

2. Overcoming the "Curse of Multilinguality"

The "curse of multilinguality" refers to the performance drop seen when a fixed-capacity model is forced to learn too many languages. ATLAS formalizes a solution: if you want to double the number of supported languages while maintaining performance, you should increase the model size by 1.18x and total data by 1.66x. This balance allows positive transfer to offset the capacity tax.
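
As a quick worked example, the sketch below takes the 1.18x/1.66x multipliers at face value and assumes, purely for illustration, that repeated doublings of the language count compound multiplicatively.

```python
import math


def scale_for_more_languages(params: float, tokens: float, language_multiplier: float) -> tuple[float, float]:
    """Scale (params, tokens) when the number of supported languages grows by `language_multiplier`.

    Uses the headline ATLAS rule (1.18x params and 1.66x tokens per doubling);
    compounding across multiple doublings is an assumption made here.
    """
    doublings = math.log2(language_multiplier)  # e.g. 4x the languages = 2 doublings
    return params * 1.18 ** doublings, tokens * 1.66 ** doublings


# Doubling language coverage for a 2B-parameter model trained on 200B tokens:
new_params, new_tokens = scale_for_more_languages(2e9, 200e9, language_multiplier=2)
print(f"{new_params / 1e9:.2f}B params, {new_tokens / 1e9:.0f}B tokens")  # ~2.36B params, 332B tokens
```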

3. Pre-training vs. Fine-tuning

For many developers, the question is whether to start from scratch or fine-tune an existing multilingual checkpoint (like Unimax). ATLAS identifies a "crossover point":

  • Low Budget: Fine-tuning is more efficient for shorter training runs.
  • High Budget: Pre-training from scratch eventually yields better results once you surpass a certain token threshold (roughly 144B–283B tokens for a 2B-parameter model); a rough decision helper is sketched below.
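
As a rough illustration, the crossover logic for a 2B-parameter model might look like the sketch below. The threshold constants are the 144B–283B range quoted above; the function itself is hypothetical, and the crossover for other model sizes would have to come from the ATLAS fits.

```python
# Hypothetical decision helper; thresholds apply to a ~2B-parameter model.
CROSSOVER_TOKENS_2B = (144e9, 283e9)  # token range where pre-training starts to win


def recommend_strategy(token_budget: float) -> str:
    """Suggest fine-tuning vs. pre-training based on the available token budget."""
    low, high = CROSSOVER_TOKENS_2B
    if token_budget < low:
        return "fine-tune an existing multilingual checkpoint"
    if token_budget > high:
        return "pre-train from scratch"
    return "either could win; check the ATLAS fit for your language mix"


print(recommend_strategy(50e9))   # fine-tune an existing multilingual checkpoint
print(recommend_strategy(400e9))  # pre-train from scratch
```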

Practical Tips for Developers

Based on the ATLAS findings, developers can now make data-driven decisions:

  • Plan Your Scale: Use the ATLAS formulas to adjust your compute budget when expanding language support.
  • Select Your Mix: Use the transfer matrix to pick "helper" languages that empirically improve performance on your target languages.
  • Budget Wisely: If your token budget is below the crossover point for your model size, stick to fine-tuning a strong base model.

Conclusion

The release of ATLAS marks a significant shift away from English-centric AI research. By providing clear, actionable scaling laws for hundreds of languages, Google DeepMind has lowered the barrier for creating high-quality AI tools for a global population. As we move toward more inclusive AI, frameworks like ATLAS will be essential for ensuring that no language is left behind.

Frequently Asked Questions

What is ATLAS?
ATLAS (Adaptive Transfer Scaling Laws) is a framework developed by Google DeepMind to determine the optimal model size, data volume, and language mixtures for training multilingual language models.

How does ATLAS address the "curse of multilinguality"?
It formalizes the trade-off between adding more languages and model capacity, showing that positive transfer between languages can offset performance degradation if model size and data are scaled correctly (e.g., 1.18x model size for 2x languages).

When does pre-training from scratch beat fine-tuning?
ATLAS suggests a "crossover point" based on compute budget; for a 2B-parameter model, pre-training typically becomes more efficient after 144B–283B tokens, depending on the language.
