Overview
Gemma 4, released on April 10, 2026, represents the cutting edge of Google's open-weights initiative. Developed by Google DeepMind, it is the first open model family designed from the ground up for the Agentic Era. Built on the Gemini 3.1 architecture, Gemma 4 brings frontier-level intelligence to local and decentralized deployments.
As of April 2026, Gemma 4 is recognized as the industry leader for "Intelligence Density"—delivering reasoning capabilities that previously required models five to ten times its size. The family consists of three primary variants: Gemma 4 4B, Gemma 4 12B, and the flagship Gemma 4 42B (MoE).
The standout feature of Gemma 4 is its Native Agentic Orchestration, which allows the model to manage complex, long-horizon tasks with high reliability without needing heavy external wrapper frameworks.
Capabilities
Gemma 4 pushes the boundaries of what small, open models can achieve:
- Native Agentic Orchestration: Built-in logic for multi-step planning, self-correction, and tool use.
- Extreme Intelligence Density: The 12B model outperforms most 70B models from 2025 on reasoning benchmarks.
- Real-Time Visual Reasoning: Native support for processing 4K images and 60fps video streams.
- Deep Context (256K): Efficiently reasons over massive codebases and document archives locally.
- On-Device Optimization: Specifically tuned for the latest AI accelerators in mobile and PC chipsets (e.g., Tensor G6, Apple M5).
- Precise Structured Outputs: Native support for complex JSON schemas and protocol-compliant code generation.
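The structured-output capability above can be sketched in Python. The JSON Schema below is standard, but the `model.generate(..., response_schema=...)` call shown in the comment is an assumed API (real serving stacks differ), and the canned response string stands in for actual model output so the validation step can be illustrated on its own.

```python
import json

# Illustrative JSON Schema for an invoice-extraction task.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["vendor", "total"],
}

# In a real deployment you would pass the schema to the model, e.g.:
#   response = model.generate(prompt, response_schema=INVOICE_SCHEMA)  # assumed API
# Here a canned response stands in for model output:
raw = '{"vendor": "Acme Corp", "total": 1042.50, "line_items": ["GPU rack"]}'

# Parse and check the required keys before trusting the output downstream.
data = json.loads(raw)
missing = [k for k in INVOICE_SCHEMA["required"] if k not in data]
assert not missing, f"schema violation, missing keys: {missing}"
print(data["vendor"], data["total"])
```

Validating required fields after parsing is cheap insurance even when the model is schema-constrained, since serving stacks vary in how strictly they enforce the schema.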
Technical Specifications
Gemma 4 utilizes a next-generation architecture optimized for the 2026 hardware landscape:
- Model Variants:
  - 4B (Edge): Optimized for mobile and wearable devices.
  - 12B (Standard): The "sweet spot" for developer workstations and local coding agents.
  - 42B (Flagship MoE): A Mixture-of-Experts model delivering frontier performance.
- Architecture: Advanced Mixture-of-Experts (MoE) for the 42B model; optimized dense Transformers for 4B and 12B.
- Context Window: 256K tokens standard.
- Training Data: Trained through March 2026 on a curated "Synthetic+Real" multimodal dataset.
- Knowledge Cutoff: March 2026.
- Native Support: Integrated support for Google Antigravity and other agentic platforms.
Use Cases
Gemma 4 is designed for the most demanding modern AI applications:
- Autonomous Local Agents: Building "always-on" personal assistants that operate entirely on-device.
- Edge Computer Vision: Real-time analysis of security feeds, robotics telemetry, and AR/VR environments.
- High-Security Enterprise AI: Deploying state-of-the-art reasoning in air-gapped or private cloud environments.
- Agentic Coding Assistants: Local pair programmers that can understand and refactor entire repositories.
- Multimodal Data Extraction: Converting massive volumes of visual documents into structured data with high precision.
Performance Benchmarks
Gemma 4 sets new records for open-weights performance as of April 2026:
Reasoning & Coding
| Benchmark | Gemma 4 4B | Gemma 4 12B | Gemma 4 42B | Llama 4 Maverick |
|---|---|---|---|---|
| MMLU Pro | 68.5% | 81.2% | 86.4% | 83.2% |
| GPQA (Science) | 35.0% | 52.4% | 64.8% | 61.5% |
| HumanEval (Coding) | 72.3% | 88.5% | 93.1% | 89.2% |
| AgentBench | 58.2% | 76.4% | 84.1% | 80.5% |
Multimodal (Vision)
| Benchmark | Gemma 4 4B | Gemma 4 12B | Gemma 4 42B | GPT-4o |
|---|---|---|---|---|
| MMMU | 48.7% | 64.2% | 72.5% | 70.8% |
| MathVista | 44.1% | 59.8% | 68.2% | 65.4% |
| Video-MME | 52.0% | 66.5% | 75.4% | 73.1% |
Deployment & Accessibility
Gemma 4 is widely supported across all major AI deployment platforms:
Ecosystem Support
- Google AI Studio: Immediate access to all Gemma 4 variants via API.
- Ollama / LM Studio: Day-one support for quantized Gemma 4 weights.
- TensorFlow / PyTorch / JAX: Official model definitions and training recipes.
- Hugging Face: Available in the official Google collection with optimized GGUF, EXL2, and AWQ formats.
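For the local path above, a quickstart via Ollama might look like the following. The `gemma4:12b` model tag is an assumption for illustration; check the Ollama library listing for the actual name of the published build.

```shell
# Pull a quantized Gemma 4 build and run it locally
# (model tag is illustrative, not confirmed).
ollama pull gemma4:12b
ollama run gemma4:12b "Summarize the changes in main.py"
```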
Hardware Acceleration
- Android / Chrome: Native acceleration via AICore.
- NVIDIA / AMD: Optimized kernels for Blackwell and RDNA 4 architectures.
- Apple Silicon: Fully optimized for MLX and CoreML.
Code Examples
Agentic Tool Call with Gemma 4
# Gemma 4 is tuned for direct tool use without heavy prompt scaffolding.
# `model` and `my_git_tools` are stand-ins for your serving client and
# tool definitions; the exact API depends on the runtime you use.
messages = [
    {"role": "user", "content": "Check my local git status and summarize the changes in main.py"}
]

# The model emits a structured tool call rather than free text
response = model.generate(messages, tools=my_git_tools)
Limitations
- Knowledge Cutoff: Information ends in March 2026.
- Hardware Requirements: The 42B model requires at least 24GB of VRAM for comfortable 4-bit inference.
- Safety Filtering: Like all Google models, it has robust safety filters that may occasionally misfire on benign edge cases.
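The 24GB VRAM figure above follows from simple arithmetic. At 4-bit quantization, each of the 42B parameters takes half a byte; the ~15% overhead factor for KV cache, activations, and runtime buffers is an illustrative assumption, not a spec.

```python
# Back-of-envelope VRAM estimate for 4-bit inference of the 42B model.
params = 42e9
bytes_per_param = 0.5                          # 4-bit quantization
weights_gb = params * bytes_per_param / 1e9    # weights alone: 21 GB

# KV cache, activations, and runtime overhead add a few GB on top;
# the 15% factor here is an assumption for illustration.
total_gb = weights_gb * 1.15

print(round(weights_gb), round(total_gb))  # 21 24
```

This is why 24GB cards are the practical floor for the flagship model, while the 12B dense variant fits comfortably on 8-12GB consumer GPUs at the same precision.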
Safety & Alignment
Gemma 4 follows Google's strictest safety standards:
- ASL-2 Aligned: Safe for broad commercial and research deployment.
- Agentic Safety: Specifically tuned to prevent unauthorized system actions in autonomous modes.
- Constitutional Alignment: Trained with a focus on transparency and user intent.