Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Of its 25.2B total parameters, only 3.8B are active per token during inference, giving quality close to a dense model in the ~31B class at a fraction of the compute cost. It supports multimodal input including text, images, and video (up to 60 seconds at 1 fps), and offers a 256K-token context window, native function calling, a configurable thinking/reasoning mode, and structured output. Released under Apache 2.0.
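
Capabilities like multimodal input and function calling are usually exercised through an OpenAI-compatible chat endpoint. The sketch below assumes such an endpoint; the base URL, model slug, and tool definition are illustrative assumptions, not values confirmed on this page.

# Minimal sketch of a multimodal, tool-calling request against an
# OpenAI-compatible endpoint. base_url, model slug, and the tool schema
# are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-gateway.invalid/v1",  # assumed gateway URL
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for the example
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="google/gemma-4-26b-a4b-it",  # assumed model slug
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this photo, then check the weather there."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    tools=tools,
)

print(response.choices[0].message)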

Input / 1M tokens: $0.080
Cached input / 1M tokens: $0.010
Output / 1M tokens: $0.350
Context window: 262K tokens
Provider: Google
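
For a rough sense of what these rates mean per request, the sketch below is plain arithmetic over the listed prices; the token counts are illustrative, not from this page.

# Rough per-request cost estimate from the listed per-1M-token rates.
INPUT_PER_M = 0.080    # USD per 1M input tokens
CACHED_PER_M = 0.010   # USD per 1M cached input tokens
OUTPUT_PER_M = 0.350   # USD per 1M output tokens

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request at the listed rates."""
    uncached = max(input_tokens - cached_tokens, 0)
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHED_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: 10,000 input tokens (2,000 of them cached) and 1,000 output tokens
print(f"${request_cost(10_000, 1_000, cached_tokens=2_000):.6f}")  # ~$0.001010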

Performance

Median streaming throughput (output tokens / sec) and time to first token, as measured by Artificial Analysis.