MiniMax M2.1

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world capability while maintaining exceptional latency, scalability, and cost efficiency. Compared to its predecessor, M2.1 produces cleaner, more concise outputs and faster perceived response times. It shows leading multilingual coding performance across major systems and application languages, scoring 49.4% on Multi-SWE-Bench and 72.5% on SWE-Bench Multilingual, and serves as a versatile agent “brain” for IDEs, coding tools, and general-purpose assistance. To avoid degrading the model's performance, MiniMax strongly recommends preserving reasoning between turns. Learn more about using reasoning_details to pass back reasoning in our [docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks).
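As a rough sketch of what preserving reasoning between turns looks like in practice, the snippet below carries a response's reasoning_details back into the next request's message list. Field names follow the pattern in the linked reasoning-tokens docs, but the exact response shapes here are illustrative assumptions; check the docs for the authoritative format.

```python
# Sketch: keep a model's reasoning_details attached to the assistant message
# when building the next turn's request, so reasoning is not silently dropped.
# The response shape below is a hypothetical example, not a guaranteed schema.

def assistant_turn(choice_message: dict) -> dict:
    """Convert an API response message into a chat-history entry,
    passing any reasoning_details block back verbatim."""
    entry = {"role": "assistant", "content": choice_message.get("content", "")}
    if "reasoning_details" in choice_message:
        # Return the block unmodified on the next request.
        entry["reasoning_details"] = choice_message["reasoning_details"]
    return entry

# Hypothetical response message from a previous turn:
prev = {
    "content": "Done.",
    "reasoning_details": [{"type": "reasoning.text", "text": "step 1 ..."}],
}
messages = [
    {"role": "user", "content": "Refactor this function."},
    assistant_turn(prev),
    {"role": "user", "content": "Now add tests."},
]
```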

| Metric | Value |
| --- | --- |
| Provider | MiniMax |
| Context window | 197K tokens |
| Input / 1M tokens | $0.290 |
| Cached input / 1M tokens | $0.030 |
| Output / 1M tokens | $0.950 |
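As a rough illustration of how these per-token rates combine, the sketch below estimates the cost of a single request; the token counts in the example are hypothetical.

```python
# Rough cost estimate for one request at the listed MiniMax-M2.1 rates.
# Rates are USD per 1M tokens.
INPUT_RATE = 0.290
CACHED_INPUT_RATE = 0.030
OUTPUT_RATE = 0.950

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated USD cost of a single completion."""
    fresh = input_tokens - cached_tokens  # tokens billed at the full input rate
    return (fresh * INPUT_RATE
            + cached_tokens * CACHED_INPUT_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000

# e.g. a 20K-token prompt with 15K cached, producing a 2K-token reply:
print(f"${request_cost(20_000, 2_000, cached_tokens=15_000):.4f}")  # → $0.0038
```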

Performance

Median streaming throughput and first-token latency measured by Artificial Analysis.

| Metric | Value |
| --- | --- |
| Output tokens / sec | 63 t/s |
| Time to first token | 2.41s |
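Assuming streaming continues at a roughly constant rate after the first token, these two figures give a back-of-envelope estimate of total response time:

```python
# total time ≈ time-to-first-token + output tokens / streaming throughput.
# Figures are the medians listed above; real latency will vary by provider load.
TTFT_S = 2.41        # median time to first token, seconds
THROUGHPUT_TPS = 63  # median output tokens per second

def estimated_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time to stream a reply of the given length."""
    return TTFT_S + output_tokens / THROUGHPUT_TPS

print(f"{estimated_seconds(500):.1f}s")  # a 500-token reply
```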

Benchmarks

Intelligence, coding, and math indexes plus the underlying evaluation scores.

| Benchmark | Score |
| --- | --- |
| Intelligence Index | 39 |
| Coding Index | 33 |
| Math Index | 83 |
| MMLU-Pro | 87.5% |
| GPQA | 83.0% |
| HLE | 22.2% |
| LiveCodeBench | 81.0% |
| SciCode | 40.7% |
| MATH-500 | (not listed) |
| AIME | (not listed) |

Benchmarks via Artificial Analysis