Reka Flash 3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction following, and function calling. It features a 32K context length and was optimized with reinforcement learning (RLOO), delivering performance comparable to proprietary models from a much smaller parameter footprint. Well suited to low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to roughly 11GB at 4-bit precision), and wraps its internal thought process in explicit "<reasoning>" tags. It is primarily an English model with limited multilingual understanding. The model weights are released under the Apache 2.0 license.
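Because the model marks its chain of thought with "<reasoning>" tags, client code typically strips that block before showing the final answer. A minimal sketch, assuming a single well-formed `<reasoning>...</reasoning>` span in the output (the function name and sample output are illustrative, not part of Reka's API):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a <reasoning>...</reasoning> block from the final answer.

    Assumes at most one well-formed reasoning span, which is how
    Reka Flash 3 is described as marking its internal thought process.
    """
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block emitted
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

# Hypothetical model output:
raw = "<reasoning>2 + 2 is 4.</reasoning>The answer is 4."
thought, answer = split_reasoning(raw)
# thought == "2 + 2 is 4.", answer == "The answer is 4."
```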
- Input / 1M tokens: $0.100
- Output / 1M tokens: $0.200
- Context window: 66K tokens
- Provider: Reka
- Knowledge cutoff: 2025-01-31
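The per-million-token rates above make cost estimation straightforward. A small sketch using the listed prices (the function and example token counts are illustrative):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float = 0.100,
                     output_per_m: float = 0.200) -> float:
    """Estimate the cost of one request at the listed $/1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# e.g. a 10K-token prompt with a 2K-token reply:
cost = request_cost_usd(10_000, 2_000)  # (1000 + 400) / 1e6 ≈ $0.0014
```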
Performance
Median streaming throughput and first-token latency measured by Artificial Analysis.
- Output tokens / sec: 96 t/s
- Time to first token: 1.28s
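For capacity planning, end-to-end latency can be roughed out from the two figures above: wait for the first token, then stream at the median throughput. A minimal sketch (the function name is illustrative, and real latency varies with load and prompt size):

```python
def estimated_response_time(output_tokens: int,
                            ttft_s: float = 1.28,
                            tokens_per_s: float = 96.0) -> float:
    """Rough latency estimate: first-token wait plus streaming time."""
    return ttft_s + output_tokens / tokens_per_s

# A 480-token reply: 1.28 + 480/96 = 1.28 + 5.0 ≈ 6.28 s
t = estimated_response_time(480)
```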
Benchmarks
Intelligence, coding, and math indexes plus the underlying evaluation scores.
- Intelligence Index: 10
- Coding Index: 9
- Math Index: 34
- MMLU-Pro: 66.9%
- GPQA: 52.9%
- HLE: 5.1%
- LiveCodeBench: 43.5%
- SciCode: 26.7%
- MATH-500: 89.3%
- AIME: 51.0%
Benchmarks via Artificial Analysis