Reka Flash 3
Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding tasks, instruction following, and function calling. It features a 32K context length and was optimized with reinforcement learning (RLOO), delivering performance comparable to proprietary models from a much smaller parameter footprint. Well suited to low-latency, local, or on-device deployments, Reka Flash 3 is compact, supports efficient quantization (down to roughly 11GB at 4-bit precision), and wraps its internal thought process in explicit "<reasoning>" tags. It is primarily an English model with limited multilingual understanding. The model weights are released under the Apache 2.0 license.
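Because the model marks its chain of thought with "<reasoning>" tags, client code typically strips that block before showing the final answer. A minimal sketch, assuming a single well-formed `<reasoning>...</reasoning>` span in the output (the function name and sample output are illustrative, not part of Reka's API):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a <reasoning>...</reasoning> block from the final answer.

    Assumes at most one well-formed reasoning span, which is how
    Reka Flash 3 is described as marking its internal thought process.
    """
    match = re.search(r"<reasoning>(.*?)</reasoning>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block emitted
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer

# Hypothetical model output:
raw = "<reasoning>2 + 2 is 4.</reasoning>The answer is 4."
thought, answer = split_reasoning(raw)
# thought == "2 + 2 is 4.", answer == "The answer is 4."
```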
- Input / 1M tokens: $0.100
- Output / 1M tokens: $0.200
- Context window: 66K tokens
- Provider: Reka
- Knowledge cutoff: 2025-01-31
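The per-million-token rates above make cost estimation straightforward. A small sketch using the listed prices (the function and example token counts are illustrative):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float = 0.100,
                     output_per_m: float = 0.200) -> float:
    """Estimate the cost of one request at the listed $/1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# e.g. a 10K-token prompt with a 2K-token reply:
cost = request_cost_usd(10_000, 2_000)  # (1000 + 400) / 1e6 ≈ $0.0014
```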
Performance
Median streaming throughput and first-token latency measured by Artificial Analysis.
- Output tokens / sec: 96 t/s
- Time to first token: 1.28s
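For capacity planning, end-to-end latency can be roughed out from the two figures above: wait for the first token, then stream at the median throughput. A minimal sketch (the function name is illustrative, and real latency varies with load and prompt size):

```python
def estimated_response_time(output_tokens: int,
                            ttft_s: float = 1.28,
                            tokens_per_s: float = 96.0) -> float:
    """Rough latency estimate: first-token wait plus streaming time."""
    return ttft_s + output_tokens / tokens_per_s

# A 480-token reply: 1.28 + 480/96 = 1.28 + 5.0 ≈ 6.28 s
t = estimated_response_time(480)
```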
Benchmarks
Intelligence, coding, and math indexes plus the underlying evaluation scores.
- Intelligence Index: 10
- Coding Index: 9
- Math Index: 34
- MMLU-Pro: 66.9%
- GPQA: 52.9%
- HLE: 5.1%
- LiveCodeBench: 43.5%
- SciCode: 26.7%
- MATH-500: 89.3%
- AIME: 51.0%
Benchmarks via Artificial Analysis