Reka Flash 3 is a 21-billion-parameter reasoning model developed by Reka AI, designed for general-purpose applications including coding, complex chat, and function calling. Released under the Apache 2.0 license, the model aims to deliver strong reasoning in a compact model suited to low-latency or on-device environments. It was trained from scratch and then refined with reinforcement learning using REINFORCE Leave-One-Out (RLOO).
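The leave-one-out baseline at the core of RLOO can be sketched in a few lines. This is a minimal illustration of the general technique under stated assumptions, not Reka's training code: the function name `rloo_advantages` and the binary 1.0/0.0 rewards are hypothetical. For k completions sampled from the same prompt, each completion is compared against the mean reward of the other k - 1, so no learned value function is required.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k completions sampled from one prompt.

    Hypothetical helper for illustration. Each completion's baseline is the
    mean reward of the other k - 1 completions.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # baseline_i = (total - r_i) / (k - 1); advantage_i = r_i - baseline_i
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1.0 (correct) or 0.0 (wrong).
    print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
```

In a REINFORCE-style update, each advantage then scales the gradient of its completion's log-probability: completions that beat their peers are reinforced and the rest are discouraged. Within a group the advantages always sum to zero.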
A defining feature of the model is its explicit reasoning: it writes out its thought process between <reasoning> and </reasoning> tags before giving a final answer. Because the end of the reasoning is marked explicitly, this format also supports budget forcing. Developers can cap the length of the reasoning by forcing the model to emit the closing </reasoning> tag after a chosen number of steps, after which it moves on to its answer. This gives fine-grained control over the trade-off between reasoning quality and computational cost or latency.
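The tag convention lends itself to simple client-side handling. The sketch below is a minimal illustration, not Reka's reference code: it splits a completion into its reasoning and its answer, and shows the budget-forcing idea of cutting a trace short by appending the closing tag. The function names `split_reasoning` and `force_budget` are hypothetical, and whitespace-separated words stand in for model tokens; a real setup would count tokens with the model's tokenizer and then resume generation after `</reasoning>` so the model writes its answer.

```python
import re
from typing import Optional, Tuple

OPEN_TAG = "<reasoning>"
CLOSE_TAG = "</reasoning>"


def split_reasoning(output: str) -> Tuple[Optional[str], str]:
    """Separate the first <reasoning>...</reasoning> block from the answer."""
    match = re.search(r"<reasoning>(.*?)</reasoning>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block: treat the whole output as the answer.
        return None, output.strip()
    return match.group(1).strip(), output[match.end():].strip()


def force_budget(generated: str, max_words: int) -> str:
    """Return the prefix to feed back to the model for further decoding.

    If the trace is already closed or still within budget, it is returned
    unchanged. Otherwise it is truncated to `max_words` words (a stand-in
    for tokens) and the closing tag is appended to end the thinking phase.
    """
    if CLOSE_TAG in generated:
        return generated  # the model finished thinking within budget
    words = generated.replace(OPEN_TAG, "", 1).split()
    if len(words) <= max_words:
        return generated
    return OPEN_TAG + " ".join(words[:max_words]) + CLOSE_TAG


if __name__ == "__main__":
    raw = "<reasoning>2 + 2 is 4.</reasoning> The answer is 4."
    print(split_reasoning(raw))
    long_trace = "<reasoning>step one step two step three"
    print(force_budget(long_trace, max_words=4))
```

Here the truncated prefix is `<reasoning>step one step two</reasoning>`. Passing it back to the model as the start of its turn is what makes it stop reasoning and answer.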
Reka Flash 3 supports a context window of 32,000 tokens and is optimized for efficient deployment, with quantization options that let the model run on consumer-grade hardware. The model is primarily focused on English-language tasks; on standard reasoning and mathematics benchmarks, it performs competitively against both larger open-source models and proprietary alternatives.
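A back-of-the-envelope estimate shows why quantization matters at this size: weight storage is roughly the parameter count × bits per weight ÷ 8 bytes. The sketch assumes a uniform bit width and counts weights only, ignoring the KV cache (which grows with context length), activations, and runtime overhead. The helper name `weight_memory_gb` is hypothetical, and the file sizes of actual quantized releases will differ because real formats keep some tensors at higher precision.

```python
PARAMS = 21e9  # parameter count of Reka Flash 3, as stated in the text


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in decimal gigabytes.

    Assumes every weight uses the same bit width; excludes KV cache,
    activations, and framework overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

At 16-bit precision the weights alone need about 42 GB, more than most consumer GPUs provide, while a 4-bit quantization brings the estimate down to roughly 10.5 GB.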