Tri-21B-Think Preview is an intermediate reasoning-enhanced checkpoint of the Tri-21B-Think model developed by Trillion Labs. It is designed for advanced chain-of-thought (CoT) reasoning and complex problem-solving, emitting intermediate reasoning steps as tokens during generation. This preview version incorporates mid-training context expansion and instruction tuning focused on logical deduction, mathematical reasoning, and tool-use tasks.
The model is a 20.73-billion-parameter Transformer decoder with 40 layers, a hidden size of 5,120, and Grouped-Query Attention (GQA) using 32 query heads and 8 key-value heads. Other architectural features include Rotary Positional Embeddings (RoPE), SwiGLU activation, and RMSNorm. The model was optimized with the XLDA (Cross-lingual Document Attention) system, which improves performance across English, Korean, and Japanese by transferring knowledge between languages efficiently.
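To make the GQA geometry concrete, the sketch below works out the head grouping and the resulting KV-cache reduction implied by the stated configuration. The head dimension is an inference (hidden size divided by query-head count), not a figure from the model card:

```python
# GQA arithmetic for the stated Tri-21B geometry.
# Assumption: head_dim = hidden_size // num_query_heads (not stated in the card).
hidden_size = 5120
num_query_heads = 32
num_kv_heads = 8

head_dim = hidden_size // num_query_heads           # inferred per-head dimension
group_size = num_query_heads // num_kv_heads        # query heads sharing each KV head

# Per-token, per-layer KV-cache entries: 2 (K and V) * heads * head_dim.
gqa_cache_per_token = 2 * num_kv_heads * head_dim
mha_cache_per_token = 2 * num_query_heads * head_dim  # hypothetical full-MHA baseline
reduction = mha_cache_per_token / gqa_cache_per_token

print(group_size, gqa_cache_per_token, reduction)   # 4 2560 4.0
```

Under these assumptions, each group of 4 query heads shares one KV head, shrinking the KV cache 4x relative to a full multi-head-attention baseline.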
Key capabilities include a test-time-scaling backtracking structure that allows the model to revisit and revise earlier reasoning steps, improving accuracy on complex tasks. The model supports a native context window of 32,768 tokens, extendable to 262,144 tokens via YaRN scaling. While the model was not originally trained with <think> tags as special tokens, they were integrated post-training for compatibility with reasoning parsers and user interfaces.
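Since the `<think>` tags exist mainly for parser compatibility, downstream code typically needs to separate the reasoning trace from the final answer. The helper below is an illustrative sketch, not an official API; it assumes the model emits at most one `<think>...</think>` block before its answer:

```python
import re


def split_think(output: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Hypothetical helper: assumes a single <think>...</think> block
    precedes the final answer, which is a simplifying assumption.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block emitted; treat everything as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_think("<think>2 + 2 = 4</think>The answer is 4.")
print(reasoning)  # 2 + 2 = 4
print(answer)     # The answer is 4.
```

A chat frontend might render `reasoning` in a collapsible panel and show only `answer` by default.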