OLMo 3.1 32B Think is a large-scale, reasoning-optimized language model developed by the Allen Institute for AI (Ai2). As a key variant in the OLMo 3.1 family, it is designed to generate internal reasoning traces, often referred to as chain-of-thought (CoT), to solve complex problems in mathematics, logic, and programming. The model is part of Ai2's commitment to "fully open" AI, providing transparency into the training data, methodology, and model weights.
Architecture and Training
The model uses a 32-billion-parameter decoder-only transformer architecture with 64 layers and grouped-query attention (GQA) for efficient inference. It supports a context window of 65,536 tokens, enabled by rotary position embeddings (RoPE) with YaRN-style scaling. Its reasoning capabilities were developed in an extended reinforcement learning (RL) phase, in which the model trained for an additional 21 days on 224 GPUs on the Dolci-Think-RL dataset to refine its multi-step problem-solving behavior.
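The efficiency benefit of grouped-query attention is that many query heads share a smaller set of key/value heads, shrinking the KV cache that must be kept in memory during inference. A minimal NumPy sketch of the mechanism (the head counts and dimensions here are illustrative, not OLMo's actual configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Toy grouped-query attention.

    q: (n_q_heads, seq, d) -- many query heads
    k, v: (n_kv_heads, seq, d) -- fewer shared key/value heads
    Each group of n_q_heads // n_kv_heads query heads attends
    using the same KV head, so the KV cache is n_kv_heads wide.
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads   # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                          # shared KV head for this group
        scores = q[h] @ k[kv].T / np.sqrt(d)     # scaled dot-product scores
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)       # softmax over key positions
        out[h] = w @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))   # 8 query heads
k = rng.standard_normal((2, 4, 16))   # only 2 KV heads cached
v = rng.standard_normal((2, 4, 16))
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 16)
```

With 8 query heads but only 2 KV heads, the KV cache is a quarter of the size it would be under standard multi-head attention, which is what makes long-context inference at this scale practical.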
Open Science Framework
Following the OLMo project's transparency standards, the model was pre-trained on the Dolma 3 corpus, a dataset of approximately 5.5 trillion tokens comprising web content, academic papers, and code. Ai2 provides not only the weights but also the training code, data recipes, and evaluation logs, allowing researchers to inspect the provenance of model behaviors and trace specific outputs back to training materials.