gpt-oss-20B (high) is a state-of-the-art open-weight language model released by OpenAI as part of the GPT-OSS series. It is among OpenAI's first major open-weight releases since GPT-2 and is provided under the permissive Apache 2.0 license. The model is optimized for high-performance reasoning, tool use, and agentic workflows, delivering capabilities comparable to proprietary reasoning models such as o3-mini.
Architecture and Design
The model uses a Mixture-of-Experts (MoE) architecture with 21 billion total parameters, of which 3.6 billion are active per token. Its router selects 4 of 32 experts for each token, keeping inference cost well below that of a dense model of the same size. The architecture incorporates alternating dense and locally banded sparse attention patterns and uses Rotary Positional Embedding (RoPE) to support a native context window of 128,000 tokens.
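The 4-of-32 routing described above can be sketched in a few lines. This is an illustrative toy with made-up dimensions, not the actual implementation: a router scores every expert, only the top 4 run, and their outputs are combined with renormalized weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_forward(hidden, router_w, experts, top_k=4):
    """Route one token's hidden state through the top-k experts.

    hidden:   (d,) token hidden state
    router_w: (n_experts, d) router projection
    experts:  list of n_experts callables, each mapping (d,) -> (d,)
    """
    logits = router_w @ hidden            # one score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    weights = softmax(logits[top])        # renormalize over the selected experts
    # Only the selected experts execute, so compute scales with top_k (4),
    # not with the total expert count (32) -- the source of MoE efficiency.
    return sum(w * experts[i](hidden) for w, i in zip(weights, top))

# Toy configuration mirroring the 4-of-32 routing (d is far smaller than real)
rng = np.random.default_rng(0)
d, n_experts = 8, 32
router_w = rng.standard_normal((n_experts, d))
experts = [lambda h, W=rng.standard_normal((d, d)): W @ h
           for _ in range(n_experts)]
out = moe_forward(rng.standard_normal(d), router_w, experts)
```

This is why the model has 21B parameters in total but only 3.6B active per token: most experts are skipped for any given token.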
Reasoning and Performance
The "high" designation refers to the model's High Reasoning Effort configuration, which enables extensive internal chain-of-thought processing. This mode is designed to maximize performance on complex STEM, coding, and logical tasks by allowing the model to "think" longer before producing a final response. To reduce its memory footprint, the model was released with native MXFP4 quantization, allowing the 20B variant to run on consumer-grade hardware with as little as 16 GB of memory.
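A back-of-envelope calculation shows why MXFP4 brings the 20B model within a 16 GB budget. Assuming MXFP4 stores 4-bit values in blocks of 32 that share one 8-bit scale (roughly 4.25 bits per parameter on average), and noting that real deployments keep some tensors in higher precision, the weight footprint alone is:

```python
# Rough weight-memory estimate under MXFP4.
# Assumption: blocks of 32 four-bit values share one 8-bit scale,
# giving ~4.25 bits per parameter; some tensors stay in higher
# precision in practice, so this is a lower-bound sketch.
total_params = 21e9                 # 21B total parameters
bits_per_param = 4 + 8 / 32         # 4-bit value + amortized block scale
weight_bytes = total_params * bits_per_param / 8
weight_gib = weight_bytes / 2**30
print(f"~{weight_gib:.1f} GiB of quantized weights")
```

This lands around 10-11 GiB for the weights, leaving headroom within 16 GB for activations and the KV cache.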
Training and Alignment
OpenAI trained the GPT-OSS models on a text-only dataset with a focus on general knowledge and technical disciplines. The post-training phase involved supervised fine-tuning and high-compute reinforcement learning (RL), aligning the model with the OpenAI Model Spec. It utilizes the o200k_harmony tokenizer and the Harmony prompt format, ensuring consistency with OpenAI's frontier proprietary systems.
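To make the Harmony prompt format concrete, here is a simplified sketch of how a conversation might be rendered. The exact special tokens, channels, and required system fields are defined by OpenAI's harmony specification and the openai-harmony library; the token layout and the "Reasoning: high" line below are illustrative assumptions, not a verified transcript.

```python
# Simplified sketch of a Harmony-style conversation rendering.
# Assumption: messages take the form <|start|>{role}<|message|>{content}<|end|>;
# consult the official harmony spec / openai-harmony library for the
# authoritative format.
def render(role, content):
    return f"<|start|>{role}<|message|>{content}<|end|>"

prompt = "".join([
    render("system", "You are a helpful assistant.\nReasoning: high"),
    render("user", "What is 2 + 2?"),
]) + "<|start|>assistant"   # the model continues generation from here
```

In practice, the openai-harmony library (or a chat template bundled with the model) handles this rendering, so application code rarely builds these strings by hand.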