Step 3.5 Flash by StepFun: LLM Benchmarks, Rankings & Specs

Step 3.5 Flash is an open-source large language model developed by StepFun and built on a sparse Mixture-of-Experts (MoE) architecture. It is designed to combine large-scale reasoning capability with high-speed inference: the model has 196.81 billion total parameters, of which approximately 11 billion are active per token. Because only a small share of the weights is used for each token, the model aims to keep the reasoning depth of a very large model while running at speeds typically associated with much smaller dense architectures.
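As a quick check on the sparsity figures above, the share of parameters active per token can be computed directly. This is a minimal sketch that uses only the two numbers quoted in this section; the function and constant names are illustrative, and because the active count is given as "approximately 11 billion", the result is approximate as well.

```python
# Parameter counts quoted above, in billions.
TOTAL_PARAMS_B = 196.81
ACTIVE_PARAMS_B = 11.0  # stated as "approximately 11 billion"


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%}")  # prints "Active per token: 5.6%"
```

In other words, roughly one parameter in eighteen participates in any single token's forward pass.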

Optimized for AI agents and software engineering tasks, Step 3.5 Flash integrates 3-way Multi-Token Prediction (MTP-3). This enables high throughput, with generation speeds of 100–300 tokens per second in standard use and peaks of 350 tokens per second on coding tasks. In evaluations, the model has shown strong complex-reasoning performance, scoring 97.3 on AIME 2025 and 74.4% on SWE-bench Verified.
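To put the throughput figures above in practical terms, the sketch below converts a decode rate into wall-clock generation time. The response length is a hypothetical value chosen only for illustration, and the calculation assumes a steady rate and ignores prompt processing and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response length, not a figure from the model card.
RESPONSE_TOKENS = 3000

# Rates quoted above: standard-use range (100 and 300) and coding peak (350).
for rate in (100, 300, 350):
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{rate} tok/s -> {seconds:.1f} s")
```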

Architecture and Context

The model supports a 256,000-token context window, making it suitable for processing large codebases and long documents. To manage memory and computational overhead at this scale, it uses a 3:1 ratio of Sliding Window Attention (SWA) layers to full-attention layers, that is, three SWA layers for every full-attention layer. Its MoE system consists of 288 routed experts per layer plus one shared expert that is always active; for each token, the top 8 experts are selected.
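The layer pattern and expert selection described above can be sketched in a few lines. This is an illustration under stated assumptions, not StepFun's implementation: the position of the full-attention layer within each group of four, the layer count used in the example, the random router scores, and the reading that the top 8 are drawn from the routed experts (with the shared expert added on top) are all assumptions made here for demonstration.

```python
import heapq
import random


def layer_schedule(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Label layers so that every group is swa_per_full SWA layers, then one full layer.

    Assumption: the full-attention layer comes last in each group.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def route_top_k(router_scores: list[float], k: int = 8) -> list[int]:
    """Return the indices of the k highest-scoring routed experts, best first."""
    if k > len(router_scores):
        raise ValueError("k cannot exceed the number of routed experts")
    return heapq.nlargest(k, range(len(router_scores)), key=router_scores.__getitem__)


# Illustrative layer count; the real depth is not stated in this section.
print(layer_schedule(8))

# Toy router scores for the 288 routed experts of one layer (random stand-ins).
rng = random.Random(0)
scores = [rng.random() for _ in range(288)]
routed = route_top_k(scores, k=8)

# The shared expert is always active, so it is added unconditionally.
active_experts = routed + ["shared"]
print(len(active_experts))  # 8 routed + 1 shared = 9
```

Under these assumptions, each token touches 9 of the 289 experts in a layer, which is where the gap between total and active parameters comes from.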

Step 3.5 Flash is released under the Apache 2.0 license and supports native tool calling. For optimal performance, StepFun suggests a temperature of 0.6 for general chat applications and 1.0 for reasoning or agentic workflows, in both cases with a top-p value of 0.95.
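The suggested sampling settings above can be kept as named presets so that application code does not hard-code them. This is a minimal sketch: the `SamplingParams` class, the preset keys, and the `sampling_for` helper are hypothetical names introduced here, and only the numeric values come from the recommendations above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingParams:
    temperature: float
    top_p: float


# Values from the recommendations above; the preset keys are illustrative.
PRESETS = {
    "chat": SamplingParams(temperature=0.6, top_p=0.95),
    "reasoning": SamplingParams(temperature=1.0, top_p=0.95),
    "agentic": SamplingParams(temperature=1.0, top_p=0.95),
}


def sampling_for(task: str) -> SamplingParams:
    """Look up the suggested sampling settings for a task type."""
    try:
        return PRESETS[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PRESETS)}") from None


params = sampling_for("chat")
print(params.temperature, params.top_p)  # prints "0.6 0.95"
```

Many inference APIs accept fields named `temperature` and `top_p`, but the exact request format depends on the serving stack, so check its documentation before passing these values through.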
