step-3.5-flash by StepFun: LLM Benchmarks, Rankings & Specs

Step 3.5 Flash is an open-source large language model developed by StepFun, an AI lab based in Shanghai. Designed as an agentic foundation model, it aims to balance complex reasoning with efficient inference. It uses a sparse Mixture of Experts (MoE) architecture, which lets it draw on a large parameter pool while avoiding the latency typical of dense models of similar scale, a property described as high "intelligence density."

The model has approximately 196 billion total parameters but activates only around 11 billion per token during inference. To further accelerate generation, Step 3.5 Flash incorporates 3-way Multi-Token Prediction (MTP-3), which lets it predict multiple tokens in a single forward pass. Together, these design choices enable throughput of 100 to 350 tokens per second, which suits real-time interaction and autonomous agent workflows.
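To put the figures above in perspective, the following is a minimal back-of-the-envelope sketch in Python. It uses only the approximate numbers quoted in this description (196 billion total parameters, 11 billion active, 100 to 350 tokens per second). The 1,000-token response length and the helper names are illustrative assumptions, not part of any StepFun tooling.

```python
# Approximate figures quoted in the description above.
TOTAL_PARAMS = 196e9    # total parameters in the MoE pool
ACTIVE_PARAMS = 11e9    # parameters activated per token


def active_fraction(total: float, active: float) -> float:
    """Share of the parameter pool that is used for each token."""
    return active / total


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)

# Hypothetical 1,000-token response at both ends of the quoted throughput range.
slowest = generation_seconds(1000, 100)
fastest = generation_seconds(1000, 350)

print(f"Active share per token: {fraction:.1%}")
print(f"1,000 tokens take between {fastest:.1f} s and {slowest:.1f} s")
```

Under these numbers, roughly 5.6% of the parameters participate in each token, and a 1,000-token response would take between about 3 and 10 seconds, depending on where in the quoted range the deployment falls.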

Step 3.5 Flash supports a 256,000-token context window and uses a hybrid attention mechanism with a 3:1 ratio of Sliding Window Attention (SWA) layers to full-attention layers. This design is intended to keep the processing of large datasets and long codebases cost-efficient while keeping performance stable. The model is optimized in particular for tool orchestration, code execution, and multi-step reasoning, as reflected in its results on benchmarks such as SWE-bench and Terminal-Bench.
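The 3:1 hybrid attention ratio can be illustrated with a small Python sketch that lays out a layer schedule and counts how many key/value positions each layer would need to keep at full context. Only the 3:1 ratio and the 256,000-token context come from the description above. The layer count (8), the window size (512), the placement of the full-attention layer at the end of each group of four, and the function names are all hypothetical choices made for illustration, not the model's actual configuration.

```python
def layer_schedule(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Label each layer 'swa' or 'full' so that swa_per_full SWA layers
    are followed by one full-attention layer (placement is an assumption)."""
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def cached_positions(schedule: list[str], context_len: int, window: int) -> int:
    """Total key/value positions kept across layers: an SWA layer keeps at
    most `window` positions, a full-attention layer keeps the whole context."""
    return sum(
        min(window, context_len) if kind == "swa" else context_len
        for kind in schedule
    )


CONTEXT_LEN = 256_000   # context window quoted in the description
NUM_LAYERS = 8          # hypothetical, for illustration only
WINDOW = 512            # hypothetical sliding-window size

schedule = layer_schedule(NUM_LAYERS)
hybrid = cached_positions(schedule, CONTEXT_LEN, WINDOW)
all_full = cached_positions(["full"] * NUM_LAYERS, CONTEXT_LEN, WINDOW)

print(schedule)
print(f"Hybrid cache is {hybrid / all_full:.1%} of an all-full-attention cache")
```

With these illustrative values, the hybrid schedule keeps about a quarter of the positions that an all-full-attention stack would keep, which is the intuition behind the cost savings at long context. The real savings depend on the actual layer count and window size.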

