Inception AI

Mercury

Released Feb 2025

Arena AI rank: #161
Parameters: Mini, Small

Mercury is a family of large language models developed by Inception Labs, distinguished as the first commercial-scale diffusion large language models (dLLMs). Unlike traditional autoregressive models, which generate text token by token in a linear sequence, Mercury uses a "coarse-to-fine" diffusion process to generate multiple tokens in parallel. This architecture is designed to overcome the sequential bottleneck of standard autoregressive decoding, significantly increasing inference speed and throughput while maintaining high output quality.
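The coarse-to-fine idea can be illustrated with a toy masked-diffusion decoding loop: start from a fully masked sequence, and at each denoising step commit the highest-confidence token proposals in parallel rather than one token at a time. This is a minimal sketch only; the denoiser below is a random stand-in for the trained network, and the vocabulary, schedule, and confidence scores are all illustrative assumptions, not Mercury's actual algorithm.

```python
import random

MASK = "<mask>"

def toy_denoiser(tokens):
    """Stand-in for a trained denoising network (hypothetical).
    For each masked position, propose a (token, confidence) pair."""
    vocab = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "+"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length, steps=4):
    """Coarse-to-fine decoding: begin fully masked, then at each step
    commit the most confident fraction of proposals in parallel."""
    tokens = [MASK] * length
    for step in range(steps):
        proposals = toy_denoiser(tokens)
        if not proposals:
            break
        # Unmask a growing share of the remaining positions per step;
        # the final step (denominator 1) commits everything left.
        k = max(1, len(proposals) // (steps - step))
        best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:k]
        for pos, (tok, _) in best:
            tokens[pos] = tok
    return tokens

print(diffusion_decode(10))
```

Note how each step fills many positions at once: the number of network calls is the fixed step count, not the sequence length, which is the source of the parallelism advantage over autoregressive decoding.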

The model suite includes specialized variants such as Mercury Coder, which is optimized for programming and technical reasoning tasks. Available in Mini and Small sizes, these models are engineered to provide performance comparable to frontier autoregressive models while achieving speeds exceeding 1,000 tokens per second on NVIDIA H100 GPUs. The diffusion approach also provides a natural mechanism for global error correction and context integration during the iterative denoising steps.

Mercury maintains a transformer-based backbone, ensuring compatibility with existing large language model workflows. It is primarily designed for latency-sensitive applications, such as real-time coding assistants and interactive agents, and supports a context window of up to 32,768 tokens. The series represents a shift in generative architecture, applying technologies typically used in image and video generation to the domain of natural language processing.
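For latency-sensitive applications, a client typically needs to check that a prompt fits within the 32,768-token window before sending it. A minimal sketch, assuming a rough four-characters-per-token heuristic (an assumption for illustration; a real integration should count tokens with the provider's tokenizer):

```python
MAX_CONTEXT_TOKENS = 32_768  # Mercury's stated context window

def rough_token_count(text: str) -> int:
    """Crude estimate: ~4 characters per token for English text.
    (Heuristic assumption, not Mercury's actual tokenizer.)"""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 1024) -> bool:
    """Check whether a prompt leaves room for the model's reply
    within the 32,768-token window."""
    return rough_token_count(prompt) + reserved_for_output <= MAX_CONTEXT_TOKENS

print(fits_in_context("def add(a, b): return a + b"))  # → True
```

Reserving a budget for the reply (here 1,024 tokens, an arbitrary choice) matters because the context window bounds the prompt and the generated output together.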

Rankings & Comparison