K-EXAONE is a large-scale multilingual language model developed by LG AI Research. It is built on a sparse Mixture-of-Experts (MoE) architecture designed to balance high performance with computational efficiency. Created as part of a South Korean initiative to establish sovereign AI capabilities, the model is trained on diverse datasets spanning six languages: Korean, English, Spanish, German, Japanese, and Vietnamese.
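The sparse MoE idea can be illustrated with a minimal routing sketch. The expert count, hidden size, and top-k value below are arbitrary toy choices, not K-EXAONE's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through a sparse MoE layer (toy sketch).

    x: (d,) token hidden state; gate_w: (d, n_experts) router weights;
    experts: list of callables, each mapping (d,) -> (d,).
    Only the top_k highest-scoring experts run, so per-token compute
    scales with top_k rather than with the total number of experts.
    """
    logits = x @ gate_w                   # router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demonstration: 4 experts, each a fixed linear map.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
expert_mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_mats]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)
y = moe_forward(x, gate_w, experts, top_k=2)
print(y.shape)  # (8,)
```

Only two of the four experts execute for this token; the router's softmax weights blend their outputs. This is the mechanism behind the total-versus-active parameter gap described later in the article.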
Technically, K-EXAONE utilizes a hybrid attention mechanism that combines global and local attention, substantially reducing memory and computational overhead relative to standard full-attention layers. It also incorporates Multi-Token Prediction (MTP), which enables self-speculative decoding. This allows the model to achieve approximately 1.5x higher decoding throughput, making it suitable for high-demand inference environments.
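The draft-and-verify loop behind self-speculative decoding can be sketched as follows. The `draft_next` and `verify_next` callables are toy stand-ins for the cheap MTP head and the full model; in a real implementation the verification of all draft tokens happens in a single batched forward pass, which is where the speedup comes from:

```python
def self_speculative_step(draft_next, verify_next, context, k=3):
    """One decode step of draft-and-verify (toy sketch).

    The cheap draft proposes k tokens; the full model checks them and
    keeps the longest agreeing prefix plus its own next token, so one
    verify pass can emit up to k+1 tokens instead of just 1.
    """
    # Draft phase: propose k tokens autoregressively with the cheap head.
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # Verify phase (sequential here; batched in a real implementation).
    accepted, ctx = [], list(context)
    for t in draft:
        v = verify_next(ctx)
        if v != t:
            accepted.append(v)           # correction from the full model
            return accepted
        accepted.append(t)
        ctx.append(t)
    accepted.append(verify_next(ctx))    # bonus token when all drafts match
    return accepted

# Toy models: both compute the same deterministic "next token",
# so every draft is accepted and one step yields k+1 = 4 tokens.
model = lambda ctx: (len(ctx) * 2) % 10
out = self_speculative_step(model, model, [1, 2], k=3)
print(out)  # [4, 6, 8, 0]
```

When the draft head agrees with the full model often enough, the average tokens emitted per full-model pass rises above one, which is consistent with the roughly 1.5x throughput gain cited for K-EXAONE.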
The model is characterized by its dual-mode operation, supporting both reasoning and non-reasoning configurations. The non-reasoning mode is optimized for tasks where low latency and high throughput matter more than the intensive chain-of-thought processing typical of reasoning-focused tasks. Even in non-reasoning mode, the model maintains a long context window of 256,000 tokens, enabling the processing of extensive documents and complex multi-turn conversations.
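A 256,000-token context makes the KV cache a dominant memory cost, which is one motivation for the hybrid global/local attention design. The back-of-the-envelope calculation below uses illustrative layer counts, head counts, and window size (not K-EXAONE's published configuration) to show how sliding-window layers cap their cache regardless of sequence length:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   window=None, bytes_per_val=2):
    """KV-cache size for one sequence (keys + values, 16-bit values).

    A sliding-window layer only keeps the last `window` positions,
    so its cache stops growing once seq_len exceeds the window.
    All dimensions here are assumptions for illustration only.
    """
    kept = seq_len if window is None else min(seq_len, window)
    return 2 * n_layers * n_kv_heads * head_dim * kept * bytes_per_val

ctx = 256_000
full = kv_cache_bytes(ctx, n_layers=48, n_kv_heads=8, head_dim=128)
local = kv_cache_bytes(ctx, n_layers=48, n_kv_heads=8, head_dim=128,
                       window=4_096)
print(f"if all layers were global: {full / 2**30:.1f} GiB")
print(f"if all layers used a 4K window: {local / 2**30:.2f} GiB")
```

Under these toy numbers, full attention at 256K tokens costs tens of GiB of cache per sequence, while windowed layers stay under 1 GiB; a hybrid stack lands in between, which is the efficiency trade-off the article describes.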
K-EXAONE features 236 billion total parameters, with 23 billion active during any given inference step. Its architecture includes several stability-focused features such as Query-Key (QK) normalization and Sliding Window Attention (SWA), which were refined from previous iterations of the EXAONE series to ensure reliable performance across both short and long context lengths.
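QK normalization can be sketched in a few lines: the queries and keys are RMS-normalized before the dot product, which bounds the attention logits and helps keep the softmax stable. Shapes and the omission of a learned scale are simplifications for illustration:

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMS-normalize the last dimension (learned scale omitted for brevity)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def qk_norm_attention(q, k, v):
    """Single-head attention with QK normalization (toy sketch).

    Normalizing q and k caps each dot product near 1, so logits cannot
    blow up as activations grow during training. Shapes: (seq, head_dim).
    """
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical safety
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((5, 16)) for _ in range(3))
out = qk_norm_attention(q, k, v)
print(out.shape)  # (5, 16)
```

Because the normalized logits stay in a narrow range at any sequence length, this kind of stabilization pairs naturally with the long-context and SWA features mentioned above.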