Swiss AI Initiative
Open Weights

Apertus 70B Instruct

Released Sep 2025

Intelligence: #440
Coding: #359
Context: 66K
Parameters: 70B

Apertus 70B Instruct is a large-scale, multilingual language model developed by the Swiss AI Initiative, a collaborative partnership between EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS). Built as a fully open project, the model emphasizes transparency, data compliance, and broad linguistic representation. It was trained on the "Alps" supercomputer using approximately 15 trillion tokens across more than 1,800 languages, with roughly 40% of the pretraining data consisting of non-English content, including underrepresented languages such as Swiss German and Romansh.

Technically, the model is based on a decoder-only transformer architecture with 80 layers and 64 attention heads. It utilizes Grouped-Query Attention (GQA) for inference efficiency and features the xIELU activation function. While the model was initially trained with a 4,096-token context, it supports an extended context window of 65,536 tokens. A notable innovation in its training is the use of the Goldfish loss objective, which is designed to suppress the verbatim memorization of training data, thereby addressing privacy and data reproduction concerns.
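The Goldfish loss idea can be sketched briefly: instead of computing the next-token loss on every position, a deterministic pseudorandom subset of positions is excluded, so no single passage is ever fully supervised and verbatim recall is discouraged. The snippet below is a minimal illustration of that masking scheme, not Apertus's actual implementation; the drop rate `k`, the window size `context`, and the hash-based selection are illustrative assumptions.

```python
import numpy as np

def goldfish_loss_mask(token_ids, k=4, context=13):
    """Drop roughly 1/k of positions from the next-token loss.

    Hedged sketch: positions are masked by hashing a local window of
    preceding tokens, so the same text always masks the same positions
    (a property the goldfish-loss approach relies on). The parameters
    here are illustrative, not Apertus's published settings.
    """
    mask = np.ones(len(token_ids), dtype=bool)
    for i in range(context, len(token_ids)):
        window = tuple(token_ids[i - context:i])
        if hash(window) % k == 0:   # deterministic for integer tuples
            mask[i] = False         # token excluded from the loss
    return mask

def masked_nll(logprobs, mask):
    """Average negative log-likelihood over unmasked positions only."""
    kept = logprobs[mask]
    return -float(kept.mean())
```

In training, the mask would simply zero out the cross-entropy terms at the dropped positions; because the mask is a deterministic function of the local context, repeated occurrences of the same passage drop the same tokens, preventing the model from eventually memorizing them across epochs.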

The development of Apertus follows the principles of the Swiss AI Charter, with a focus on neutrality and consensus-building. It is positioned as a compliant foundation model aligned with the transparency obligations of the EU AI Act. In performance evaluations, the model shows particular strength in multilingual applications and translation, especially for Swiss dialects and regional European languages, though it prioritizes transparency and ethical data sourcing over raw leaderboard scores, trailing some proprietary counterparts on complex logical reasoning tasks.

Rankings & Comparison