MosaicML
Open Weights

mpt-30b-chat

Released Jun 2023

Arena Rank: #241
Context: 8K
Parameters: 30B

MPT-30B-Chat is a 30-billion-parameter decoder-only transformer optimized for dialogue and instruction following. Released by MosaicML in June 2023, it is a fine-tuned version of the MPT-30B base model, designed for multi-turn conversation, and serves as a larger successor to MPT-7B-Chat.

## Architecture and Training

The architecture includes several optimizations for efficiency and long-context handling, most notably FlashAttention and Attention with Linear Biases (ALiBi). ALiBi allows the model to extrapolate to sequence lengths longer than the 8,192 tokens used during training. Unlike standard transformers, MPT-30B uses no positional embeddings and no biases in its linear layers.

Training for the chat variant involved fine-tuning on a combination of conversational datasets, including ShareGPT-Vicuna, Camel-AI, GPTeacher, Guanaco, and Baize. It was trained on the MosaicML Platform using NVIDIA H100 GPUs. While the base MPT-30B model is licensed for commercial use, the chat-fine-tuned version is released under the non-commercial CC-BY-NC-SA 4.0 license.
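To illustrate how ALiBi enables length extrapolation without positional embeddings, here is a minimal NumPy sketch of the per-head linear attention bias from the ALiBi paper. The function name and shapes are illustrative, not taken from MosaicML's code; it assumes the number of heads is a power of two, the standard case for the published slope schedule.

```python
import numpy as np

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Build the ALiBi bias tensor added to attention logits.

    Each head h gets a fixed slope 2^(-8*(h+1)/n_heads); the bias for a
    query at position i attending to a key at position j is
    slope * (j - i), a linear penalty that grows with distance.
    Because the penalty depends only on relative distance, it extends
    naturally to sequences longer than those seen in training.
    """
    start = 2.0 ** (-8.0 / n_heads)
    slopes = np.array([start ** (h + 1) for h in range(n_heads)])
    pos = np.arange(seq_len)
    distance = pos[None, :] - pos[:, None]  # key_pos - query_pos
    # Shape (n_heads, seq_len, seq_len); the causal mask is applied
    # separately, so only the j <= i entries (bias <= 0) matter.
    return slopes[:, None, None] * distance[None, :, :]
```

The key property is that attending to one's own position costs nothing, while more distant keys are penalized linearly, with steeper slopes on later heads.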

Rankings & Comparison