Falcon 180B Chat is an instruction-tuned large language model developed by the Technology Innovation Institute (TII) in Abu Dhabi. It is a derivative of the Falcon 180B base model, which was trained on 3.5 trillion tokens drawn largely from the RefinedWeb dataset along with curated corpora. The chat-optimized version was further fine-tuned on a mixture of conversational and instruction datasets, including UltraChat, Platypus, and Airoboros, to strengthen its dialogue and instruction-following capabilities.
Architecture and Design
The model employs a causal decoder-only architecture featuring multiquery attention for improved inference efficiency and scalability. It utilizes rotary positional embeddings and parallel attention/MLP layers with two layer norms. With 180 billion parameters, the model was designed to perform at a level comparable to large-scale proprietary systems while maintaining open weights for the community.
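The key efficiency idea above is multiquery attention: many query heads share a single key/value head, which shrinks the key/value cache at inference time by roughly the number of heads. Below is a minimal NumPy sketch of that mechanism; the shapes, weight layout, and function names are illustrative assumptions, not Falcon's actual implementation or dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multiquery_attention(x, wq, wk, wv, n_heads):
    """Toy multiquery attention: n_heads query heads share ONE key/value
    head, so the cached K/V tensors are n_heads times smaller than in
    standard multi-head attention.

    Illustrative shapes (not Falcon's real dimensions):
      x:      (seq, d_model)
      wq:     (d_model, n_heads * d_head)
      wk, wv: (d_model, d_head)          # single shared K/V head
    """
    seq, d_model = x.shape
    d_head = wk.shape[1]
    q = (x @ wq).reshape(seq, n_heads, d_head)  # per-head queries
    k = x @ wk                                  # shared key head
    v = x @ wv                                  # shared value head
    # scores[h, s, t] = q_h(s) . k(t) / sqrt(d_head)
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    # causal mask: position s may only attend to positions t <= s
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    attn = softmax(scores, axis=-1)
    out = np.einsum("hst,td->shd", attn, v)     # (seq, n_heads, d_head)
    return out.reshape(seq, n_heads * d_head)
```

In a real decoder, only `k` and `v` need to be cached per generated token; with one shared head instead of `n_heads`, the cache (and its memory bandwidth cost) shrinks proportionally, which is what makes large-batch inference cheaper.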
Performance and Use Cases
Falcon 180B Chat is designed for complex tasks such as reasoning, coding, and multilingual conversation across languages including English, German, Spanish, and French. Its training involved a massive computational effort using over 4,000 GPUs on AWS SageMaker. The model is released under the Falcon-180B TII License, which allows for commercial use under specific terms and conditions.
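Since the weights are openly released, the model can in principle be run through the Hugging Face `transformers` library. The sketch below shows one plausible way to do so; the `User:`/`Falcon:` prompt template is an assumption that should be checked against the official model card, and actually loading the 180B weights requires hundreds of gigabytes of GPU memory.

```python
MODEL_ID = "tiiuae/falcon-180B-chat"  # Hugging Face Hub id of the chat model

def format_prompt(user_message, system=None):
    """Build a single-turn prompt in a User:/Falcon: style.
    NOTE: this template is an assumption for illustration; verify the
    exact format against the model card before relying on it."""
    parts = []
    if system:
        parts.append(f"System: {system}")
    parts.append(f"User: {user_message}")
    parts.append("Falcon:")
    return "\n".join(parts)

def generate(user_message, max_new_tokens=200):
    """Heavyweight sketch: loads the full model and generates a reply."""
    # Imported lazily because loading transformers (and the weights)
    # is only needed when this function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",   # shard across available GPUs
        torch_dtype="auto",  # use the checkpoint's native precision
    )
    inputs = tokenizer(format_prompt(user_message), return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Any commercial deployment built this way remains subject to the terms of the Falcon-180B TII License noted above.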