ChatGLM2-6B is an open-source, bilingual language model developed by the Knowledge Engineering Group (KEG) at Tsinghua University and Zhipu AI. As the second generation of the ChatGLM-6B series, it introduces enhancements in performance and efficiency for both Chinese and English language tasks. The model is designed to be highly accessible, optimized for inference on consumer-grade hardware with reduced memory requirements.
Technical Specifications
The model features a significantly expanded context length of 32,768 tokens, allowing it to process substantially longer documents and conversation histories than the 2,048 tokens supported by its predecessor. It uses Multi-Query Attention (MQA), in which the query heads share a reduced set of key/value projections rather than each head keeping its own, which accelerates generation and shrinks the KV cache held in memory during inference. The model was trained on approximately 1.4 trillion tokens of bilingual (Chinese and English) data.
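To make the KV-cache saving concrete, the sketch below estimates cache size for standard multi-head attention versus pure MQA (a single shared key/value head). The layer count, head count, and head dimension are illustrative assumptions for a 6B-scale model, not ChatGLM2-6B's exact configuration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence.

    The leading factor of 2 accounts for storing both K and V;
    bytes_per_elem=2 assumes fp16 storage.
    """
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Illustrative 6B-scale shapes (assumed): 28 layers, 32 attention heads,
# head dimension 128, and the full 32,768-token context.
mha = kv_cache_bytes(seq_len=32768, n_layers=28, n_kv_heads=32, head_dim=128)
mqa = kv_cache_bytes(seq_len=32768, n_layers=28, n_kv_heads=1, head_dim=128)

print(f"MHA cache: {mha / 2**30:.1f} GiB")   # one K/V head per query head
print(f"MQA cache: {mqa / 2**30:.2f} GiB")   # one shared K/V head: 32x smaller
```

Under these assumptions the full-context cache shrinks by the head-count factor (32x here), which is what makes long-context inference feasible on consumer-grade GPUs.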
Performance and Capabilities
ChatGLM2-6B demonstrates substantial improvements across benchmarks including MMLU, C-Eval, and GSM8K, outperforming the original ChatGLM-6B and other models of comparable size. It is particularly noted for its improved mathematical reasoning and code generation. The model is aligned using supervised fine-tuning and human feedback to keep responses helpful and safe.