OpenChat 3.5 (1210) is an open-source language model released by the OpenChat team in December 2023 as an updated version of the original OpenChat 3.5. It is built on Mistral-7B and fine-tuned with Conditioned Reinforcement Learning Fine-Tuning (C-RLFT), a technique that lets the model learn from mixed-quality data without explicit preference labels. This allows it to draw on diverse datasets and reach performance comparable to that of larger proprietary models.
This release improved on its predecessor in several areas, most notably with a 15-point gain on coding benchmarks. The model has two operating modes: the Default Mode (GPT4 Correct), which is optimized for general conversation and programming tasks, and the Mathematical Reasoning Mode, which is tailored to math problems and logical queries.
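The two modes are selected through the prompt format rather than through separate weights. The sketch below shows one way to build such prompts. It assumes the role prefixes ("GPT4 Correct" and "Math Correct") and the `<|end_of_turn|>` separator given in the published model card; the helper name `build_prompt` and the mode keys are hypothetical and used only for illustration.

```python
END_OF_TURN = "<|end_of_turn|>"  # turn separator, assumed from the model card

# Role prefix per mode. The dictionary keys are illustrative names,
# not identifiers defined by OpenChat.
MODE_PREFIXES = {
    "default": "GPT4 Correct",
    "math": "Math Correct",
}


def build_prompt(messages, mode="default"):
    """Render a list of (role, text) turns into a single prompt string."""
    prefix = MODE_PREFIXES[mode]
    parts = []
    for role, text in messages:
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role}")
        speaker = "User" if role == "user" else "Assistant"
        parts.append(f"{prefix} {speaker}: {text}{END_OF_TURN}")
    # Leave the final assistant turn open so the model writes the reply.
    parts.append(f"{prefix} Assistant:")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("user", "Write a binary search in Python.")]))
    print(build_prompt([("user", "What is 17 * 23?")], mode="math"))
```

In practice the tokenizer's bundled chat template is usually the safer way to produce the default format; a hand-built string like this is mainly useful for seeing what the two modes look like on the wire.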
With 7 billion parameters, OpenChat 3.5 (1210) is small enough to run on consumer-grade hardware. It has a context window of 8,192 tokens and offers experimental evaluator and feedback capabilities, which allow it to act as an automated judge of other models' responses.
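As a rough check on the hardware claim, the memory needed just to hold the weights can be estimated from the parameter count and the numeric precision. The sketch below is back-of-envelope arithmetic only: it uses the nominal 7 billion figure from the text (the exact count may differ slightly) and ignores the KV cache, activations, and framework overhead, all of which add to the real footprint. The function name is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    PARAMS = 7_000_000_000  # nominal parameter count taken from the text
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 13 GiB, while 4-bit quantization brings that down to a little over 3 GiB, which is the main reason models of this size fit on a single consumer GPU.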