Vicuna-33B is a large language model developed by the Large Model Systems Organization (LMSYS), a research collaboration involving members from UC Berkeley, CMU, Stanford, and UC San Diego. It is an auto-regressive model based on the LLaMA architecture, specifically fine-tuned to handle conversational instruction following. The model was trained on approximately 125,000 user-shared dialogues collected from ShareGPT, enabling it to maintain context across multi-turn interactions and provide detailed responses.
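Multi-turn dialogues like those from ShareGPT are typically serialized into a single prompt string before training or inference. The sketch below assembles a prompt in the commonly documented Vicuna v1.1 chat template; the helper name and system prompt text are illustrative, not part of an official API.

```python
# Sketch of assembling a multi-turn prompt in the Vicuna-style chat template.
# The "USER:"/"ASSISTANT:" markers follow the widely documented v1.1 format;
# the function name and system prompt are illustrative assumptions.

SYSTEM = ("A chat between a curious user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite "
          "answers to the user's questions.")

def build_prompt(turns):
    """turns: list of (user_msg, assistant_msg_or_None) pairs.

    A trailing None marks the turn the model should complete.
    """
    parts = [SYSTEM]
    for user_msg, assistant_msg in turns:
        parts.append(f"USER: {user_msg}")
        if assistant_msg is None:
            parts.append("ASSISTANT:")  # generation starts here
        else:
            parts.append(f"ASSISTANT: {assistant_msg}</s>")
    return " ".join(parts)

prompt = build_prompt([
    ("What is the capital of France?", "The capital of France is Paris."),
    ("And its population?", None),
])
print(prompt)
```

Keeping earlier turns in the prompt is what lets the model resolve references such as "its population" against the preceding exchange.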
Architecture and Training
Built on the first-generation LLaMA foundation, Vicuna-33B utilizes a transformer-based architecture. The training process employed supervised fine-tuning (SFT) with optimizations such as FlashAttention and memory-efficient kernels to manage the 33-billion parameter scale. This version was developed to bridge the performance gap between smaller open-source models and larger proprietary systems, offering enhanced reasoning capabilities over the previous 7B and 13B Vicuna iterations.
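The SFT objective as commonly applied to Vicuna-style chat data is next-token cross-entropy computed only over assistant tokens, with user and system tokens masked out of the loss. The toy probabilities below stand in for a real model's predictions; this is a sketch of the loss arithmetic, not of the full 33B training loop.

```python
import math

# Sketch of the masked cross-entropy used in chat-style supervised
# fine-tuning: only assistant-generated positions contribute to the loss.
# The tiny probability list below is illustrative, not real model output.

def masked_nll(token_probs, loss_mask):
    """Mean negative log-likelihood over unmasked (assistant) positions.

    token_probs: probability the model assigned to the true next token
    loss_mask:   1 for assistant tokens, 0 for user/system tokens
    """
    terms = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    return sum(terms) / len(terms)

# Per-position probabilities for one short training sequence.
probs = [0.10, 0.20, 0.85, 0.90, 0.70]
mask  = [0,    0,    1,    1,    1]  # loss only on the last three tokens

print(round(masked_nll(probs, mask), 4))
```

Masking the user turns means the model is optimized to produce responses, not to reproduce the prompts it was conditioned on.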
Evaluation and Performance
At the time of its release, Vicuna-33B was evaluated using a framework in which GPT-4 served as an automated judge ranking model responses. These evaluations, including the MT-Bench benchmark, indicated that the model achieved approximately 90% of the quality of ChatGPT. Because of the licensing terms of the base weights, the model was originally distributed as delta weights, which had to be added to the original LLaMA weights to reconstruct the final parameters.
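The delta-weight scheme can be sketched as element-wise addition: the distributed file stores the difference between the fine-tuned and base parameters for every tensor, and users add it to their own LLaMA checkpoint. Real tooling operates on full model checkpoints; the dict-of-lists "checkpoints" below are a minimal illustrative stand-in.

```python
# Sketch of delta-weight reconstruction: final = base + delta, tensor by
# tensor. The tensor names and values below are illustrative; real delta
# releases cover every parameter of the 33B model.

def apply_delta(base, delta):
    """Reconstruct fine-tuned weights: final[name] = base[name] + delta[name]."""
    assert base.keys() == delta.keys(), "checkpoints must share tensor names"
    return {name: [b + d for b, d in zip(base[name], delta[name])]
            for name in base}

base_weights  = {"layers.0.attn.wq": [0.10, -0.20], "lm_head": [0.50, 0.00]}
delta_weights = {"layers.0.attn.wq": [0.02,  0.05], "lm_head": [-0.10, 0.30]}

final = apply_delta(base_weights, delta_weights)
print(final)
```

Distributing only the differences meant LMSYS never redistributed the LLaMA weights themselves, keeping the release within the base model's license.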