Vicuna-13B is an open-source conversational model developed by the Large Model Systems Organization (LMSYS), comprising researchers from UC Berkeley, CMU, Stanford, UC San Diego, and MBZUAI. It was originally created by fine-tuning the weights of Meta's LLaMA (v1) model on approximately 70,000 user-shared conversations collected from ShareGPT. At its launch, the model was significant for providing a high-quality chatbot experience at relatively low training cost, estimated at around $300.
The model's development introduced the "GPT-4 as a judge" evaluation framework, in which OpenAI's GPT-4 automatically scores chatbot responses against those of baseline models. In early, preliminary assessments using this method, Vicuna-13B was reported to achieve more than 90% of the quality of proprietary models such as ChatGPT and Google Bard. This evaluation work later grew into the MT-Bench and Chatbot Arena benchmarks, which became standards in the open-source community.
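The pairwise judging protocol described above can be sketched minimally: build a prompt that presents one question and two candidate answers, ask a judge model for a verdict, and parse the result. This is an illustrative sketch, not LMSYS's exact template; the prompt wording, the `[[A]]`/`[[B]]`/`[[C]]` verdict markers, and both function names are assumptions for demonstration.

```python
def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    # Hypothetical pairwise "LLM as judge" prompt, loosely modeled on the
    # approach described above. The judge model (e.g. GPT-4) would receive
    # this text and return an explanation ending in a verdict marker.
    return (
        "Please act as an impartial judge and evaluate the quality of the "
        "responses provided by two AI assistants to the user question below. "
        "After your explanation, output a final verdict: '[[A]]' if assistant "
        "A is better, '[[B]]' if assistant B is better, or '[[C]]' for a tie.\n\n"
        f"[Question]\n{question}\n\n"
        f"[Assistant A]\n{answer_a}\n\n"
        f"[Assistant B]\n{answer_b}\n"
    )

def parse_verdict(judge_output: str) -> str:
    # Returns "A", "B", or "tie"; defaults to "tie" if no marker is found.
    for marker, label in (("[[A]]", "A"), ("[[B]]", "B"), ("[[C]]", "tie")):
        if marker in judge_output:
            return label
    return "tie"
```

Aggregating verdicts over a fixed question set yields the kind of relative quality percentage reported for Vicuna-13B.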
Evolution and Architecture
Following the release of Llama 2, LMSYS updated the model to Vicuna v1.5. This version transitioned the base architecture to Llama 2, which brought improved performance and a native context window of 4,096 tokens. Some variants of the 13B model also use linear rotary position embedding (RoPE) scaling to support context lengths up to 16K (16,384) tokens. The model remains a widely used baseline for instruction-following research and multi-turn dialogue tasks.
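Linear RoPE scaling (position interpolation) works by compressing position indices so that a longer sequence falls inside the positional range the base model was trained on. A minimal sketch, assuming the standard RoPE angle formulation; the function name and dimensions are illustrative:

```python
def rope_angles(position, dim=128, base=10000.0, scale=1.0):
    # Rotary embedding angles for a single token position.
    # With scale < 1, positions are compressed (linear interpolation),
    # so e.g. a 16,384-token sequence maps onto the 0..4,096 range
    # the base model was trained on.
    pos = position * scale
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# Scaling factor 4,096 / 16,384 = 0.25: position 16,000 in the extended
# context gets the same angles the base model saw for position 4,000.
scaled = rope_angles(16000, scale=4096 / 16384)
unscaled = rope_angles(4000)
assert all(abs(a - b) < 1e-9 for a, b in zip(scaled, unscaled))
```

Compressing rather than extrapolating positions is what lets a model fine-tuned this way handle 16K-token inputs without ever seeing raw position indices beyond its original 4,096-token training range.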