guanaco-33b

Developer: UW
License: Open Weights
Released: May 2023
Arena AI Rank: #254
Context: 2K
Parameters: 33B

Guanaco-33b is a large language model developed by researchers at the University of Washington as part of the QLoRA (Quantized Low-Rank Adaptation) project. It was fine-tuned from Meta's LLaMA-33B base model with the base weights held in 4-bit precision during training, which sharply reduces memory requirements while preserving performance. The fine-tuning data came from the OASST1 (OpenAssistant Conversations) dataset.
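As a rough back-of-the-envelope illustration (assuming 2 bytes per weight at fp16 and 0.5 bytes at 4-bit, and ignoring activations, quantization constants, and optimizer state), the storage savings look like this:

```python
# Approximate memory needed just to hold 33B weights.
# Assumptions: fp16 = 2 bytes/weight, 4-bit = 0.5 bytes/weight;
# real deployments add overhead for scales, activations, KV cache.
params = 33e9

fp16_gb = params * 2 / 1e9    # ~66 GB
int4_gb = params * 0.5 / 1e9  # ~16.5 GB

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

This 4x reduction is what brings a 33B model within reach of a single high-memory GPU.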

Upon its release, Guanaco-33b demonstrated competitive results on conversational benchmarks, such as the Vicuna benchmark, where it performed comparably to much larger models. The project served as a proof of concept for efficient fine-tuning, showing that 4-bit quantized models could retain the linguistic capabilities of their full-precision counterparts.

Technical Architecture

The model uses a transformer architecture with 33 billion parameters. During fine-tuning, the frozen base weights were stored in 4-bit NormalFloat (NF4), a data type whose quantization levels are matched to normally distributed weights, while double quantization compressed the quantization constants themselves to further shrink the memory footprint. The model retains the original 2048-token context window of the LLaMA base model and focuses on instruction following and dialogue.
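The core idea of blockwise 4-bit quantization can be sketched in a few lines. This is an illustrative toy, not the QLoRA implementation: it uses a uniform 16-level codebook as a stand-in for the actual NF4 codebook (whose levels follow the quantiles of a normal distribution), and the comment marks where double quantization would act on the per-block scale constants:

```python
import numpy as np

# Illustrative stand-in codebook: 16 uniform levels in [-1, 1].
# Real NF4 places its 16 levels at quantiles of a normal distribution.
LEVELS = np.linspace(-1.0, 1.0, 16)

def quantize_block(block):
    """Absmax-scale a weight block, then snap each value to a 4-bit code."""
    scale = np.abs(block).max()  # one floating-point scale per block
    codes = np.abs(block[:, None] / scale - LEVELS[None, :]).argmin(axis=1)
    # Double quantization would additionally quantize `scale` itself
    # (e.g. to 8 bits across many blocks) to cut per-block overhead.
    return codes.astype(np.uint8), scale

def dequantize_block(codes, scale):
    """Recover approximate weights from 4-bit codes and the block scale."""
    return LEVELS[codes] * scale

rng = np.random.default_rng(0)
w = rng.normal(size=64).astype(np.float32)
codes, scale = quantize_block(w)
w_hat = dequantize_block(codes, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

With uniform levels the reconstruction error per weight is bounded by half the level spacing times the block scale; NF4's non-uniform levels tighten this further for the normally distributed weights typical of trained networks.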

Rankings & Comparison