Dolly v2-12b is an instruction-following large language model developed by Databricks. It is a 12-billion-parameter model based on EleutherAI's Pythia architecture, designed as an open-source alternative to contemporary instruction-tuned models and specifically built to be commercially viable by using human-generated training data rather than outputs from proprietary models.

The model was fine-tuned on databricks-dolly-15k, a dataset of 15,000 high-quality human-generated prompt/response pairs covering several categories: brainstorming, classification, closed question answering, generation, information extraction, and summarization. Because Databricks employees created the dataset, the model avoided the restrictive licensing terms often attached to models trained on synthetic outputs from other large language models.

Released under the Apache 2.0 license, Dolly v2-12b was one of the first instruction-following models to allow unrestricted commercial use. Its development demonstrated that smaller, open-source models can be effectively tuned for interactive tasks with a relatively modest amount of high-quality human data.
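The fine-tuning data is simply a list of prompt/response records. As a rough sketch of what such instruction tuning consumes (the field names below match the published databricks-dolly-15k schema; the prompt template itself is illustrative rather than Databricks' exact training format), each record can be rendered into a single training string:

```python
# Illustrative sketch: rendering a databricks-dolly-15k-style record into a
# training prompt. Field names (instruction, context, response, category)
# follow the public dataset; the template wording is a common
# instruction-tuning convention, assumed here for illustration.

def format_prompt(record: dict) -> str:
    """Render one prompt/response pair into a single training string."""
    intro = ("Below is an instruction that describes a task. "
             "Write a response that appropriately completes the request.")
    parts = [intro, f"### Instruction:\n{record['instruction']}"]
    # Closed-QA, extraction, and summarization examples carry a context passage.
    if record.get("context"):
        parts.append(f"### Context:\n{record['context']}")
    parts.append(f"### Response:\n{record['response']}")
    return "\n\n".join(parts)

example = {
    "instruction": "Summarize the passage in one sentence.",
    "context": ("Dolly v2-12b is a 12-billion-parameter model fine-tuned "
                "on 15,000 human-written prompt/response pairs."),
    "response": ("Dolly v2-12b is an instruction-tuned 12B model trained "
                 "on human-generated data."),
    "category": "summarization",
}

print(format_prompt(example))
```

Strings like this are what the base Pythia model is trained to complete during fine-tuning; at inference time the same template is used with the response left empty.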
Parameters: 12B