RWKV-4-Raven-14B is a 14-billion-parameter large language model based on the RWKV (Receptance Weighted Key Value) architecture. It represents a hybrid approach that combines the training efficiency and parallelization of Transformers with the inference efficiency and linear scaling of Recurrent Neural Networks (RNNs). Unlike standard Transformers, which rely on quadratic self-attention, RWKV-4 replaces attention with a linear "WKV" operator, built from time-mixing and channel-mixing blocks that use token shift, giving it constant per-token inference cost and training cost that scales linearly with sequence length.
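To make the recurrence concrete, here is a minimal single-channel sketch of the WKV operator computed step by step. It is illustrative only: it treats each channel as a scalar, and it omits the max-subtraction trick the real CUDA kernel uses for numerical stability. The variable names (`w` for the per-channel decay, `u` for the current-token bonus) follow the RWKV-4 paper's notation.

```python
import math

def wkv_recurrent(ks, vs, w, u):
    """One channel of the RWKV-4 WKV operator, computed recurrently.
    The state is just two scalars (a, b), regardless of sequence length.
    Simplified sketch: no numerical-stability max-trick, scalar channels."""
    a, b = 0.0, 0.0            # running numerator / denominator of the weighted average
    outs = []
    for k, v in zip(ks, vs):
        bonus = math.exp(u + k)                   # extra weight for the current token
        outs.append((a + bonus * v) / (b + bonus))
        a = math.exp(-w) * a + math.exp(k) * v    # decay old state, absorb new token
        b = math.exp(-w) * b + math.exp(k)
    return outs
```

Because each step reads and writes only `(a, b)`, the per-token cost and the state size stay constant however long the sequence gets, which is exactly the property that removes the need for a growing KV cache.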
The "Raven" series models are instruction-tuned versions of the RWKV-4-Pile base models. They have been fine-tuned on a variety of instruction-following datasets, including Alpaca, CodeAlpaca, Guanaco, GPT4All, and ShareGPT. This tuning allows the model to handle tasks such as conversational chat, code generation, and general task completion more effectively than the base pre-trained model.
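Since much of the tuning data is Alpaca-style, Raven checkpoints respond best to instruction-formatted prompts. The helper below sketches one such format; the exact strings (`Instruction:`/`Input:`/`Response:` and `Question:`/`Answer:`) are an assumption based on the Alpaca-style datasets listed above, so check the specific checkpoint's model card for its recommended template.

```python
def raven_prompt(instruction: str, context: str = "") -> str:
    """Build an Alpaca-style prompt for a Raven checkpoint.
    The template strings here are illustrative assumptions, not
    an official format; consult the model card before use."""
    if context:
        return (f"Instruction: {instruction}\n\n"
                f"Input: {context}\n\n"
                f"Response:")
    return f"Question: {instruction}\n\nAnswer:"
```

Leaving the prompt hanging at `Response:` (or `Answer:`) lets the model's completion supply the reply, mirroring how the instruction data was formatted during fine-tuning.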
Architecture and Efficiency
The RWKV-4 architecture (codenamed "Dove") eliminates the need for a traditional KV cache, significantly reducing VRAM requirements during inference. Because the recurrent state has a fixed size regardless of sequence length, memory overhead stays constant no matter how long the input grows. This makes the 14B model particularly suitable for deployment on consumer hardware, where memory bandwidth and capacity are limited.
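A rough back-of-the-envelope comparison makes the savings concrete. The sketch below assumes a 14B-class configuration of 40 layers with a model width of 5120, fp16 storage, one K and one V vector per layer per token for the Transformer, and five state vectors per layer for RWKV-4 (as in the reference implementation); all of these figures are assumptions for illustration.

```python
def kv_cache_bytes(n_layer: int, d_model: int, seq_len: int, bytes_per: int = 2) -> int:
    # Transformer: one K and one V vector per layer, per token -> grows with seq_len.
    return 2 * n_layer * seq_len * d_model * bytes_per

def rwkv_state_bytes(n_layer: int, d_model: int, bytes_per: int = 2,
                     vecs_per_layer: int = 5) -> int:
    # RWKV-4: a fixed number of state vectors per layer (5 in the reference
    # implementation), independent of sequence length.
    return vecs_per_layer * n_layer * d_model * bytes_per

# Assumed 14B-class shape: 40 layers, d_model = 5120, fp16.
N_LAYER, D_MODEL = 40, 5120
print(kv_cache_bytes(N_LAYER, D_MODEL, 8192) / 2**30, "GiB")  # grows with context
print(rwkv_state_bytes(N_LAYER, D_MODEL) / 2**20, "MiB")      # constant
```

Under these assumptions the Transformer cache at an 8,192-token context runs to several GiB, while the RWKV state is on the order of a few MiB and does not change as the context lengthens.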
While the architecture technically supports an "infinite" context through its recurrent state, Raven-14B checkpoints are trained with an effective context window of up to 8,192 tokens. Within that window the model maintains strong coherence and instruction-following across extended dialogues and documents; beyond it, quality may degrade even though the recurrence keeps running.