StripedHyena-Nous-7B is a hybrid language model developed through a collaboration between Together AI and Nous Research. It is a fine-tuned chat version of the StripedHyena-Hessian-7B base model, designed to offer an alternative to traditional Transformer-only architectures. The model uses a combination of multi-head attention and gated convolutions arranged in Hyena blocks, which allows it to process long sequences more efficiently than standard decoder-only Transformers.
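The "striped" arrangement described above can be sketched in a toy form: a stack that interleaves gated-convolution layers with causal attention layers, each wrapped in a residual connection. This is a minimal scalar-valued illustration of the layer pattern, not the model's actual implementation; the layer ordering, the fixed convolution filter, and the sigmoid gate are all simplifying assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def causal_conv(seq, kernel):
    # y[t] = sum_k kernel[k] * x[t - k]  (zero-padded causal convolution)
    return [sum(kernel[k] * seq[t - k]
                for k in range(len(kernel)) if t - k >= 0)
            for t in range(len(seq))]

def gated_conv(seq, kernel):
    # toy Hyena-style gating (assumption): a sigmoid gate computed from
    # the input multiplies the convolution output elementwise
    conv = causal_conv(seq, kernel)
    gate = [1.0 / (1.0 + math.exp(-x)) for x in seq]
    return [g * c for g, c in zip(gate, conv)]

def causal_attention(seq):
    # toy single-channel causal attention: each position attends only
    # to its prefix, with score = query * key
    out = []
    for t in range(len(seq)):
        w = softmax([seq[t] * seq[j] for j in range(t + 1)])
        out.append(sum(wi * seq[j] for j, wi in enumerate(w)))
    return out

def striped_forward(seq, layers=("conv", "conv", "attn", "conv")):
    # "striping": mostly gated-convolution layers with occasional
    # attention layers, each with a residual connection
    kernel = [0.6, 0.3, 0.1]  # illustrative fixed filter
    for kind in layers:
        mixed = gated_conv(seq, kernel) if kind == "conv" else causal_attention(seq)
        seq = [x + m for x, m in zip(seq, mixed)]
    return seq
```

Because every layer is causal, the output at position t depends only on inputs up to t, which is what makes the block usable for autoregressive generation.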
By integrating state-space-inspired gated convolutions with attention layers, StripedHyena-Nous-7B achieves higher throughput and lower memory usage during inference than Transformers of comparable size. The architecture is optimized for long-context performance, supporting a context window of 32,768 tokens. It specifically leverages gated convolutions for bulk sequence processing while using attention layers for targeted pattern recall, a design choice intended to improve quality-versus-compute scaling for both training and autoregressive generation.
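The memory claim can be made concrete with back-of-the-envelope arithmetic: during autoregressive inference, each attention layer must cache keys and values for every previous token, while a gated-convolution layer keeps only a fixed-size state. The sketch below computes KV-cache size at the 32,768-token context window; the layer count, head count, head dimension, and the 1-in-4 attention split are illustrative assumptions for a 7B-scale model, not the model's published configuration.

```python
def kv_cache_bytes(n_attn_layers, seq_len, n_heads, head_dim, dtype_bytes=2):
    # K and V caches: 2 tensors per attention layer, each holding
    # seq_len * n_heads * head_dim elements at dtype_bytes each (fp16 here)
    return n_attn_layers * 2 * seq_len * n_heads * head_dim * dtype_bytes

# hypothetical 7B-scale config: 32 layers, 32 heads of dimension 128
full_attn = kv_cache_bytes(32, 32768, 32, 128)  # every layer is attention
striped = kv_cache_bytes(8, 32768, 32, 128)     # assume 1 in 4 layers is attention
print(full_attn / 2**30, striped / 2**30)       # prints: 16.0 4.0
```

Under these assumptions, replacing three quarters of the attention layers with gated convolutions cuts the per-sequence cache from 16 GiB to 4 GiB at full context, and the convolution layers' state does not grow with sequence length at all.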