Infinity is an image generation model developed by ByteDance that implements a Bitwise Visual AutoRegressive (VAR) modeling approach. By redefining visual autoregression under a bitwise token prediction framework, it achieves high-resolution, photorealistic image synthesis from language instructions. The Infinity 8B variant scales the architecture to 8 billion parameters, improving generation capacity and detail compared to smaller iterations.

## Technical Architecture
The model uses an infinite-vocabulary tokenizer and an Infinite-Vocabulary Classifier (IVC). Together they allow the vocabulary size to grow, in principle without bound, while avoiding the exponential parameter growth of a conventional classifier. Infinity also incorporates a bitwise self-correction (BSC) mechanism to mitigate the train-test discrepancy that teacher-forcing training introduces in autoregressive models. This design lets the model recognize and correct mistakes during generation, leading to improved global structure and texture.

## Capabilities and Performance
Infinity 8B is noted for its efficiency, with inference speeds significantly faster than those of similarly sized diffusion models. It can generate high-quality 1024x1024 images in under one second without additional optimization. The architecture is designed to be highly scalable and maintains strong performance on visual benchmarks such as GenEval and ImageReward, where it has posted strong results among autoregressive text-to-image models.
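
The parameter savings attributed to the Infinite-Vocabulary Classifier under Technical Architecture can be illustrated with a small calculation. The following is a minimal sketch, not the model's actual implementation: it assumes a token made of `num_bits` binary values, a conventional classifier that scores all `2**num_bits` possible codes, and a bitwise alternative that uses one two-way classifier per bit. The function names and the `hidden_dim = 2048` setting are hypothetical, chosen only for illustration.

```python
def conventional_classifier_params(hidden_dim: int, num_bits: int) -> int:
    """Weights for a single softmax layer over every possible code.

    A token of num_bits binary values has 2**num_bits distinct codes,
    so the output layer grows exponentially with num_bits.
    """
    return hidden_dim * (2 ** num_bits)


def bitwise_classifier_params(hidden_dim: int, num_bits: int) -> int:
    """Weights for num_bits independent two-way classifiers (one per bit).

    Each bit needs two output logits, so growth is linear in num_bits.
    """
    return hidden_dim * num_bits * 2


if __name__ == "__main__":
    hidden_dim = 2048  # assumed hidden size, for illustration only
    for num_bits in (8, 16, 32):
        conventional = conventional_classifier_params(hidden_dim, num_bits)
        bitwise = bitwise_classifier_params(hidden_dim, num_bits)
        print(f"bits={num_bits:2d}  conventional={conventional:,}  bitwise={bitwise:,}")
```

Under these assumptions, each additional bit doubles the size of the conventional output layer, while the bitwise layer grows by a fixed `2 * hidden_dim` weights, which is why the vocabulary can be enlarged without an exponential parameter cost.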
AA Text→Image rank: #74
Parameters: 8B