Rodin 2 (also known as Rodin Gen-2) is a 3D generative AI model developed by Hyper3D (Deemos Tech) that transforms text prompts and 2D images into high-fidelity 3D assets. Building on the research that produced the CLAY and CAST frameworks, the model has 10 billion parameters, a scale intended to deepen its understanding of complex geometries and material properties. It targets professional workflows in game development, film production, and industrial design, focusing on production-ready assets rather than rough approximations.
Architecture and Structural Intelligence
The model is built upon a proprietary BANG architecture, which introduces a paradigm of recursive part-based generation. This system allows the model to "think in parts," meaning it can intelligently divide complex objects—such as a chair or a character—into their constituent components like legs, seats, or limbs. This structural awareness results in assets that are not only visually accurate but also logically constructed for downstream tasks like animation, rigging, and 3D printing.
Key Capabilities and Output Quality
Rodin 2 supports both Image-to-3D and Text-to-3D modalities, including multi-image input for increased reconstruction accuracy. It generates meshes with clean topology, supporting both quadrilateral (quad) meshes for sculpting and triangular meshes for real-time engines. The model produces UV-unwrapped geometry accompanied by PBR (Physically Based Rendering) texture maps, including albedo, normal, roughness, and metallic channels.
For character artists, the model includes specific controls such as T-pose or A-pose enforcement, ensuring that generated characters are delivered in standard rigging positions. It also supports baking normal maps to preserve high-polygon detail on optimized low-polygon meshes, and offers mesh density settings ranging from 4k to 50k faces to meet specific project polycount budgets. For optimal results, users are advised to provide sharp, evenly lit input images with minimal obstructions.
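To make the options above concrete, the following sketch assembles them into a single hypothetical generation request. All field names, values, and the function itself are illustrative placeholders, not the actual Hyper3D API; they simply mirror the capabilities described in this section (multi-image input, pose enforcement, quad/tri topology, the 4k–50k face budget, baked normals, and PBR texture channels).

```python
# Hypothetical request payload for a Rodin 2 generation job.
# Every parameter name here is an illustrative assumption, NOT the
# real Hyper3D API; consult the official documentation for actual usage.

def build_generation_request(image_paths, pose="T-pose", topology="quad",
                             face_count=20_000, bake_normals=True):
    """Assemble an illustrative image-to-3D job payload."""
    # Enforce the 4k-50k face budget mentioned in the section above.
    if not 4_000 <= face_count <= 50_000:
        raise ValueError("face_count must be within the 4k-50k budget")
    return {
        "mode": "image_to_3d",
        "images": list(image_paths),      # multi-image input improves accuracy
        "character_pose": pose,           # "T-pose" or "A-pose" enforcement
        "mesh": {
            "topology": topology,         # "quad" for sculpting, "tri" for engines
            "face_count": face_count,
            "bake_normals": bake_normals, # keep high-poly detail on low-poly mesh
        },
        # PBR texture maps produced alongside the UV-unwrapped geometry.
        "textures": ["albedo", "normal", "roughness", "metallic"],
    }

request = build_generation_request(["front.png", "side.png"])
print(request["mesh"]["face_count"])  # → 20000
```

A real integration would serialize such a payload and submit it to the service, but the parameter surface shown here is only a reading aid for the capabilities listed above.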