TRELLIS 2 is a generative 3D model developed by Microsoft Research that produces high-fidelity 3D assets from single images or text prompts. As a significant upgrade to the original TRELLIS framework, this version scales the architecture to 4 billion parameters and introduces a novel sparse voxel representation known as O-Voxel. This "field-free" structure allows the model to accurately reconstruct complex and arbitrary topologies, including thin structures, open surfaces like leaves or clothing, and intricate internal geometries that often challenge traditional iso-surface fields.
The model uses a Sparse Compression VAE (SC-VAE) to achieve 16× spatial downsampling, encoding 3D data into a compact structured latent space. A Diffusion Transformer (DiT) backbone operates on these latents, which lets TRELLIS 2 generate assets at up to 1536³ voxel resolution. The pipeline is designed for efficiency: it can generate a 512³ resolution mesh in approximately 3 seconds and a 1024³ resolution asset in under 20 seconds on modern hardware.
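To make the 16× figure concrete, the following sketch works out the latent grid size along each axis for the three resolutions mentioned above. It is plain arithmetic, not part of the TRELLIS 2 codebase: the function name `latent_resolution` is hypothetical, and the cell counts describe a dense grid, which is only an upper bound because the latent space is sparse.

```python
def latent_resolution(voxel_resolution: int, downsample: int = 16) -> int:
    """Per-axis latent grid size after spatial downsampling.

    Hypothetical helper for illustration; assumes the voxel resolution
    is an exact multiple of the downsampling factor.
    """
    if voxel_resolution % downsample != 0:
        raise ValueError("resolution must be a multiple of the downsampling factor")
    return voxel_resolution // downsample


for res in (512, 1024, 1536):
    lat = latent_resolution(res)
    # Dense upper bound on latent cells; a sparse representation
    # stores only the occupied subset of these.
    print(f"{res}^3 voxels -> {lat}^3 latent grid (at most {lat ** 3:,} cells)")
```

Because the downsampling applies along all three axes, each latent cell covers 16³ = 4,096 voxels, which is why the transformer can operate on a far shorter sequence than the output resolution suggests.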
A key feature of TRELLIS 2 is its integrated support for Physically Based Rendering (PBR) materials. Unlike models that provide only vertex colors, it generates a full set of material channels: Base Color, Metallic, Roughness, and Alpha (transparency). The resulting assets respond realistically to lighting and can include translucent regions out of the box, and they can be exported to formats such as GLB and OBJ for immediate use in 3D software and game engines.
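As an illustration of how these four channels map onto a GLB file, the sketch below packs one texel into the texture layout defined by the glTF 2.0 metallic-roughness material: base color in RGB with alpha in A, roughness in the G channel, and metallic in the B channel. This is an assumption-laden sketch and not the TRELLIS 2 exporter: `PBRTexel`, `to_u8`, and `pack_gltf_texel` are hypothetical names, and the R channel is simply filled with 255 because it is not read for metallic or roughness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PBRTexel:
    """One material sample; every value is expected in [0, 1]. Hypothetical type."""
    base_color: tuple[float, float, float]
    metallic: float
    roughness: float
    alpha: float


def to_u8(value: float) -> int:
    """Clamp to [0, 1] and quantize to an 8-bit channel value."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255 + 0.5)


def pack_gltf_texel(texel: PBRTexel) -> tuple[tuple[int, int, int, int], tuple[int, int, int]]:
    """Return (base color RGBA, metallic-roughness RGB) following glTF 2.0 channel usage."""
    r, g, b = texel.base_color
    base_rgba = (to_u8(r), to_u8(g), to_u8(b), to_u8(texel.alpha))
    # glTF 2.0 samples roughness from G and metallic from B.
    # R is unused here (assumption: filled with 255).
    metal_rough = (255, to_u8(texel.roughness), to_u8(texel.metallic))
    return base_rgba, metal_rough


# A half-transparent, fully rough, non-metallic red surface.
sample = PBRTexel(base_color=(1.0, 0.0, 0.0), metallic=0.0, roughness=1.0, alpha=0.5)
base_rgba, metal_rough = pack_gltf_texel(sample)
print(base_rgba, metal_rough)
```

Storing alpha alongside base color is what lets a renderer treat regions such as glass or foliage as translucent or cut out, depending on the material's alpha mode.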