TRELLIS is a large-scale 3D asset generation model developed by Microsoft Research. It is built around a structured latent representation (SLAT) that acts as a common intermediate between diverse 3D representations. Because every asset is encoded into this unified structured latent space, a single generated latent can be decoded into several output formats, including 3D Gaussian Splatting, radiance fields (NeRF), and meshes. This versatility makes the generated assets usable in a range of downstream applications, such as games, virtual reality, and digital design.
Architecture and Technical Details
At the core of TRELLIS is a structured latent that attaches local latent vectors to a sparse set of active voxels on a 3D grid, so that one representation encodes both geometry and appearance. Generation is performed by large transformer models (rectified flow transformers in the original paper) trained on a curated collection of high-quality 3D assets. This design lets the model handle complex topologies and intricate textures that earlier 3D generation methods often struggle with. Because generation is decoupled from the final output representation, the resulting assets stay structurally consistent while remaining exportable to industry-standard formats.
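In the TRELLIS paper, the structured latent is described as a set of active voxels on a 3D grid, each carrying a local latent vector. The sketch below illustrates why such a layout is compact: only voxels near an object's surface store features, so storage grows with surface area rather than grid volume. It is a minimal illustration under that assumption; `StructuredLatent`, `sphere_shell_latent`, the 32^3 grid, and the 8-dimensional features are hypothetical choices, not TRELLIS code or its actual configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


@dataclass
class StructuredLatent:
    """Hypothetical sparse latent: active voxels on a cubic grid, each with a feature vector."""

    resolution: int
    latent_dim: int
    features: Dict[Coord, List[float]] = field(default_factory=dict)

    def set(self, coord: Coord, feature: List[float]) -> None:
        # Reject coordinates outside the grid and features of the wrong width
        if not all(0 <= c < self.resolution for c in coord):
            raise ValueError(f"{coord} lies outside the {self.resolution}^3 grid")
        if len(feature) != self.latent_dim:
            raise ValueError(f"expected {self.latent_dim} values, got {len(feature)}")
        self.features[coord] = list(feature)

    @property
    def num_active(self) -> int:
        return len(self.features)

    def occupancy_ratio(self) -> float:
        # Fraction of grid cells that actually store a latent vector
        return self.num_active / self.resolution ** 3


def sphere_shell_latent(resolution: int, radius: float, latent_dim: int) -> StructuredLatent:
    """Activate only voxels near a sphere's surface, mimicking a surface-only layout."""
    latent = StructuredLatent(resolution, latent_dim)
    center = resolution / 2
    for x in range(resolution):
        for y in range(resolution):
            for z in range(resolution):
                # Distance from this voxel's center to the grid center
                d = ((x + 0.5 - center) ** 2 + (y + 0.5 - center) ** 2 + (z + 0.5 - center) ** 2) ** 0.5
                if abs(d - radius) <= 0.5:
                    # Placeholder features; a real encoder would compute these
                    latent.set((x, y, z), [0.0] * latent_dim)
    return latent


latent = sphere_shell_latent(resolution=32, radius=12.0, latent_dim=8)
print(latent.num_active, f"{latent.occupancy_ratio():.3f}")
```

Keying features by voxel coordinate in a dictionary is just one simple way to keep memory proportional to the number of active voxels; a production system would more likely use packed coordinate and feature arrays.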
Capabilities and Performance
TRELLIS produces sharper details and more physically plausible structures than earlier methods when generating assets from a single input image. Its large-scale training helps reduce common artifacts such as blurred textures and fragmented geometry. Because the same latent can be decoded into multiple representations, users can choose the format that best fits their rendering pipeline without regenerating the asset from scratch. The model also maintains consistency across viewpoints, including on complex shapes.
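Because the latent is decoupled from the output format, a pipeline can run the expensive generation step once per input and then decode the cached latent into whichever format is requested. The sketch below illustrates that generate-once, decode-many pattern with a stubbed generator; `AssetPipeline`, `fake_generate`, and the toy decoders are hypothetical and do not reflect the TRELLIS API or its real decoders.

```python
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, int, int]
Latent = Dict[Coord, List[float]]


class AssetPipeline:
    """Hypothetical generate-once, decode-many pipeline (not the TRELLIS API)."""

    def __init__(self, generate: Callable[[str], Latent]) -> None:
        self._generate = generate
        self._decoders: Dict[str, Callable[[Latent], dict]] = {}
        self._cache: Dict[str, Latent] = {}

    def register(self, fmt: str, decoder: Callable[[Latent], dict]) -> None:
        # Each output format gets its own decoder over the shared latent
        self._decoders[fmt] = decoder

    def export(self, image_id: str, fmt: str) -> dict:
        if fmt not in self._decoders:
            raise KeyError(f"no decoder registered for format {fmt!r}")
        # Run the costly generation step only the first time an input is seen
        if image_id not in self._cache:
            self._cache[image_id] = self._generate(image_id)
        return self._decoders[fmt](self._cache[image_id])


# Stub generator that counts how often it is invoked
calls = {"n": 0}


def fake_generate(image_id: str) -> Latent:
    calls["n"] += 1
    return {(0, 0, 0): [0.1] * 8, (0, 0, 1): [0.2] * 8}


pipeline = AssetPipeline(fake_generate)
# Toy decoders that only report which format they produced and how many voxels they read
pipeline.register("gaussians", lambda z: {"format": "gaussians", "voxels": len(z)})
pipeline.register("mesh", lambda z: {"format": "mesh", "voxels": len(z)})

gaussians = pipeline.export("chair.png", "gaussians")
mesh = pipeline.export("chair.png", "mesh")
print(gaussians["format"], mesh["format"], "generator calls:", calls["n"])
# prints: gaussians mesh generator calls: 1
```

Requesting a second format for the same input reuses the cached latent, so the generator runs once; only a new input triggers another generation call.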