Shape primitive abstraction, which breaks down complex 3D forms into simple, interpretable geometric units, is fundamental to human visual perception and has important implications for computer vision and graphics. While recent methods in 3D generation—using representations like meshes, point clouds, and neural fields—have enabled high-fidelity content creation, they often lack the semantic depth and interpretability needed for tasks such as robotic manipulation or scene understanding. Traditionally, primitive abstraction has been tackled using either optimization-based methods, which fit geometric primitives to shapes but often over-segment them semantically, or learning-based methods, which train on small, category-specific datasets and thus lack generalization. Early approaches used basic primitives like cuboids and cylinders, later evolving to more expressive forms like superquadrics. However, a major challenge persists in designing methods that can abstract shapes in a way that aligns with human cognition while also generalizing across diverse object categories.
Inspired by recent breakthroughs in 3D content generation using large datasets and auto-regressive transformers, the authors propose reframing shape abstraction as a generative task. Rather than relying on geometric fitting or direct parameter regression, their approach sequentially constructs primitive assemblies to mirror human reasoning. This design more effectively captures both semantic structure and geometric accuracy. Prior works in auto-regressive modeling—such as MeshGPT and MeshAnything—have shown strong results in mesh generation by treating 3D shapes as sequences, incorporating innovations like compact tokenization and shape conditioning.
PrimitiveAnything is a framework developed by researchers from Tencent AIPD and Tsinghua University that redefines shape abstraction as a primitive assembly generation task. It introduces a decoder-only transformer conditioned on shape features to generate sequences of variable-length primitives. The framework employs a unified, ambiguity-free parameterization scheme that supports multiple primitive types while maintaining high geometric accuracy and learning efficiency. By learning directly from human-designed shape abstractions, PrimitiveAnything effectively captures how complex shapes are broken into simpler components. Its modular design supports easy integration of new primitive types, and experiments show it produces high-quality, perceptually aligned abstractions across diverse 3D shapes.
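For illustration, the kind of unified, discrete parameterization described above could look roughly like the following Python sketch. The class name, bin count, and list of primitive types here are assumptions made for the example, not the exact scheme used by the authors.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical discretization settings; the paper's actual bin counts and
# supported primitive types may differ.
NUM_BINS = 128
PRIMITIVE_TYPES = ["cube", "sphere", "cylinder"]

@dataclass
class PrimitiveToken:
    """One primitive expressed as a tuple of discrete attribute indices."""
    type_id: int            # index into PRIMITIVE_TYPES
    translation: List[int]  # 3 bin indices in [0, NUM_BINS)
    rotation: List[int]     # 3 bin indices (e.g., discretized Euler angles)
    scale: List[int]        # 3 bin indices

def quantize(value: float, lo: float, hi: float, bins: int = NUM_BINS) -> int:
    """Map a continuous attribute value in [lo, hi] to a discrete bin index."""
    t = (value - lo) / (hi - lo)
    return min(bins - 1, max(0, int(t * bins)))

def dequantize(index: int, lo: float, hi: float, bins: int = NUM_BINS) -> float:
    """Recover the bin-center value for a discrete index."""
    return lo + (index + 0.5) / bins * (hi - lo)
```

Quantizing every attribute into a shared token vocabulary like this keeps the sequence ambiguity-free: each primitive decodes to exactly one set of attribute bins, regardless of its type.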
The framework casts 3D shape abstraction as a sequential generation task. Each primitive is represented with a discrete, ambiguity-free parameterization of its type, translation, rotation, and scale. These attributes are encoded and fed into a decoder-only transformer, which predicts the next primitive conditioned on the previously generated ones and on shape features extracted from the input point cloud. A cascaded decoder models dependencies between attributes, so each attribute is predicted conditioned on those already decoded, keeping generation coherent. Training combines cross-entropy losses on the discrete attribute tokens with a Chamfer Distance reconstruction loss, using Gumbel-Softmax to keep sampling of the discrete parameters differentiable. Generation proceeds autoregressively until an end-of-sequence token signals completion, yielding variable-length, human-like decompositions of complex 3D shapes.
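Below is a minimal, PyTorch-style sketch of that autoregressive loop with a cascaded attribute decoder. The module names (shape_encoder, transformer, cascaded_head), the end-of-sequence handling, and the maximum sequence length are placeholders for illustration, not the released implementation.

```python
import torch

# Hypothetical end-of-sequence type index; the real vocabulary layout may differ.
EOS_TYPE_ID = 3

@torch.no_grad()
def generate_primitives(point_cloud, shape_encoder, transformer, cascaded_head,
                        max_primitives=64):
    """Autoregressively decode a primitive assembly from an input point cloud.

    shape_encoder, transformer, and cascaded_head are hypothetical callables:
      - shape_encoder: point cloud -> shape-conditioning features
      - transformer:   decoder-only model over the primitives generated so far
      - cascaded_head: decodes type, then translation, rotation, and scale,
                       each conditioned on the attributes already chosen.
    """
    shape_cond = shape_encoder(point_cloud)           # shape features as conditioning
    primitives = []                                   # sequence generated so far

    for _ in range(max_primitives):
        hidden = transformer(primitives, shape_cond)  # context from prior primitives
        type_id = cascaded_head.decode_type(hidden)
        if type_id == EOS_TYPE_ID:                    # end-of-sequence token: stop
            break
        # Cascaded decoding: later attributes see the earlier ones.
        translation = cascaded_head.decode_translation(hidden, type_id)
        rotation = cascaded_head.decode_rotation(hidden, type_id, translation)
        scale = cascaded_head.decode_scale(hidden, type_id, translation, rotation)
        primitives.append({"type": type_id, "translation": translation,
                           "rotation": rotation, "scale": scale})

    return primitives
```

The cascaded ordering (type, then translation, rotation, scale) is what lets the decoder keep attributes mutually consistent within a single primitive while the transformer handles consistency across primitives.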
The researchers introduce HumanPrim, a large-scale dataset comprising 120K 3D samples with manually annotated primitive assemblies. Their method is evaluated using metrics such as Chamfer Distance, Earth Mover’s Distance, Hausdorff Distance, Voxel-IoU, and segmentation scores (RI, VOI, SC). Compared to existing optimization- and learning-based methods, it shows superior performance and better alignment with human abstraction patterns. Ablation studies confirm the importance of each design component. Additionally, the framework supports 3D content generation from text or image inputs. It offers user-friendly editing, high modeling quality, and over 95% storage savings, making it well-suited for efficient and interactive 3D applications.
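As an example of one of these evaluation metrics, the sketch below computes a symmetric Chamfer Distance between two point sets with NumPy. It is an illustrative brute-force version; the paper's evaluation code may use a different normalization or a nearest-neighbor accelerator such as a KD-tree.

```python
import numpy as np

def chamfer_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets of shape (N, 3) and (M, 3).

    For each point in one set, take the squared distance to its nearest neighbor
    in the other set, average over the set, and sum both directions.
    """
    diff = points_a[:, None, :] - points_b[None, :, :]  # (N, M, 3) pairwise offsets
    dist2 = np.sum(diff ** 2, axis=-1)                  # (N, M) squared distances
    a_to_b = dist2.min(axis=1).mean()                   # nearest neighbor in B for each point in A
    b_to_a = dist2.min(axis=0).mean()                   # nearest neighbor in A for each point in B
    return a_to_b + b_to_a

# Example: compare points sampled from a reconstructed assembly against ground truth.
gt = np.random.rand(2048, 3)
pred = gt + np.random.normal(scale=0.01, size=gt.shape)
print(f"Chamfer Distance: {chamfer_distance(gt, pred):.6f}")
```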

In conclusion, PrimitiveAnything is a new framework that approaches 3D shape abstraction as a sequence generation task. By learning from human-designed primitive assemblies, the model effectively captures intuitive decomposition patterns. It achieves high-quality results across various object categories, highlighting its strong generalization ability. The method also supports flexible 3D content creation using primitive-based representations. Due to its efficiency and lightweight structure, PrimitiveAnything is well-suited for enabling user-generated content in applications such as gaming, where both performance and ease of manipulation are essential.
Check out the Paper, Demo, and GitHub Page. All credit for this research goes to the researchers of this project.