Existing feed-forward networks excel at predicting a single set of physical properties from visual appearance, but this point-estimate paradigm fundamentally fails to capture the real world's inherent physical ambiguity. We address this by reframing physics prediction as a task of learning a controllable, continuous distribution of material properties.
We introduce UniPixie, a framework trained to predict a continuous, parameterized path of physically plausible material properties from a single visual input. By learning a direct mapping along an object's softest-to-stiffest spectrum on our PixieMultiVerse dataset, UniPixie allows controllable generation of diverse, physically valid material fields via a single intuitive parameter.
UniPixie also introduces a novel unified architecture that produces simulation-ready parameters for diverse physics solvers, including the continuum-based Material Point Method (MPM), reduced-order deformation based on Linear Blend Skinning (LBS), and anchor-based Spring-Mass systems, which addresses a key portability issue in prior work. Experiments show that our approach not only generates a rich variety of plausible dynamics but also reduces Young's modulus prediction error by over 50% against the strongest deterministic baseline, bridging the gap between static point estimates and the continuous nature of physical reality.
We introduce PixieMultiVerse, a large-scale dataset of 3D objects annotated with diverse physical property ranges to facilitate controllable generation. Built upon 1410 high-quality assets, the dataset uses a semi-automatic Actor-Critic VLM pipeline with human verification to annotate each object with a continuous spectrum of plausible material properties, ensuring physical plausibility and visual diversity.
UniPixie uses a unified Perceiver-IO-like Grid Encoder to compress visual priors from distilled CLIP features into a solver-agnostic latent representation. A Conditional Flow Matching Transformer (FMT) decoder then generates the target physical properties conditioned on a control parameter α ∈ [0,1], which interpolates from the softest state (α = 0) to the stiffest state (α = 1). Because the latent space is shared, the same representation can be decoded into parameters for three fundamental physics engines: Material Point Method (MPM), Linear Blend Skinning (LBS), and Spring-Mass systems.
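The sampling side of conditional flow matching can be sketched in a few lines. This is a minimal illustration under stated assumptions, not UniPixie's implementation: it uses the common straight-line probability path x_t = (1 - t)·x_0 + t·x_1, a plain Euler integrator, and a closed-form toy velocity in place of the learned transformer. The names `toy_velocity`, `toy_target`, `sample`, and `cfm_loss`, as well as the log10(E) bounds, are hypothetical.

```python
import numpy as np

def toy_target(alpha, soft_log_e=4.0, stiff_log_e=9.0):
    """Hypothetical alpha -> log10(E) endpoint; the bounds are made up for illustration."""
    return (1.0 - alpha) * soft_log_e + alpha * stiff_log_e

def toy_velocity(x, t, target):
    """Closed-form velocity of the straight-line path toward a fixed target.

    Stand-in for a learned network v_theta(x, t, alpha, latent).
    """
    return (target - x) / (1.0 - t)

def cfm_loss(v_pred, x0, x1):
    """Flow matching regression loss for the straight-line path: target velocity is x1 - x0."""
    return float(np.mean((v_pred - (x1 - x0)) ** 2))

def sample(alpha, num_points=8, num_steps=16, seed=0):
    """Euler-integrate dx/dt = v(x, t) from Gaussian noise at t=0 to a sample at t=1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_points)          # x_0 ~ N(0, 1), one value per grid point
    target = np.full(num_points, toy_target(alpha))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                               # t stays strictly below 1
        x = x + dt * toy_velocity(x, t, target)
    return x

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, sample(alpha)[:3])
```

In a real model the velocity would be a network trained with `cfm_loss` on pairs of noise and ground-truth property fields, and α would enter as a conditioning input rather than through a fixed target.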
State-of-the-Art Accuracy: UniPixie substantially improves physical property regression, recording a Young's modulus (logE) MSE of 0.0091, less than half the error of the previous best deterministic method (PIXIE). Furthermore, it produces simulation-ready parameters for all three physics paradigms in ~21.6 seconds, roughly 24× faster than Vid2Sim (521 s) and roughly 200× faster than Spring-Gaus (4375 s), both of which rely on test-time optimization.
UniPixie's core novelty is the ability to generate a controllable distribution of physical properties. By adjusting the scalar parameter α from 0.0 (soft) to 1.0 (stiff), the framework produces a meaningful and well-behaved distribution of physical values mapped directly to simulation outcomes. For instance, at α=0.0 an object is compliant and deforms upon impact, while at α=1.0 it behaves as a rigid body.
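To make the soft-to-stiff sweep concrete, the sketch below converts a Young's modulus and Poisson ratio into the Lamé parameters that continuum solvers such as MPM commonly consume. The conversion is the standard isotropic linear elasticity relation; everything else is an assumption for illustration. In particular, UniPixie learns its α path from data, whereas `youngs_modulus_at` here is a hypothetical log-linear interpolation between made-up bounds.

```python
import math

def lame_parameters(youngs_modulus, poisson_ratio):
    """Convert (E, nu) to Lamé parameters (mu, lambda) for isotropic linear elasticity."""
    if not -1.0 < poisson_ratio < 0.5:
        raise ValueError("Poisson ratio must lie in (-1, 0.5)")
    mu = youngs_modulus / (2.0 * (1.0 + poisson_ratio))
    lam = (youngs_modulus * poisson_ratio
           / ((1.0 + poisson_ratio) * (1.0 - 2.0 * poisson_ratio)))
    return mu, lam

def youngs_modulus_at(alpha, soft_e=1e4, stiff_e=1e9):
    """Hypothetical stand-in for the learned path: interpolate log10(E) linearly in alpha.

    soft_e and stiff_e are made-up bounds in pascals, not values from the dataset.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    log_e = (1.0 - alpha) * math.log10(soft_e) + alpha * math.log10(stiff_e)
    return 10.0 ** log_e

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        e = youngs_modulus_at(alpha)
        mu, lam = lame_parameters(e, poisson_ratio=0.3)
        print(f"alpha={alpha:.1f}  E={e:.3e} Pa  mu={mu:.3e}  lambda={lam:.3e}")
```

Interpolating in log space rather than linearly in E keeps the sweep perceptually even, since plausible stiffness values span several orders of magnitude.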
To validate the effectiveness of our Conditional Flow Matching Transformer, we trained a 3D diffusion U-Net as a baseline generative model. The ablation favors our architecture: UniPixie reduces logE MSE from 0.0410 to 0.0091, a roughly 4.5× reduction, and improves both simulation fidelity metrics (SSIM and PSNR).
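An MSE in log space is easier to read once converted back to a multiplicative error on E. The short sketch below does this for the two reported values. It assumes that logE denotes a base-10 logarithm, which the text does not state, so the `base` argument is exposed; `typical_error_factor` is a hypothetical helper name.

```python
import math

def typical_error_factor(log_mse, base=10.0):
    """Turn an MSE measured in log units into a typical multiplicative error on E.

    Takes the RMSE in log units and exponentiates it; base=10 is an assumption.
    """
    return base ** math.sqrt(log_mse)

REPORTED = {"3D diffusion U-Net": 0.0410, "UniPixie": 0.0091}

if __name__ == "__main__":
    for name, mse in REPORTED.items():
        rmse = math.sqrt(mse)
        print(f"{name}: RMSE = {rmse:.3f} log units, "
              f"typical factor on E = {typical_error_factor(mse):.2f}x")
    ratio = REPORTED["3D diffusion U-Net"] / REPORTED["UniPixie"]
    print(f"MSE ratio: {ratio:.2f}")
```

Under the base-10 assumption, the UniPixie figure corresponds to predictions that are typically within a factor of about 1.25 of the reference Young's modulus, versus about 1.6 for the baseline.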
@article{huang2025unipixie,
author = {Huang, Qilin and Huynh, Quynh Anh and Le, Long and Wang, Chen and Chen, Chuhao and Lucas, Ryan and Eaton, Eric and Liu, Lingjie},
title = {UniPixie: Unified and Probabilistic 3D Physics Learning via Flow Matching},
journal = {arXiv preprint arXiv:TODO},
year = {2025},
}