NeuROK: Generative 4D Neural Object Kinematics
CVPR 2026
Chen Geng1*, Guangzhao He3*, Yue Gao1*, Yunzhi Zhang1, Shangzhe Wu2, Jiajun Wu1
1Stanford University · 2University of Cambridge · 3Cornell University
Code: coming soon
tl;dr
NeuROK is a neural simulation framework that turns any 3D shape into an interactable 4D asset — with no physical annotations and no category-specific structural assumptions. Taking Lagrangian mechanics as its minimal inductive bias for simulation, it uses a large pre-trained transformer to predict the kinematic state space of an input 3D object, then simulates motion directly in that latent state space by solving an ODE.
Problem Setting
Generating simulative 4D dynamics from a static shape
Given a shape-only static 3D object and an initial physical condition, we study generating its simulative 4D dynamics: plausible temporal deformations of the static object under the specified physical conditions.
Application
Scan your room and make it interactable!
Our method is robust and can be applied to turn real scanned 3D objects into interactive 4D objects.
Interacting with real objects in a Stanford office
3D Scan · Smartphone
Generated Output · Stanford Office
We scan each scene and perform post-processing, including object segmentation, to obtain the model inputs.
Interacting with real objects in a Stanford office kitchen
3D Scan · Smartphone
Generated Output · Stanford Office Kitchen
Interacting with real objects in an apartment kitchen
3D Scan · Smartphone
Generated Output · Apartment Kitchen
Interacting with real objects in a Cornell office
3D Scan · Smartphone
Generated Output · Cornell Office
Model Prediction
Unified model for diverse phenomena
From shape-only static 3D assets without any dynamic annotations, our pipeline can form a 3D world that supports diverse user interactions.
Our method uses only the minimal inductive bias of Lagrangian mechanics and assumes no object category or dynamic structure — so the same unified model can be applied to a diverse range of objects.
Headphones, flowers, Newton's cradle & more in your office
Input: 3D shapes of static objects and initial physical conditions
Generated Output
We insert the generated 4D objects into a 3D office to form an interactive 3D world.
Curtains, oral rinse, faucets & more in your bathroom
Input: 3D shapes of static objects and initial physical conditions
Generated Output
We insert the generated 4D objects into a 3D bathroom to form an interactive 3D world.
Sponges, kettles, microwaves & more in your kitchen
Input: 3D shapes of static objects and initial physical conditions
Generated Output
We insert the generated 4D objects into a 3D kitchen to form an interactive 3D world.
Method
Simpler coordinates, simpler dynamics
NeuROK's idea is to simulate inside a learned latent state space of the object.
It draws on Lagrangian mechanics: with the right choice of coordinates, a hard dynamics problem becomes a simple one.
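A standard textbook illustration of this point (not an example taken from the paper) is a pendulum of mass m on a rigid rod of length ℓ. In Cartesian coordinates the motion needs two equations, a constraint, and an unknown multiplier λ; in the angle θ measured from the downward vertical, the constraint disappears and a single Euler-Lagrange equation remains. The symbols m, ℓ, g, λ, and θ are introduced here only for this sketch.

```latex
% Cartesian coordinates (x, y): two equations, one constraint, one unknown multiplier \lambda
m\ddot{x} = -2\lambda x, \qquad
m\ddot{y} = -mg - 2\lambda y, \qquad
x^2 + y^2 = \ell^2

% Generalized coordinate \theta: the constraint is built into the coordinate
L(\theta, \dot\theta) = \tfrac{1}{2} m \ell^2 \dot\theta^2 + m g \ell \cos\theta

\frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta} = 0
\quad\Longrightarrow\quad
\ddot\theta = -\frac{g}{\ell}\sin\theta
```

Here θ plays the role that the learned latent coordinates play for a general object: every value of θ is a valid pose, so the dynamics never have to enforce the rod constraint explicitly.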
Encoding kinematics
NeuROK learns this latent space from data: it captures the object's possible states, and a decoder maps any latent vector to a valid deformation. Below, we make this tangible: sweeping across a 2D slice of the space for an eyeglass, we decode each latent vector into 3D on the fly.
Interactive view: Latent space · Decoded 3D shape
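The sweep described above can be sketched in a few lines of Python. This is only a toy illustration: the `decode` function below is a hand-written stand-in (two rigid temple arms rotating about hinges), not NeuROK's learned decoder, and every name and parameter in it (`n_points`, `arm_length`, `half_width`, the grid range) is an assumption made for the example.

```python
import numpy as np

def decode(latent, n_points=5, arm_length=1.0, half_width=0.5):
    """Toy stand-in decoder (hypothetical): 2D latent -> 3D points on two hinged arms.

    latent[0] and latent[1] are the fold angles (radians) of the left and right arm.
    """
    t = np.linspace(0.0, arm_length, n_points)  # distance of each point from its hinge
    arms = []
    for side, angle in zip((-1.0, 1.0), latent):
        hinge = np.array([side * half_width, 0.0, 0.0])
        # An open arm points straight back (-y); folding rotates it toward the frame centre.
        direction = np.array([-side * np.sin(angle), -np.cos(angle), 0.0])
        arms.append(hinge + t[:, None] * direction)
    return np.concatenate(arms)  # shape (2 * n_points, 3)

# Sweep a regular grid over the 2D latent slice and decode every sample.
axis = np.linspace(0.0, np.pi / 2, 4)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)
shapes = np.stack([decode(z) for z in grid])  # shape (16, 10, 3)
```

Because the stand-in decoder only ever rotates each arm rigidly about its hinge, every latent vector in the grid maps to a valid pose, which is the property the paragraph above attributes to the learned decoder.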
Solving dynamics on a latent state space
Simulating is then straightforward: a single equation of motion (the Euler–Lagrange equation) handles every kind of object. As the eyeglass is dropped, its latent vector follows a path over time; decoding that path frame by frame gives the full 3D motion.
Latent trajectory · Simulated drop
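To make the "simulate in latent space, then decode" loop concrete, here is a minimal sketch under strong simplifying assumptions: a one-dimensional latent, a hand-written decoder (a point mass on a rigid rod, so the latent is the swing angle), finite-difference derivatives, and a semi-implicit Euler integrator. None of these choices come from the paper; NeuROK's decoder is learned and its solver may differ. For a 1D latent z with Lagrangian L = ½ M(z) ż² − V(z), the Euler-Lagrange equation reduces to M z̈ + ½ M′(z) ż² + V′(z) = 0, which is what `latent_acceleration` solves for z̈.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)

def decode(z, length=1.0):
    """Toy stand-in decoder (hypothetical): latent angle z -> 2D position of a point mass."""
    return np.array([length * np.sin(z), -length * np.cos(z)])

def mass_and_potential(z, mass=1.0, eps=1e-5):
    """Latent mass M(z) = m * |d decode / dz|^2 and potential V(z) = m * g * height."""
    jac = (decode(z + eps) - decode(z - eps)) / (2 * eps)  # central difference
    return mass * (jac @ jac), mass * G * decode(z)[1]

def latent_acceleration(z, z_dot, eps=1e-4):
    """Solve the 1D Euler-Lagrange equation M z'' + 0.5 M' z'^2 + V' = 0 for z''."""
    m_hi, v_hi = mass_and_potential(z + eps)
    m_lo, v_lo = mass_and_potential(z - eps)
    m_mid, _ = mass_and_potential(z)
    d_mass = (m_hi - m_lo) / (2 * eps)
    d_potential = (v_hi - v_lo) / (2 * eps)
    return -(0.5 * d_mass * z_dot**2 + d_potential) / m_mid

def simulate(z0, z_dot0=0.0, dt=1e-3, steps=2000):
    """Integrate the latent ODE with semi-implicit Euler and return the latent path."""
    z, z_dot = z0, z_dot0
    path = [z]
    for _ in range(steps):
        z_dot += dt * latent_acceleration(z, z_dot)  # update velocity first
        z += dt * z_dot                              # then position, using the new velocity
        path.append(z)
    return np.array(path)

latent_path = simulate(z0=0.1)                        # latent trajectory over time
frames = np.stack([decode(z) for z in latent_path])   # decoded motion, shape (steps + 1, 2)
```

The integrator never sees the rod constraint: it only advances the latent value, and decoding each step of `latent_path` yields positions that stay on the rod by construction.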
Comparisons
We show video results comparing our method with baselines on physically inspired 4D generation.
Dynamic objects compared (each shown with its input): Newton's Cradle, T-Shirt Blowing, Flower Swinging, Cloth Falling Down, Bow Releasing, Box Closing, Laptop Closing, Lamp Lowering Head
Note that the goal of this paper is to generate one plausible 4D sequence that satisfies one valid physical configuration and conforms to human physical intuition.
Related Projects on 4D Generation
Choreographing a World of Dynamic Objects
Yanzhe Lyu*, Chen Geng*, Karthik Dharmarajan, Yunzhi Zhang, Hadi Alzayer, Shangzhe Wu, Jiajun Wu
CVPR 2026
We propose a universal pipeline for generating 4D scenes composed of...