NeuROK: Generative 4D Neural Object Kinematics



CVPR 2026


Chen Geng¹*, Guangzhao He³*, Yue Gao¹*, Yunzhi Zhang¹, Shangzhe Wu², Jiajun Wu¹

¹Stanford University · ²University of Cambridge · ³Cornell University

Code: coming soon

NeuROK is a neural simulation framework that turns any 3D shape into an interactable 4D asset, with no physical annotations and no category-specific structural assumptions. Taking Lagrangian mechanics as its minimal inductive bias for simulation, it uses a large pre-trained transformer to predict the kinematic state space of an input 3D object, then simulates motion directly in that latent state space by solving an ODE.

Problem Setting

Generating simulative 4D dynamics from a static shape

Given a static 3D object specified by shape alone and an initial physical condition, we study the generation of its simulative 4D dynamics: plausible temporal deformations of the object under the specified physical conditions.

Application

Scan your room and make it interactable!

Our method is robust and can be applied to turn real scanned 3D objects into interactive 4D objects.

For every scene below, we scan the scene and perform post-processing, including object segmentation, to obtain the model inputs.

Interacting with real objects in a Stanford office

3D Scan · Smartphone

Generated Output · Stanford Office

Interacting with real objects in a Stanford office kitchen

3D Scan · Smartphone

Generated Output · Stanford Office Kitchen

Interacting with real objects in an apartment kitchen

3D Scan · Smartphone

Generated Output · Apartment Kitchen

Interacting with real objects in a Cornell office

3D Scan · Smartphone

Generated Output · Cornell Office

Model Prediction

Unified model for diverse phenomena

From shape-only static 3D assets, without any dynamic annotations, our pipeline builds a 3D world that supports diverse user interactions.

Our method uses only the minimal inductive bias of Lagrangian mechanics and assumes no object category or dynamic structure, so the same unified model can be applied to a diverse range of objects.

Headphones, flowers, Newton's cradle & more in your office

Input: 3D shapes of static objects and initial physical conditions

Generated Output

We insert the generated 4D objects into a 3D office to form an interactive 3D world.

Curtains, oral rinse, faucets & more in your bathroom

Input: 3D shapes of static objects and initial physical conditions

Generated Output

We insert the generated 4D objects into a 3D bathroom to form an interactive 3D world.

Sponges, kettles, microwaves & more in your kitchen

Input: 3D shapes of static objects and initial physical conditions

Generated Output

We insert the generated 4D objects into a 3D kitchen to form an interactive 3D world.

Method

Simpler coordinates, simpler dynamics

NeuROK's core idea is to simulate inside a learned latent state space of the object.

It draws on Lagrangian mechanics: with the right choice of coordinates, a hard dynamics problem becomes a simple one.
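A standard textbook illustration of this point, not taken from the paper, is the planar pendulum. The symbols below (mass $m$, rod length $\ell$, rod tension $T$, angle $\theta$ measured from the downward vertical) are assumptions introduced only for this example. In Cartesian coordinates the motion needs two equations, an unknown constraint force, and an algebraic constraint; in the single generalized coordinate $\theta$ it reduces to one unconstrained ODE.

```latex
% Cartesian coordinates (x, y), pivot at the origin: constrained system
m\ddot{x} = -T\,\frac{x}{\ell}, \qquad
m\ddot{y} = -T\,\frac{y}{\ell} - m g, \qquad
x^2 + y^2 = \ell^2 .

% Generalized coordinate \theta, with x = \ell\sin\theta,\; y = -\ell\cos\theta:
L(\theta,\dot\theta) = \tfrac{1}{2}\, m \ell^2 \dot\theta^{2} + m g \ell \cos\theta ,

% Euler--Lagrange equation
\frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta}
  = m \ell^2 \ddot\theta + m g \ell \sin\theta = 0
\quad\Longrightarrow\quad
\ddot\theta = -\frac{g}{\ell}\,\sin\theta .
```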


Encoding kinematics

NeuROK learns this latent space from data: it captures the object's possible states, and a decoder maps any latent vector to a valid deformation. Below, we make this tangible: sweeping across a 2D slice of the space for an eyeglass, we decode each latent vector into 3D on the fly.

[Interactive figure. Panels: Latent space; Decoded 3D shape]
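The sweep described above can be imitated with a toy stand-in. The sketch below is not NeuROK's decoder, which is learned from data: `decode`, `sweep`, `FRAME`, `LEFT_ARM`, and `RIGHT_ARM` are hypothetical names, the shape is a handful of 2D points, and the two latent coordinates are simply assumed to be the fold angles of the two temple arms. It only shows the pattern of decoding every point on a 2D latent grid into a valid shape.

```python
import math

# Toy rest shape (an assumption for illustration): a frame bar plus two
# temple arms hanging from its ends, all as 2D points.
FRAME = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
LEFT_ARM = [(-1.0, -0.5), (-1.0, -1.0)]   # hinged at (-1, 0)
RIGHT_ARM = [(1.0, -0.5), (1.0, -1.0)]    # hinged at (1, 0)


def rotate_about(point, pivot, angle):
    """Rotate a 2D point counterclockwise about a pivot by `angle` radians."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)


def decode(latent):
    """Hand-written stand-in for a learned decoder.

    latent = (a, b): fold angle of the left and right arm. Every latent
    vector yields a rigid rotation of each arm, so every output is a
    valid configuration of this toy shape.
    """
    a, b = latent
    left = [rotate_about(p, (-1.0, 0.0), a) for p in LEFT_ARM]
    right = [rotate_about(p, (1.0, 0.0), -b) for p in RIGHT_ARM]
    return FRAME + left + right


def sweep(n=5, lo=0.0, hi=math.pi / 2):
    """Decode an n-by-n grid over the 2D latent slice [lo, hi] x [lo, hi]."""
    grid = {}
    for i in range(n):
        for j in range(n):
            z = (lo + (hi - lo) * i / (n - 1), lo + (hi - lo) * j / (n - 1))
            grid[(i, j)] = decode(z)
    return grid


shapes = sweep()  # 25 decoded shapes, from fully open to fully folded
```

The latent vector (0, 0) reproduces the rest shape, and moving along either axis folds one arm toward the frame while the other stays put.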

Solving dynamics on a latent state space

Simulating is then straightforward: a single equation of motion (the Euler–Lagrange equation) handles every kind of object. As the eyeglass is dropped, its latent vector follows a path over time; decoding that path frame by frame gives the full 3D motion.

[Interactive figure. Panels: Latent trajectory; Simulated drop]
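As a minimal sketch of simulating in latent coordinates, consider a one-dimensional latent $q$ and a decoder that places a single point mass. Everything here is an assumption for illustration rather than the paper's implementation: the decoder is a hand-written pendulum map, the Lagrangian is $L = \tfrac{1}{2} M(q)\,\dot q^2 - V(q)$ with $M(q) = m\,|\partial x/\partial q|^2$ and $V(q) = m g\,y(q)$, derivatives come from finite differences so the decoder is treated as a black box, and the integrator is classical fourth-order Runge-Kutta. The resulting Euler–Lagrange equation is $M(q)\,\ddot q + \tfrac{1}{2} M'(q)\,\dot q^2 + V'(q) = 0$.

```python
import math

MASS, GRAVITY, LENGTH = 1.0, 9.81, 1.0  # toy values, not from the paper


def decode(q):
    """Hypothetical decoder: latent angle q -> 2D position of one point mass."""
    return (LENGTH * math.sin(q), -LENGTH * math.cos(q))


def inertia(q, h=1e-5):
    """M(q) = m * |dx/dq|^2, with the decoder Jacobian from central differences."""
    (x1, y1), (x0, y0) = decode(q + h), decode(q - h)
    dx, dy = (x1 - x0) / (2 * h), (y1 - y0) / (2 * h)
    return MASS * (dx * dx + dy * dy)


def potential(q):
    """V(q) = m * g * height of the decoded point."""
    return MASS * GRAVITY * decode(q)[1]


def acceleration(q, qdot, h=1e-5):
    """Solve M(q)*qddot + 0.5*M'(q)*qdot^2 + V'(q) = 0 for qddot."""
    d_inertia = (inertia(q + h) - inertia(q - h)) / (2 * h)
    d_potential = (potential(q + h) - potential(q - h)) / (2 * h)
    return -(0.5 * d_inertia * qdot * qdot + d_potential) / inertia(q)


def rk4_step(q, qdot, dt):
    """Advance (q, qdot) by one classical Runge-Kutta step."""
    k1q, k1v = qdot, acceleration(q, qdot)
    k2q = qdot + 0.5 * dt * k1v
    k2v = acceleration(q + 0.5 * dt * k1q, k2q)
    k3q = qdot + 0.5 * dt * k2v
    k3v = acceleration(q + 0.5 * dt * k2q, k3q)
    k4q = qdot + dt * k3v
    k4v = acceleration(q + dt * k3q, k4q)
    q_next = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    qdot_next = qdot + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return q_next, qdot_next


def simulate(q0, qdot0, dt, steps):
    """Return the latent trajectory [q_0, q_1, ..., q_steps]."""
    q, qdot = q0, qdot0
    trajectory = [q]
    for _ in range(steps):
        q, qdot = rk4_step(q, qdot, dt)
        trajectory.append(q)
    return trajectory


# Latent path first, then decode it frame by frame into positions.
latent_path = simulate(0.5, 0.0, 0.01, 200)
frames = [decode(q) for q in latent_path]
```

Because `acceleration` only calls `decode`, swapping in a different decoder changes the object being simulated without touching the equation of motion or the integrator, which is the design point the paragraph above makes.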

Comparisons

We show video results comparing our method with baselines on physically inspired 4D generation.

Dynamic objects: Newton's Cradle · T-Shirt Blowing · Flower Swinging · Cloth Falling Down · Bow Releasing · Box Closing · Laptop Closing · Lamp Lowering Head

[Comparison videos, one per object, each with an Input panel]

Note that the goal of this paper is to generate one plausible 4D sequence that satisfies one valid physical configuration and conforms to human physical intuition.

Related Projects on 4D Generation

Choreographing a World of Dynamic Objects

Yanzhe Lyu*, Chen Geng*, Karthik Dharmarajan, Yunzhi Zhang, Hadi Alzayer, Shangzhe Wu, Jiajun Wu

CVPR 2026

We propose a universal pipeline for generating 4D scenes composed of...
