Generative Dynamic Gaussian Reconstruction from Monocular Video



World from Motion:<br>Generative Dynamic Gaussian Reconstruction from Monocular Video

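Liyuan Zhu<sup>1,2</sup><br>Shengyu Huang<sup>2</sup><br>Amrita Mazumdar<sup>2</sup><br>Tianye Li<sup>2</sup><br>Zan Gojcic<sup>2</sup>

Gordon Wetzstein<sup>1</sup><br>Iro Armeni<sup>1</sup><br>Shalini De Mello<sup>2</sup><br>Alex Trevithick<sup>2</sup>

<sup>1</sup>Stanford University<br><sup>2</sup>NVIDIA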


Abstract

Generative 4D reconstruction from monocular video

World from Motion improves dynamic 3D Gaussian reconstructions by using a video generator as a controllable prior. We condition generation on a persistent 4D representation, sample new dynamic viewpoints, and distill the generated observations back into the reconstruction.
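The three steps named in the abstract (condition, sample, distill) can be read as a simple loop. The sketch below is only an illustration of that control flow: every function name and parameter (`sample_cameras`, `render_condition`, `generate_video`, `distill`, `num_views`) is a hypothetical placeholder passed in by the caller, not an API from the paper or any released code.

```python
def refine_with_generated_views(scene, sample_cameras, render_condition,
                                generate_video, distill, num_views=4):
    """Schematic refinement pass; all callables are hypothetical stand-ins.

    scene            -- the current dynamic Gaussian reconstruction
    sample_cameras   -- picks `num_views` new virtual camera trajectories
    render_condition -- renders the persistent 4D representation for a camera
    generate_video   -- video generator conditioned on that rendering
    distill          -- fits the reconstruction to the generated observations
    """
    cameras = sample_cameras(scene, num_views)
    observations = []
    for camera in cameras:
        # Condition the generator on what the current 4D scene shows from this view.
        condition = render_condition(scene, camera)
        observations.append((camera, generate_video(condition)))
    # Feed the generated views back into the reconstruction.
    return distill(scene, observations)
```

Passing the stages in as callables keeps the sketch independent of any particular renderer or video model.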

[Video comparison: Initial vs. WfM. Scene selector: Park, Basketball, Store, Dancer, Robot, Astronaut, Street, Gallery.]

Side-by-side overlay of the initial reconstruction and the WfM result.

Interactive Viewer

Explore the dynamic Gaussian reconstructions

Browser-based 4D Gaussian previews with scene switching and camera controls.

Results

Quantitative Results


Table 1

State-of-the-art 4D Reconstruction

4D Reconstruction Benchmark on DyCheck

| Method | Covisible mPSNR ↑ | Covisible mSSIM ↑ | Covisible mLPIPS ↓ |
|---|---|---|---|
| Shape-of-Motion | 17.32 | 0.598 | 0.296 |
| MoSca | 19.32 | 0.706 | 0.264 |
| WorldTree | 19.75 | 0.728 | 0.240 |
| ViDAR | 19.69 | 0.713 | 0.223 |
| World-from-Motion | 20.26 | 0.732 | 0.215 |
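The "covisible" metrics above are masked variants: the error is averaged only over pixels marked as covisible rather than over the whole image. A minimal sketch of masked PSNR follows, assuming images and the mask arrive as flat sequences of equal length with intensities in [0, 1]. The function name and input layout are illustrative assumptions, not the benchmark's evaluation code.

```python
import math


def masked_psnr(pred, target, mask, max_value=1.0):
    """PSNR computed only over pixels where `mask` is truthy.

    pred, target -- flat sequences of pixel intensities in [0, max_value]
    mask         -- flat sequence of the same length; truthy means "evaluate here"
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target, and mask must have the same length")
    squared_errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not squared_errors:
        raise ValueError("mask selects no pixels")
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical on the masked region
    return 10.0 * math.log10(max_value ** 2 / mse)


# Two masked-in pixels with small error, two masked-out pixels with large error.
pred = [0.5, 0.5, 0.0, 1.0]
target = [0.5, 0.6, 1.0, 0.0]
covisible = [1, 1, 0, 0]
score = masked_psnr(pred, target, covisible)
```

Pixels outside the mask contribute nothing, so large errors in regions the training video never observed do not pull the score down.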

Table 2

Conditioning on a persistent 4D representation produces the best camera control.

4D Novel-View Synthesis Benchmark on DyCheck

| Method | mPSNR ↑ | mSSIM ↑ | mLPIPS ↓ |
|---|---|---|---|
| ReCamMaster | 10.96 | 0.262 | 0.755 |
| GEN3C | 12.06 | 0.260 | 0.679 |
| TrajectoryCrafter | 13.06 | 0.320 | 0.656 |
| Vista4D | 14.14 | 0.310 | 0.514 |
| World-from-Motion | 18.45 | 0.635 | 0.362 |

Table 3

The More Views We Sample, the Better Reconstruction We Get

Virtual-camera ablation with mPSNR, mSSIM, and mLPIPS from the paper table.

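| Sampled virtual cameras | mPSNR ↑ | mSSIM ↑ | mLPIPS ↓ |
|---|---|---|---|
| 0 | 18.69 | 0.696 | 0.272 |
| 1 | 19.35 | 0.703 | 0.176 |
| 2 | 19.52 | 0.706 | 0.184 |
| 4 | 19.63 | 0.708 | 0.184 |
| 8 | 19.78 | 0.711 | 0.181 |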

Table 4

WfM improves the 3D motion

PCK@0.05 compares track quality on DyCheck.

| Method | PCK@0.05 ↑ |
|---|---|
| HyperNeRF | 0.453 |
| CoTracker | 0.803 |
| Gauss.Marbles | 0.806 |
| BootsTAPIR | 0.779 |
| MoSca | 0.824 |
| Ours | 0.862 |
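PCK (percentage of correct keypoints) counts a predicted point as correct when it lands within a distance threshold of its ground-truth location. The sketch below assumes 2D points and a threshold of 0.05 times the longer image side. That normalization is an assumption made for illustration, and the function name and argument layout are hypothetical.

```python
import math


def pck(pred_points, gt_points, image_size, ratio=0.05):
    """Fraction of predicted points within `ratio * max(H, W)` of ground truth.

    pred_points, gt_points -- equal-length sequences of (x, y) pairs
    image_size             -- (height, width) in pixels
    ratio                  -- threshold as a fraction of the longer side (assumed)
    """
    if len(pred_points) != len(gt_points) or not gt_points:
        raise ValueError("need matching, non-empty point lists")
    height, width = image_size
    threshold = ratio * max(height, width)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred_points, gt_points)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt_points)


# On a 100x200 image the threshold is 10 px; errors are 0, 9, 11, and 5 px.
predicted = [(0, 0), (9, 0), (0, 11), (3, 4)]
ground_truth = [(0, 0)] * 4
score = pck(predicted, ground_truth, image_size=(100, 200))
```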

Table 5

Reconstruction Guidance

Inference-time guidance on dense 4D scaffold rendering improves both fidelity and accuracy.

[Plots: mPSNR (↑, axis range 19.30 to 19.55) and mLPIPS (↓, axis range 0.225 to 0.245) versus G-Buffer Guidance Scale (1 to 5). Legend: No guidance; CFG; APG (x0, thr=8, η=0); APG (vel, thr=32, η=0.3).]
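CFG in the legend refers to classifier-free guidance, whose standard form extrapolates from the unconditional prediction toward the conditional one by a guidance scale. The sketch below shows only that generic update on flat lists of floats. How the paper applies guidance to the G-buffer conditioning, and the details of the APG variants, are not given on this page and are not modeled here.

```python
def classifier_free_guidance(cond, uncond, scale):
    """Standard CFG combination: uncond + scale * (cond - uncond), element-wise.

    cond, uncond -- model predictions with and without the conditioning signal
    scale        -- guidance scale; 0 returns uncond, 1 returns cond,
                    values above 1 push further along the conditional direction
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], scale=3.0)
```

A scale of 1 reproduces the conditional prediction unchanged, which is why sweeps such as the one plotted above start at 1.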

Table 6

Generative Methods: CAT4D-Comparable Setting

Metrics reported in the CAT4D-comparable evaluation setting.

| Method | mPSNR ↑ | mSSIM ↑ | mLPIPS ↓ |
|---|---|---|---|
| CAT4D | 18.24 | 0.666 | 0.227 |
| Ours | 19.89 | 0.715 | 0.197 |


Acknowledgements

We thank Yang Zheng, Zhengfei Kuang, Lior Yariv, and Jianhao Zheng for fruitful discussions. We also thank Yijia Weng and Jiahui Lei for providing evaluation details for MoSca, Kuan Heng Lin for providing Vista4D evaluation details, and Michal Nazarczuk and Eduardo Pérez-Pellitero for providing evaluation details for ViDAR. This website builds on the templates from RealmDreamer and CAT4D.

BibTeX

@misc{zhu2026worldfrommotion,
  title = {World from Motion: Generative Dynamic Gaussian Reconstruction from Monocular Video},
  author = {Liyuan Zhu and Shengyu Huang and Amrita Mazumdar and Tianye Li and Zan Gojcic and Gordon Wetzstein and Iro Armeni and Shalini De Mello and Alex Trevithick},
  year = {2026}
}
