Feedforward 4D reconstruction from just two unposed images
UFO-4D predicts dynamic 3D Gaussians and camera poses from a pair of unposed images in a single feedforward pass. This unified representation enables rendering images, 3D geometry, and 3D motion at any intermediate view or timestamp with a single model. Through semi-supervised training with the differentiable 4D rasterizer, UFO-4D unlocks three major advantages:
Given the predicted dynamic 3D Gaussians, we can rasterize images, depth, and motion at interpolated timestamps and views. Both depth and motion are expressed in the canonical camera coordinate frame. In the motion visualizations, only moving objects are highlighted in non-white colors.
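To make the idea of querying Gaussians at an intermediate timestamp concrete, here is a minimal sketch that handles only the Gaussian centers, not the full alpha-blended splatting. It assumes (our assumption, not taken from UFO-4D) that each Gaussian stores a center at t=0 and a 3D motion vector spanning t=0 to t=1, both in the canonical camera frame, and that motion is linear in time. The helper names `centers_at`, `project`, and `moving_mask` and the pinhole intrinsics are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def centers_at(mu0: List[Vec3], motion: List[Vec3], t: float) -> List[Vec3]:
    """Move each Gaussian center linearly along its 3D motion vector.

    Assumption: motion spans t=0 to t=1 and everything lives in the
    canonical camera frame.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        (x + t * dx, y + t * dy, z + t * dz)
        for (x, y, z), (dx, dy, dz) in zip(mu0, motion)
    ]


def project(
    points: List[Vec3], fx: float, fy: float, cx: float, cy: float
) -> List[Tuple[float, float, float]]:
    """Pinhole projection of camera-frame points to (u, v, depth)."""
    out = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("point lies behind the camera")
        out.append((fx * x / z + cx, fy * y / z + cy, z))
    return out


def moving_mask(motion: List[Vec3], eps: float = 1e-3) -> List[bool]:
    """Flag Gaussians whose motion magnitude exceeds eps (the 'moving' ones)."""
    return [(dx * dx + dy * dy + dz * dz) ** 0.5 > eps for dx, dy, dz in motion]


if __name__ == "__main__":
    mu0 = [(0.0, 0.0, 2.0), (1.0, 0.0, 4.0)]      # one static, one moving Gaussian
    motion = [(0.0, 0.0, 0.0), (0.0, 0.0, -2.0)]  # the second moves toward the camera
    mid = centers_at(mu0, motion, 0.5)            # halfway between the two frames
    print(project(mid, 100.0, 100.0, 50.0, 50.0))  # per-center (u, v, depth)
    print(moving_mask(motion))                     # which Gaussians get a motion color
```

In a real renderer these per-Gaussian depths and motion vectors would be alpha-blended per pixel by the rasterizer; the mask mirrors the visualization convention above, where static content stays white.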
UFO-4D successfully interpolates and renders high-quality images, depth, and motion from predicted dynamic 3D Gaussians.
Even in scenarios with large motion, UFO-4D robustly produces all estimates from two unposed images.
Under extreme motion or minimal overlap, UFO-4D still accurately estimates camera motion and static geometry, although estimating the motion of dynamic objects becomes challenging.
@inproceedings{Hur:2026:UFO,
title = {{UFO-4D}: Unposed Feedforward 4{D} Reconstruction from Two Images},
author = {Hur, Junhwa and Herrmann, Charles and Peng, Songyou and Henzler, Philipp and Ma, Zeyu and Zickler, Todd and Sun, Deqing},
booktitle = {ICLR},
year = {2026}
}