arXiv 2503.11651
VGGT: Visual Geometry Grounded Transformer
By Jianyuan Wang, Minghao Chen, et al.
Published 2025-03-14
We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views. This approach is a step forward in 3D computer vision, where models have typically been constrained to and specialized for single tasks. It is also simple and efficient, reconstructing images in under…