arXiv 2511.10647

Depth Anything 3: Recovering the Visual Space from Any Views

By Haotong Lin, Sili Chen, et al.

Published 2025-11-13


We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 rests on two key insights: a single plain transformer (e.g., a vanilla DINO encoder) is sufficient as a backbone without architectural specialization, and a singular depth-ray prediction target obviates the need for compl…
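To illustrate why a depth-ray target can be enough to describe geometry, here is a minimal sketch that lifts per-pixel predictions to 3D points. It assumes (this is not taken from the abstract) that each pixel comes with a ray origin `o` and direction `r` in a shared world frame, and that depth `d` is measured along `r`, so the point is `o + d * r`. The helper name `points_from_depth_rays` is hypothetical and does not describe DA3's actual code or output format.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def points_from_depth_rays(
    depths: List[float], origins: List[Vec3], directions: List[Vec3]
) -> List[Vec3]:
    """Hypothetical helper: combine per-pixel depth and ray predictions into 3D points.

    Assumption: depth d is measured along the (possibly unnormalized) ray
    direction r starting at origin o, so each point is o + d * r.
    """
    if not (len(depths) == len(origins) == len(directions)):
        raise ValueError("depths, origins and directions must have the same length")
    points: List[Vec3] = []
    for d, o, r in zip(depths, origins, directions):
        # Walk distance d along the ray from its origin.
        points.append((o[0] + d * r[0], o[1] + d * r[1], o[2] + d * r[2]))
    return points


# Two pixels seen by two different (assumed) cameras.
pts = points_from_depth_rays(
    [2.0, 3.0],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.5, 1.0)],
)
print(pts)  # [(0.0, 0.0, 2.0), (1.0, 1.5, 3.0)]
```

Because the ray origins can differ per pixel, points from several views land in one common frame without a separate pose-composition step, which is the kind of simplification a unified depth-ray target suggests.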
