arXiv 2511.10647

Depth Anything 3: Recovering the Visual Space from Any Views

By Haotong Lin, Sili Chen, et al.

Published 2025-11-13

Wiki summary

We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 yields two key insights: a single plain transformer (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization, and a singular depth-ray prediction target obviates the need for compl…
