arXiv 2103.14030

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

By Ze Liu, Yutong Lin, et al.

Published 2021-03-25

Wiki summary


This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a…
