arXiv 2310.15200
Open-Set Image Tagging with Multi-Grained Text Supervision
By Xinyu Huang, Yi-Jie Huang, et al.
Published 2023-10-23
In this paper, we introduce the Recognize Anything Plus Model (RAM++), an open-set image tagging model effectively leveraging multi-grained text supervision. Previous approaches (e.g., CLIP) primarily utilize global text supervision paired with images, leading to sub-optimal performance in recognizing multiple individual semantic tags. In contrast, RAM++ seamlessly integrates individual tag supervision with global t…