arXiv 2312.06742
Honeybee: Locality-enhanced Projector for Multimodal LLM
By Junbum Cha, Wooyoung Kang, et al.
Published 2023-12-11
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite its importance, the visual projector has been relatively underexplored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of…