arXiv 2312.06742
Honeybee: Locality-enhanced Projector for Multimodal LLM
By Junbum Cha, Wooyoung Kang, et al.
Published 2023-12-11
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of…