arXiv 2312.06742

Honeybee: Locality-enhanced Projector for Multimodal LLM

By Junbum Cha, Wooyoung Kang, et al.

Published 2023-12-11

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite its importance, the visual projector remains relatively underexplored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of…
