arXiv 2312.06742

Honeybee: Locality-enhanced Projector for Multimodal LLM

By Junbum Cha, Wooyoung Kang, et al.

Published 2023-12-11

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of…
