arXiv 2305.16934

On Evaluating Adversarial Robustness of Large Vision-Language Models

By Yunqing Zhao, Tianyu Pang, et al.

Published 2023-05-26

Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented performance in response generation, especially with visual inputs, enabling more creative and adaptable interaction than large language models such as ChatGPT. Nonetheless, multimodal generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable modality (e.g.,…
