arXiv 2305.16934

On Evaluating Adversarial Robustness of Large Vision-Language Models

By Yunqing Zhao, Tianyu Pang, et al.

Published 2023-05-26

Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented performance in response generation, especially with visual inputs, enabling more creative and adaptable interaction than large language models such as ChatGPT. Nonetheless, multimodal generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable modality (e.g.,…
