On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving

Published as arXiv preprint arXiv:2506.11472, 2025

Autonomous vehicles (AVs) rely on deep neural networks (DNNs) for critical perception tasks such as traffic sign recognition (TSR), automated lane centering (ALC), and vehicle detection (VD). However, these models are vulnerable to visual perception attacks that induce misclassifications and compromise safety. Traditional defense mechanisms, including adversarial training, often degrade benign accuracy and fail to generalize to unseen attacks.

In this work, we introduce Vehicle Vision Language Models (V2LMs), fine-tuned vision-language models specialized for AV perception. Our findings demonstrate that V2LMs inherently exhibit superior robustness against unseen attacks without requiring adversarial training, maintaining significantly higher accuracy than conventional DNNs under adversarial conditions.

We evaluate two deployment strategies: Solo Mode, where individual V2LMs handle specific perception tasks, and Tandem Mode, where a single unified V2LM is fine-tuned for multiple tasks simultaneously.
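The distinction between the two modes is easiest to see in code. The sketch below is illustrative only: the paper does not prescribe a specific inference API, so the `QueryFn` signature, the task prompts, and the function names are assumptions standing in for whatever fine-tuned VLM backend is actually deployed.

```python
from typing import Callable, Dict

# Placeholder interface for a fine-tuned V2LM: takes an image and a text
# prompt, returns a text answer. The concrete backend (model, serving stack)
# is not specified here and is assumed for illustration.
QueryFn = Callable[[bytes, str], str]

# Task-specific prompts; the exact wording is illustrative, not from the paper.
TASK_PROMPTS: Dict[str, str] = {
    "tsr": "What traffic sign is shown in this image? Answer with the sign class.",
    "vd":  "Is there a vehicle directly ahead? Answer yes or no.",
    "alc": "Describe the lane boundaries relative to the ego vehicle.",
}

def solo_mode(image: bytes, task: str, models: Dict[str, QueryFn]) -> str:
    """Solo Mode: each perception task is served by its own fine-tuned V2LM."""
    return models[task](image, TASK_PROMPTS[task])

def tandem_mode(image: bytes, task: str, unified_model: QueryFn) -> str:
    """Tandem Mode: one V2LM is fine-tuned across tasks and steered by the prompt."""
    return unified_model(image, TASK_PROMPTS[task])
```

Under this framing, Tandem Mode trades per-task specialization for a single set of model weights shared across tasks, which is where its memory savings come from.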

Experimental results show that DNNs suffer performance drops of 33% to 46% under attacks, whereas V2LMs maintain adversarial accuracy with reductions of less than 8% on average. Tandem Mode further offers a memory-efficient alternative with comparable robustness.

We also explore using V2LMs as parallel components in AV perception stacks to enhance resilience. Our results suggest that V2LMs are a promising direction for secure and resilient AV systems.
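One way to picture the parallel deployment is a simple cross-check between the existing DNN output and the V2LM output. The fusion rule below is a minimal sketch under our own assumptions; the paper explores parallel integration but does not fix this particular policy, threshold, or function name.

```python
def fused_tsr_decision(dnn_label: str, dnn_conf: float, v2lm_label: str,
                       agree_threshold: float = 0.9) -> str:
    """Hypothetical fusion rule for a TSR stack running a DNN and a V2LM in parallel.

    Trust the fast DNN when it is confident and agrees with the V2LM;
    otherwise defer to the more attack-robust V2LM answer.
    """
    if dnn_label == v2lm_label and dnn_conf >= agree_threshold:
        return dnn_label   # consistent outputs: keep the DNN result
    return v2lm_label      # disagreement or low confidence: defer to the V2LM
```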

👉 Read the full paper (PDF)