Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models

2026-03-04

Yicheng Bao, Xuhong Wang, Qiaosheng Zhang, Chaochao Lu, Xia Hu, Xin Tan

Abstract

Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and impose a ceiling on model robustness. We introduce AOT-SFT, a large-scale adversarial dataset for bootstrapping MLLM robustness. Building on this, we propose AOT (Adversarial Opponent Training), a self-play framework that forges MLLM robustness by creating its own training data. Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve. Extensive experiments demonstrate that AOT enhances the Defender's perceptual robustness and reduces hallucinations, establishing a scalable paradigm for training more reliable MLLMs.
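The co-evolution described in the abstract can be pictured with a toy loop. The sketch below is not the authors' implementation: it uses no images or models, and every name in it (`EDIT_TYPES`, `self_play`, `defender_skill`, `attacker_value`, the learning rates, and the opposing-reward rule) is a hypothetical stand-in chosen for illustration. It only shows the general shape of an attacker that concentrates on the edits the defender currently handles worst, and a defender that improves on whatever it is attacked with.

```python
import random

# Hypothetical edit categories; the paper's actual manipulations are not listed here.
EDIT_TYPES = ["occlude", "recolor", "insert_distractor"]


def self_play(rounds=200, seed=0, lr=0.05, explore=0.1):
    """Toy attacker/defender loop (an assumption-laden sketch, not AOT itself)."""
    rng = random.Random(seed)
    # Defender: probability of answering correctly under each edit type.
    defender_skill = {e: 0.3 for e in EDIT_TYPES}
    # Attacker: running estimate of how often each edit fools the Defender.
    attacker_value = {e: 0.5 for e in EDIT_TYPES}
    history = []

    for _ in range(rounds):
        # Attacker: epsilon-greedy pick of the edit it believes is most damaging.
        if rng.random() < explore:
            edit = rng.choice(EDIT_TYPES)
        else:
            edit = max(EDIT_TYPES, key=attacker_value.get)

        # Defender answers; success is sampled from its current skill on this edit.
        defender_correct = rng.random() < defender_skill[edit]

        # Assumed opposing rewards: the Attacker scores when the Defender fails.
        attacker_reward = 0.0 if defender_correct else 1.0
        attacker_value[edit] += 0.1 * (attacker_reward - attacker_value[edit])

        # Defender update: training on the attacked sample nudges skill toward 1.
        defender_skill[edit] += lr * (1.0 - defender_skill[edit])

        history.append((edit, defender_correct))

    return defender_skill, attacker_value, history


if __name__ == "__main__":
    skill, value, _ = self_play()
    print("defender skill:", skill)
    print("attacker value:", value)
```

As the Defender improves on one edit type, the Attacker's estimated payoff for that edit falls and it shifts to another, which is the sense in which the curriculum in this toy is dynamic rather than fixed.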
