| A Vision Centric Remote Sensing Benchmark | Mar 20, 2025 | Question Answering, Representation Learning | Unverified | 0 |
| UMIT: Unifying Medical Imaging Tasks via Vision-Language Models | Mar 20, 2025 | Diagnostic, Medical Image Analysis | Code Available | 0 |
| UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Mar 19, 2025 | MM-Vet, Multimodal Reasoning | Unverified | 0 |
| GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | Mar 19, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| TruthLens: A Training-Free Paradigm for DeepFake Detection | Mar 19, 2025 | Binary Classification, DeepFake Detection | Unverified | 0 |
| Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering | Code Available | 0 |
| Where do Large Vision-Language Models Look at when Answering Questions? | Mar 18, 2025 | Question Answering, Visual Question Answering | Code Available | 2 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question Answering, Scene Understanding | Code Available | 1 |
| MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1 |
| Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference | Mar 17, 2025 | Feature Compression, Image Compression | Unverified | 0 |
| From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | Unverified | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection, Image Captioning | Unverified | 0 |
| PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | Mar 16, 2025 | Machine Unlearning, Privacy Preserving | Unverified | 0 |
| DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models | Mar 14, 2025 | Autonomous Driving, Computational Efficiency | Unverified | 0 |
| T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Mar 14, 2025 | Attribute, Question Answering | Code Available | 0 |
| How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Mar 13, 2025 | Multimodal Reasoning, Question Answering | Code Available | 1 |
| DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding | Mar 13, 2025 | 4k, Autonomous Driving | Code Available | 2 |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | Unverified | 0 |
| SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery | Mar 12, 2025 | Activity Recognition, Anatomy | Unverified | 0 |
| SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment | Mar 12, 2025 | Autonomous Driving, Bench2Drive | Code Available | 3 |
| Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction, Multimodal Reasoning | Unverified | 0 |
| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | Unverified | 0 |
| Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving, Question Answering | Unverified | 0 |
| TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems | Mar 9, 2025 | Multimodal Sentiment Analysis, Question Answering | Unverified | 0 |