| Siamese Tracking with Lingual Object Constraints | Nov 23, 2020 | Object, Object Tracking | Code Available | 0 |
| World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Sep 30, 2024 | Optical Character Recognition (OCR), Question Answering | Code Available | 0 |
| VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation | Aug 15, 2017 | Language Modeling, Language Modelling | Code Available | 0 |
| SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Dec 21, 2024 | Image Captioning, Multimodal Reasoning | Code Available | 0 |
| Sim2Real Transfer for Vision-Based Grasp Verification | May 5, 2025 | Object, object-detection | Code Available | 0 |
| Hallucination Benchmark in Medical Visual Question Answering | Jan 11, 2024 | Hallucination, Medical Visual Question Answering | Code Available | 0 |
| Simple Baseline for Visual Question Answering | Dec 7, 2015 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| HalLoc: Token-level Localization of Hallucinations for Vision Language Models | Jun 12, 2025 | Hallucination, Image Captioning | Code Available | 0 |
| Understanding Attention for Vision-and-Language Tasks | Aug 17, 2022 | Image Generation, Image Retrieval | Code Available | 0 |
| Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Apr 20, 2025 | Autonomous Driving, Image Captioning | Code Available | 0 |
| A Question-Centric Model for Visual Question Answering in Medical Imaging | Mar 2, 2020 | Medical Image Analysis, Question Answering | Code Available | 0 |
| Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Mar 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains | Jun 9, 2025 | Diagnostic, Question Answering | Code Available | 0 |
| Applying recent advances in Visual Question Answering to Record Linkage | Jul 12, 2020 | Question Answering, Visual Question Answering | Code Available | 0 |
| Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Nov 17, 2024 | Hallucination, In-Context Learning | Code Available | 0 |
| Single-Stream Multi-Level Alignment for Vision-Language Pretraining | Mar 27, 2022 | Image-text Retrieval, Question Answering | Code Available | 0 |
| VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning | Mar 5, 2023 | Answer Generation, Entity Alignment | Code Available | 0 |
| Hadamard Product for Low-rank Bilinear Pooling | Oct 14, 2016 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 0 |
| Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types | Sep 14, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | Feb 4, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Grad-CAM: Why did you say that? | Nov 22, 2016 | Image Captioning, Visual Question Answering | Code Available | 0 |
| Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model | Jan 12, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| SlotPi: Physics-informed Object-centric Reasoning Models | Jun 12, 2025 | Object, Question Answering | Code Available | 0 |
| Understanding the World's Museums through Vision-Language Reasoning | Dec 2, 2024 | Benchmarking, Question Answering | Code Available | 0 |
| Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets | Oct 12, 2024 | Knowledge Distillation, Question Answering | Code Available | 0 |