| Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis | Jan 26, 2025 | Articles, Hallucination | Unverified | 0 | 0 |
| CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Nov 19, 2024 | Hallucination, Language Modeling | Unverified | 0 | 0 |
| Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning | Dec 1, 2021 | Logical Reasoning, Question Answering | Unverified | 0 | 0 |
| Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning | Feb 19, 2025 | Autonomous Driving, Bench2Drive | Unverified | 0 | 0 |
| SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering | Dec 16, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Scene Graph Generation with Geometric Context | Nov 25, 2021 | Activity Recognition, Graph Generation | Unverified | 0 | 0 |
| Visual Question Generation as Dual Task of Visual Question Answering | Sep 21, 2017 | Question Answering, Question Generation | Unverified | 0 | 0 |
| Scene Graph Reasoning for Visual Question Answering | Jul 2, 2020 | Navigate, Question Answering | Unverified | 0 | 0 |
| A Comprehensive Survey of Scene Graphs: Generation and Application | Mar 17, 2021 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations | Jun 21, 2025 | Question Answering, Scene Understanding | Unverified | 0 | 0 |
| CapWAP: Image Captioning with a Purpose | Nov 1, 2020 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| Scene Understanding Enabled Semantic Communication with Open Channel Coding | Jan 24, 2025 | Question Answering, Scene Understanding | Unverified | 0 | 0 |
| CapWAP: Captioning with a Purpose | Nov 9, 2020 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering | Apr 4, 2023 | counterfactual, Metric Learning | Unverified | 0 | 0 |
| SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes | Aug 21, 2023 | Attribute, Question Answering | Unverified | 0 | 0 |
| CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns | Oct 2, 2020 | Image Captioning, object-detection | Unverified | 0 | 0 |
| SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Aug 21, 2024 | Contrastive Learning, Language Modeling | Unverified | 0 | 0 |
| Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge | Jul 5, 2024 | Cross-Modal Retrieval, Question Answering | Unverified | 0 | 0 |
| Visual Question: Predicting If a Crowd Will Agree on the Answer | Aug 29, 2016 | Question Answering, valid | Unverified | 0 | 0 |
| SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors | Mar 18, 2024 | Hallucination, Motion Planning | Unverified | 0 | 0 |
| Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction, Multimodal Reasoning | Unverified | 0 | 0 |
| Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding | May 22, 2025 | Causal Inference, Hallucination | Unverified | 0 | 0 |
| Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models | Nov 7, 2024 | Adversarial Attack, Image Captioning | Unverified | 0 | 0 |
| Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings | Dec 31, 2020 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 | 0 |
| CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | Jun 15, 2025 | Answer Generation, Decision Making | Unverified | 0 | 0 |
| "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models | Feb 17, 2025 | Object Recognition, Question Answering | Unverified | 0 | 0 |
| SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering | Oct 1, 2019 | Embodied Question Answering, Question Answering | Unverified | 0 | 0 |
| Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images | Jul 11, 2024 | Question Answering, Segmentation | Unverified | 0 | 0 |
| Segmentation Guided Attention Networks for Visual Question Answering | Jul 1, 2017 | Common Sense Reasoning, Question Answering | Unverified | 0 | 0 |
| Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Nov 6, 2024 | Autonomous Navigation, In-Context Learning | Unverified | 0 | 0 |
| Selectively Answering Visual Questions | Jun 3, 2024 | Avg, In-Context Learning | Unverified | 0 | 0 |
| Visual Question Reasoning on General Dependency Tree | Mar 31, 2018 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network | Sep 23, 2019 | Question Answering, Triplet | Unverified | 0 | 0 |
| SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering | Oct 3, 2023 | Graph Neural Network, Question Answering | Unverified | 0 | 0 |
| Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering | Jun 25, 2020 | Diversity, Question Answering | Unverified | 0 | 0 |
| Can you even tell left from right? Presenting a new challenge for VQA | Mar 15, 2022 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Can We Generate Visual Programs Without Prompting LLMs? | Dec 11, 2024 | Data Augmentation, Question Answering | Unverified | 0 | 0 |
| WeaQA: Weak Supervision via Captions for Visual Question Answering | Dec 4, 2020 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Visual Reference Resolution using Attention Memory for Visual Dialog | Sep 23, 2017 | Parameter Prediction, Question Answering | Unverified | 0 | 0 |
| Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | Apr 6, 2024 | Image-text Retrieval, object-detection | Unverified | 0 | 0 |
| Semantic Aligned Multi-modal Transformer for Vision-Language Understanding: A Preliminary Study on Visual QA | Jun 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Visual Relationship Detection using Scene Graphs: A Survey | May 16, 2020 | Graph Generation, Image Generation | Unverified | 0 | 0 |
| Semantic-aware Modular Capsule Routing for Visual Question Answering | Jul 21, 2022 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Semantic Composition in Visually Grounded Language Models | May 15, 2023 | Image Captioning, Inductive Bias | Unverified | 0 | 0 |
| Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search | Jun 25, 2025 | Question Answering, Retrieval | Unverified | 0 | 0 |
| Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors | Oct 26, 2024 | Question Answering, Transfer Learning | Unverified | 0 | 0 |
| Sentence Attention Blocks for Answer Grounding | Sep 20, 2023 | Question Answering, Sentence | Unverified | 0 | 0 |
| ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering | Nov 18, 2015 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |
| Can SAR improve RSVQA performance? | Aug 28, 2024 | Question Answering, Visual Question Answering | Unverified | 0 | 0 |