| Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis | Jan 26, 2025 | Articles, Hallucination | Unverified | 0 | 0 |
| CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Nov 19, 2024 | Hallucination, Language Modeling | Unverified | 0 | 0 |
| Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning | Dec 1, 2021 | Logical Reasoning, Question Answering | Unverified | 0 | 0 |
| Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning | Feb 19, 2025 | Autonomous Driving, Bench2Drive | Unverified | 0 | 0 |
| SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering | Dec 16, 2022 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Scene Graph Generation with Geometric Context | Nov 25, 2021 | Activity Recognition, Graph Generation | Unverified | 0 | 0 |
| Visual Question Generation as Dual Task of Visual Question Answering | Sep 21, 2017 | Question Answering, Question Generation | Unverified | 0 | 0 |
| Scene Graph Reasoning for Visual Question Answering | Jul 2, 2020 | Navigate, Question Answering | Unverified | 0 | 0 |
| A Comprehensive Survey of Scene Graphs: Generation and Application | Mar 17, 2021 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations | Jun 21, 2025 | Question Answering, Scene Understanding | Unverified | 0 | 0 |
| CapWAP: Image Captioning with a Purpose | Nov 1, 2020 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| Scene Understanding Enabled Semantic Communication with Open Channel Coding | Jan 24, 2025 | Question Answering, Scene Understanding | Unverified | 0 | 0 |
| CapWAP: Captioning with a Purpose | Nov 9, 2020 | Image Captioning, Question Answering | Unverified | 0 | 0 |
| SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering | Apr 4, 2023 | counterfactual, Metric Learning | Unverified | 0 | 0 |
| SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes | Aug 21, 2023 | Attribute, Question Answering | Unverified | 0 | 0 |
| CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns | Oct 2, 2020 | Image Captioning, object-detection | Unverified | 0 | 0 |
| SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | Aug 21, 2024 | Contrastive Learning, Language Modeling | Unverified | 0 | 0 |
| Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge | Jul 5, 2024 | Cross-Modal Retrieval, Question Answering | Unverified | 0 | 0 |
| Visual Question: Predicting If a Crowd Will Agree on the Answer | Aug 29, 2016 | Question Answering, valid | Unverified | 0 | 0 |
| SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors | Mar 18, 2024 | Hallucination, Motion Planning | Unverified | 0 | 0 |
| Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction, Multimodal Reasoning | Unverified | 0 | 0 |
| Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding | May 22, 2025 | Causal Inference, Hallucination | Unverified | 0 | 0 |
| Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models | Nov 7, 2024 | Adversarial Attack, Image Captioning | Unverified | 0 | 0 |
| Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings | Dec 31, 2020 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 | 0 |
| CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | Jun 15, 2025 | Answer Generation, Decision Making | Unverified | 0 | 0 |