| Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings | Dec 31, 2020 | Common Sense Reasoning, Knowledge Graph Embeddings | Unverified | 0 |
| "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models | Feb 17, 2025 | Object Recognition, Question Answering | Unverified | 0 |
| SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering | Oct 1, 2019 | Embodied Question Answering, Question Answering | Unverified | 0 |
| Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images | Jul 11, 2024 | Question Answering, Segmentation | Unverified | 0 |
| Segmentation Guided Attention Networks for Visual Question Answering | Jul 1, 2017 | Common Sense Reasoning, Question Answering | Unverified | 0 |
| Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Nov 6, 2024 | Autonomous Navigation, In-Context Learning | Unverified | 0 |
| Selectively Answering Visual Questions | Jun 3, 2024 | Avg, In-Context Learning | Unverified | 0 |
| SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering | Oct 3, 2023 | Graph Neural Network, Question Answering | Unverified | 0 |
| Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering | Jun 25, 2020 | Diversity, Question Answering | Unverified | 0 |
| WeaQA: Weak Supervision via Captions for Visual Question Answering | Dec 4, 2020 | Question Answering, Visual Question Answering | Unverified | 0 |
| Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | Apr 6, 2024 | Image-text Retrieval, object-detection | Unverified | 0 |
| Semantic Aligned Multi-modal Transformer for Vision-Language Understanding: A Preliminary Study on Visual QA | Jun 1, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| Semantic-aware Modular Capsule Routing for Visual Question Answering | Jul 21, 2022 | Question Answering, Visual Question Answering | Unverified | 0 |
| Semantic Composition in Visually Grounded Language Models | May 15, 2023 | Image Captioning, Inductive Bias | Unverified | 0 |
| Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search | Jun 25, 2025 | Question Answering, Retrieval | Unverified | 0 |
| Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors | Oct 26, 2024 | Question Answering, Transfer Learning | Unverified | 0 |
| Sentence Attention Blocks for Answer Grounding | Sep 20, 2023 | Question Answering, Sentence | Unverified | 0 |
| Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering | Jan 1, 2025 | Multiple-choice, Question Answering | Unverified | 0 |
| Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering | May 8, 2024 | 2k, Embodied Question Answering | Unverified | 0 |
| Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures | May 10, 2022 | AutoML, BIG-bench Machine Learning | Unverified | 0 |
| SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering | Jun 14, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention | May 15, 2021 | Question Answering, Visual Question Answering | Unverified | 0 |
| SILC: Improving Vision Language Pretraining with Self-Distillation | Oct 20, 2023 | Classification, Contrastive Learning | Unverified | 0 |
| Silkie: Preference Distillation for Large Visual Language Models | Dec 17, 2023 | Hallucination, MME | Unverified | 0 |
| Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps | Dec 9, 2020 | Decoder, Image Captioning | Unverified | 0 |