Large Scale Scene Text Verification with Guided Attention Apr 23, 2018 Question Answering Scene Text Detection
Latent Image and Video Resolution Prediction using Convolutional Neural Networks Oct 17, 2024 Video Quality Assessment Visual Question Answering (VQA)
Latent Variable Models for Visual Question Answering Jan 16, 2021 Benchmarking Question Answering
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement Nov 20, 2024 Autonomous Driving Computational Efficiency
LAVIS: A Library for Language-Vision Intelligence Sep 15, 2022 Benchmarking Image Captioning
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering Jan 29, 2024 Language Modeling Language Modelling
LEAF-QA: Locate, Encode & Attend for Figure Question Answering Jul 30, 2019 Chart Question Answering Question Answering
Learning Answer Embeddings for Visual Question Answering Jun 10, 2018 Question Answering Transfer Learning
Learning by Asking Questions Dec 4, 2017 Question Answering Visual Question Answering
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision Oct 24, 2022 cross-modal alignment Cross-Modal Retrieval
Learning Compositional Representation for Few-shot Visual Question Answering Feb 21, 2021 Attribute Question Answering
Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering Apr 16, 2016 General Classification Human-Object Interaction Detection
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues Mar 1, 2021 Question Answering Visual Question Answering
Learning Rich Image Region Representation for Visual Question Answering Oct 29, 2019 Language Modeling Language Modelling
Learning Sparse Mixture of Experts for Visual Question Answering Sep 19, 2019 Mixture-of-Experts Question Answering
Learning to Answer Multilingual and Code-Mixed Questions Nov 14, 2022 AI Agent Question Answering
Learning to Answer Questions From Image Using Convolutional Neural Network Jun 1, 2015 General Classification Question Answering
Learning to Collocate Neural Modules for Image Captioning Apr 18, 2019 Decoder Image Captioning
Learning to Compose Diversified Prompts for Image Emotion Classification Jan 26, 2022 Classification Emotion Classification
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering Sep 11, 2024 Question Answering Visual Question Answering
Learning to Disambiguate by Asking Discriminative Questions Aug 9, 2017 Benchmarking Image Captioning
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios Nov 20, 2024 Question Answering Visual Question Answering (VQA)
Neural Reasoning, Fast and Slow, for Video Question Answering Jul 10, 2019 Natural Questions Question Answering
Learning to Recognize the Unseen Visual Predicates Sep 25, 2019 Question Answering Visual Question Answering
Learning to Select Question-Relevant Relations for Visual Question Answering Jun 1, 2021 Graph Attention Question Answering
Learning to Specialize with Knowledge Distillation for Visual Question Answering Dec 1, 2018 General Classification General Knowledge
Learning Visual Knowledge Memory Networks for Visual Question Answering Jun 13, 2018 Question Answering Visual Question Answering
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision Apr 20, 2020 counterfactual image-classification
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Mar 25, 2025 Autonomous Navigation Question Answering
Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model Jun 10, 2022 Question Answering Task 2
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation Jul 18, 2023 Image Generation Question Answering
Leveraging Medical Visual Question Answering with Supporting Facts May 28, 2019 Diversity Medical Visual Question Answering
Leveraging Video Descriptions to Learn Video Question Answering Nov 12, 2016 Question Answering Video Question Answering
Leveraging Visual Question Answering for Image-Caption Ranking May 4, 2016 Image Retrieval Question Answering
Leveraging Visual Question Answering to Improve Text-to-Image Synthesis Oct 28, 2020 Auxiliary Learning Image Generation
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Jun 8, 2025 Medical Report Generation Question Answering
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation Jul 9, 2025 Question Answering Visual Question Answering
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks Aug 18, 2020 Image Captioning Visual Question Answering (VQA)
Linguistically Driven Graph Capsule Network for Visual Question Reasoning Mar 23, 2020 Question Answering Visual Question Answering
Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering Jan 1, 2021 Novel Concepts Question Answering
LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing Jun 1, 2023 Question Answering Visual Question Answering
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering Nov 29, 2021 Diversity Question Answering
利用图像描述与知识图谱增强表示的视觉问答(Exploiting Image Captions and External Knowledge as Representation Enhancement for Visual Question Answering) Aug 1, 2021 Image Captioning Question Answering
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound Oct 19, 2024 Instruction Following Knowledge Distillation
LLM4VG: Large Language Models Evaluation for Video Grounding Dec 21, 2023 Image Captioning Video Grounding
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling Aug 20, 2021 Data Ablation Optical Character Recognition
Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs Apr 30, 2025 Hallucination Hallucination Evaluation
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA Apr 4, 2023 Answer Generation Language Modelling
Logically Consistent Loss for Visual Question Answering Nov 19, 2020 Multi-Task Learning Question Answering
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model Oct 3, 2024 image-classification Image Classification