LIVE: Learnable In-Context Vector for Visual Question Answering | Jun 19, 2024 | In-Context Learning, Question Answering | Code Available 1
Biomedical Visual Instruction Tuning with Clinician Preference Alignment | Jun 19, 2024 | Instruction Following, Visual Question Answering (VQA) | Code Available 0
Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA | Jun 18, 2024 | Question Answering, Visual Question Answering | Code Available 0
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available 1
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Jun 16, 2024 | Diversity, Multiple-choice | Code Available 1
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | Jun 16, 2024 | Hallucination, Hallucination Evaluation | Code Available 3
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Jun 15, 2024 | Question Answering, Video Understanding | Code Available 0
What is the Visual Cognition Gap between Humans and Multimodal LLMs? | Jun 14, 2024 | object-detection, Object Detection | Code Available 0
Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models | Jun 14, 2024 | Decoder, Knowledge Graphs | Unverified 0
Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps | Jun 14, 2024 | Question Answering, Visual Question Answering | Code Available 1
Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns | Jun 13, 2024 | Autonomous Driving, Question Answering | Unverified 0
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jun 12, 2024 | Image Generation, Language Modeling | Code Available 5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Jun 11, 2024 | Multiple-choice, Question Answering | Code Available 5
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | Jun 10, 2024 | Diversity, Question Answering | Unverified 0
Composition Vision-Language Understanding via Segment and Depth Anything Model | Jun 7, 2024 | Question Answering, Visual Question Answering (VQA) | Code Available 0
Understanding Information Storage and Transfer in Multi-modal Large Language Models | Jun 6, 2024 | Factual Visual Question Answering, Model Editing | Unverified 0
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following | Jun 4, 2024 | Question Answering, Visual Question Answering | Code Available 0
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering | Jun 4, 2024 | Data Augmentation, Machine Translation | Unverified 0
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models | Jun 3, 2024 | Image Captioning, Language Modelling | Code Available 2
Selectively Answering Visual Questions | Jun 3, 2024 | Avg, In-Context Learning | Unverified 0
Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering | Jun 3, 2024 | Diversity, Question Answering | Unverified 0
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy | Jun 3, 2024 | Language Modelling, Question Answering | Code Available 2
Ovis: Structural Embedding Alignment for Multimodal Large Language Model | May 31, 2024 | Language Modeling, Multimodal Large Language Model | Code Available 5
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | May 31, 2024 | cross-modal alignment, Visual Localization | Code Available 2
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | May 30, 2024 | Diagnostic, Medical Diagnosis | Code Available 1
VQA Training Sets are Self-play Environments for Generating Few-shot Pools | May 30, 2024 | Question Answering, Visual Question Answering | Unverified 0
Instruction-Guided Visual Masking | May 30, 2024 | Instruction Following, Visual Grounding | Code Available 1
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs | May 29, 2024 | Image Retrieval, Question Answering | Code Available 1
Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks | May 29, 2024 | Question Answering, Visual Question Answering | Unverified 0
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild | May 28, 2024 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified 0
Privacy-Aware Visual Language Models | May 27, 2024 | Visual Question Answering (VQA) | Unverified 0
LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | May 24, 2024 | Language Modeling, Language Modelling | Code Available 2
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge | May 23, 2024 | Question Answering, RAG | Unverified 0
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | May 22, 2024 | Question Answering, Visual Question Answering | Code Available 1
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering | May 21, 2024 | Diversity, Information Retrieval | Code Available 0
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | May 20, 2024 | Benchmarking, Question Answering | Code Available 2
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions | May 18, 2024 | Visual Question Answering (VQA) | Unverified 0
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging | May 18, 2024 | Question Answering, Visual Question Answering | Unverified 0
StackOverflowVQA: Stack Overflow Visual Question Answering Dataset | May 17, 2024 | Question Answering, Sentence | Unverified 0
RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content | May 14, 2024 | Contrastive Learning, Video Enhancement | Unverified 0
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | May 12, 2024 | Question Answering, Visual Question Answering | Unverified 0
Federated Document Visual Question Answering: A Pilot Study | May 10, 2024 | Federated Learning, Question Answering | Code Available 0
Exploring the Capabilities of Large Multimodal Models on Dense Text | May 9, 2024 | Prompt Engineering, Visual Question Answering (VQA) | Code Available 4
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | May 9, 2024 | Image Captioning, Instruction Following | Code Available 2
Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering | May 8, 2024 | 2k, Embodied Question Answering | Unverified 0
VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images | May 6, 2024 | Attribute, Language Modeling | Unverified 0
Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance | May 6, 2024 | Exposure Correction, Video Enhancement | Code Available 1
Advancing Multimodal Medical Capabilities of Gemini | May 6, 2024 | Computed Tomography (CT), image-classification | Unverified 0
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | May 2, 2024 | Autonomous Driving, counterfactual | Code Available 4
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified 0