Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents | Nov 23, 2024 | Question Answering, RAG | Code Available | 0
ReWind: Understanding Long Videos with Instructed Learnable Memory | Nov 23, 2024 | Large Language Model, Question Answering | Unverified | 0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy | Nov 23, 2024 | Instruction Following, MME | Unverified | 0
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | Nov 22, 2024 | Benchmarking, Caption Generation | Unverified | 0
mR^2AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA | Nov 22, 2024 | RAG, Retrieval | Unverified | 0
Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available | 0
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training | Nov 20, 2024 | Contrastive Learning, image-classification | Unverified | 0
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | Nov 20, 2024 | Autonomous Driving, Multimodal Reasoning | Unverified | 0
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | Nov 20, 2024 | Question Answering, Visual Question Answering (VQA) | Unverified | 0
Teaching VLMs to Localize Specific Objects from In-context Examples | Nov 20, 2024 | Object, Object Tracking | Code Available | 1
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement | Nov 20, 2024 | Autonomous Driving, Computational Efficiency | Unverified | 0
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | Nov 19, 2024 | Language Modeling, Language Modelling | Unverified | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available | 0
F^3OCUS -- Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics | Nov 17, 2024 | Diversity, Federated Learning | Unverified | 0
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Nov 17, 2024 | Hallucination, In-Context Learning | Code Available | 0
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Nov 17, 2024 | Question Answering, Scene Understanding | Unverified | 0
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms | Nov 17, 2024 | Diagnostic, Miscellaneous | Unverified | 0
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Nov 16, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR) | Code Available | 1
Visual question answering based evaluation metrics for text-to-image generation | Nov 15, 2024 | Image Generation, Image Manipulation | Unverified | 0
SparrowVQE: Visual Question Explanation for Course Content Understanding | Nov 12, 2024 | Question Answering, Visual Question Answering | Code Available | 0
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding, Optical Character Recognition (OCR) | Unverified | 0
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Nov 9, 2024 | object-detection, Object Detection | Code Available | 1
Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models | Nov 8, 2024 | Quantization, Question Answering | Unverified | 0
VQA^2: Visual Question Answering for Video Quality Assessment | Nov 6, 2024 | Question Answering, Video Quality Assessment | Code Available | 2
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA | Nov 6, 2024 | Federated Learning, Language Modelling | Unverified | 0
Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Nov 6, 2024 | Autonomous Navigation, In-Context Learning | Unverified | 0
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Nov 5, 2024 | Benchmarking, Language Modeling | Code Available | 1
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Nov 5, 2024 | MME, Question Answering | Unverified | 0
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Nov 5, 2024 | Benchmarking, Hallucination | Code Available | 3
Multimodal Commonsense Knowledge Distillation for Visual Question Answering | Nov 5, 2024 | Knowledge Distillation, Question Answering | Unverified | 0
One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering | Nov 4, 2024 | Continual Learning, Question Answering | Unverified | 0
Goal-Oriented Semantic Communication for Wireless Visual Question Answering | Nov 3, 2024 | Edge-computing, Question Answering | Unverified | 0
A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning | Nov 3, 2024 | object-detection, Object Detection | Unverified | 0
Right this way: Can VLMs Guide Us to See More to Answer Questions? | Nov 1, 2024 | Question Answering, Visual Question Answering | Code Available | 0
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP | Oct 31, 2024 | Image Captioning, Prompt Learning | Unverified | 0
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset | Oct 30, 2024 | Question Answering, Visual Question Answering | Unverified | 0
Are VLMs Really Blind | Oct 29, 2024 | Language Modeling, Language Modelling | Code Available | 0
Few-Shot Multimodal Explanation for Visual Question Answering | Oct 28, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0
Improving Generalization in Visual Reasoning via Self-Ensemble | Oct 28, 2024 | Visual Question Answering (VQA), Visual Reasoning | Unverified | 0
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models! | Oct 28, 2024 | Denoising, Question Answering | Unverified | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Oct 28, 2024 | Benchmarking, Question Answering | Code Available | 0
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering | Oct 28, 2024 | Computational Efficiency, Decision Making | Unverified | 0
R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | Oct 27, 2024 | Medical Visual Question Answering, Multiple-choice | Unverified | 0
GPT-4o System Card | Oct 25, 2024 | Multiple-choice, Spatial Reasoning | Unverified | 0
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | Oct 24, 2024 | Image Generation, Question Generation | Code Available | 7
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering | Oct 23, 2024 | Federated Learning, Medical Visual Question Answering | Unverified | 0
Progressive Compositionality In Text-to-Image Generative Models | Oct 22, 2024 | Attribute, Contrastive Learning | Code Available | 1
Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | Oct 22, 2024 | Question Answering, Visual Question Answering | Unverified | 0
Frontiers in Intelligent Colonoscopy | Oct 22, 2024 | Image Captioning | Code Available | 2
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection | Unverified | 0