Task Progressive Curriculum Learning for Robust Visual Question Answering | Nov 26, 2024 | Data Augmentation, Ensemble Learning | Unverified 0
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Nov 25, 2024 | Question Answering, Visual Question Answering | Code Available 2
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | Nov 25, 2024 | Medical Visual Question Answering, Multiple-choice | Unverified 0
VideoOrion: Tokenizing Object Dynamics in Videos | Nov 25, 2024 | Language Modeling, Language Modelling | Unverified 0
Context Awareness Gate For Retrieval Augmented Generation | Nov 25, 2024 | Open-Domain Question Answering, Question Answering | Code Available 1
AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning | Nov 25, 2024 | Hallucination, Question Answering | Code Available 1
Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering | Nov 24, 2024 | Question Answering, Relational Reasoning | Unverified 0
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents | Nov 23, 2024 | Question Answering, RAG | Code Available 0
Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai | Nov 23, 2024 | Diversity, Question Answering | Code Available 1
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | Unverified 0
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | Nov 23, 2024 | Attribute, Cross-Modal Retrieval | Unverified 0
freePruner: A Training-free Approach for Large Multimodal Model Acceleration | Nov 23, 2024 | Quantization, Question Answering | Unverified 0
ReWind: Understanding Long Videos with Instructed Learnable Memory | Nov 23, 2024 | Large Language Model, Question Answering | Unverified 0
KBAlign: Efficient Self Adaptation on Specific Knowledge Bases | Nov 22, 2024 | Question Answering, RAG | Code Available 0
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection | Nov 22, 2024 | Question Answering, Video Question Answering | Code Available 2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Nov 21, 2024 | Decision Making, Language Modeling | Code Available 2
Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available 0
Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective | Nov 21, 2024 | Knowledge Graphs, Question Answering | Unverified 0
FastRAG: Retrieval Augmented Generation for Semi-structured Data | Nov 21, 2024 | Management, Question Answering | Unverified 0
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training | Nov 20, 2024 | Contrastive Learning, image-classification | Unverified 0
Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU | Nov 20, 2024 | Question Answering, RAG | Unverified 0
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | Nov 20, 2024 | Question Answering, Visual Question Answering (VQA) | Unverified 0
Teaching VLMs to Localize Specific Objects from In-context Examples | Nov 20, 2024 | Object, Object Tracking | Code Available 1
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement | Nov 20, 2024 | Autonomous Driving, Computational Efficiency | Unverified 0
Evaluating LLMs Capabilities Towards Understanding Social Dynamics | Nov 20, 2024 | Prompt Engineering, Question Answering | Unverified 0
Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | Nov 19, 2024 | Language Modeling, Language Modelling | Unverified 0
CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs | Nov 19, 2024 | Hallucination, Language Modeling | Unverified 0
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Nov 19, 2024 | GPU, Question Answering | Unverified 0
Neon: News Entity-Interaction Extraction for Enhanced Question Answering | Nov 19, 2024 | Articles, Open Information Extraction | Unverified 0
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding | Nov 19, 2024 | Question Answering, Video Understanding | Unverified 0
A Survey of Medical Vision-and-Language Applications and Their Techniques | Nov 19, 2024 | Decision Making, Diagnostic | Code Available 1
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering | Nov 19, 2024 | Fact Checking, Open-Domain Question Answering | Unverified 0
Mitigating Knowledge Conflicts in Language Model-Driven Question Answering | Nov 18, 2024 | Document Summarization, Hallucination | Unverified 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Nov 18, 2024 | Benchmarking, Multimodal Large Language Model | Code Available 0
MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Nov 18, 2024 | Language Modeling, Language Modelling | Code Available 2
ForPKG: A Framework for Constructing Forestry Policy Knowledge Graph and Application Analysis | Nov 17, 2024 | graph construction, Knowledge Graphs | Code Available 0
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Nov 17, 2024 | Hallucination, In-Context Learning | Code Available 0
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry | Nov 17, 2024 | Question Answering, Scene Understanding | Unverified 0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available 0
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms | Nov 17, 2024 | Diagnostic, Miscellaneous | Unverified 0
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation | Nov 17, 2024 | Action Recognition, backdoor defense | Code Available 1
LLaSA: Large Language and Structured Data Assistant | Nov 16, 2024 | Hypergraph representations, Question Answering | Unverified 0
Large Vision-Language Models for Remote Sensing Visual Question Answering | Nov 16, 2024 | Language Modeling, Language Modelling | Unverified 0
Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption Generation, Cross-Modal Retrieval | Unverified 0
Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity | Nov 15, 2024 | Contrastive Learning, Hallucination | Unverified 0
SlimLM: An Efficient Small Language Model for On-Device Document Assistance | Nov 15, 2024 | Language Modeling, Language Modelling | Unverified 0
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Nov 15, 2024 | Logical Reasoning, Multimodal Reasoning | Code Available 7
Visual question answering based evaluation metrics for text-to-image generation | Nov 15, 2024 | Image Generation, Image Manipulation | Unverified 0
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference | Nov 15, 2024 | Quantization, Question Answering | Unverified 0
A Benchmark for Long-Form Medical Question Answering | Nov 14, 2024 | Answer Generation, Form | Code Available 0