SUGAR: Leveraging Contextual Confidence for Smarter Retrieval | Jan 9, 2025 | Question Answering, RAG | Unverified 0
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language Modeling, Language Modelling | Unverified 0
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models | Jan 9, 2025 | Benchmarking, Mathematical Problem-Solving | Code Available 1
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning | Jan 9, 2025 | Benchmarking, Question Answering | Unverified 0
Statistical Uncertainty Quantification for Aggregate Performance Metrics in Machine Learning Benchmarks | Jan 8, 2025 | Question Answering, Uncertainty Quantification | Unverified 0
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision | Jan 8, 2025 | Hallucination, Question Answering | Unverified 0
Knowledge Retrieval Based on Generative AI | Jan 8, 2025 | Large Language Model, Multiple-choice | Unverified 0
TimelineKGQA: A Comprehensive Question-Answer Pair Generator for Temporal Knowledge Graphs | Jan 8, 2025 | Knowledge Graphs, Question Answering | Code Available 0
Multilingual Open QA on the MIA Shared Task | Jan 7, 2025 | Cross-Lingual Information Retrieval, Information Retrieval | Unverified 0
Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jan 7, 2025 | Machine Translation, Multiple-choice | Unverified 0
KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | Jan 7, 2025 | Anomaly Detection, Anomaly Segmentation | Unverified 0
Multimodal Multihop Source Retrieval for Web Question Answering | Jan 7, 2025 | Multi-hop Question Answering, Question Answering | Unverified 0
Visual question answering: from early developments to recent advances -- a survey | Jan 7, 2025 | Descriptive, Natural Language Understanding | Unverified 0
BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations | Jan 6, 2025 | Document AI, document understanding | Unverified 0
FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models | Jan 6, 2025 | Adversarial Attack, Hallucination | Unverified 0
ReDiT: Re-evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping | Jan 6, 2025 | Question Answering, Visual Question Answering | Code Available 0
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Jan 6, 2025 | Language Model Evaluation, Language Modeling | Code Available 1
Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Jan 6, 2025 | Hallucination, Multimodal Reasoning | Code Available 0
QuIM-RAG: Advancing Retrieval-Augmented Generation with Inverted Question Matching for Enhanced QA Performance | Jan 6, 2025 | Question Answering, RAG | Unverified 0
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Jan 5, 2025 | Image Captioning, Image to text | Code Available 1
Survey on Question Answering over Visually Rich Documents: Methods, Challenges, and Trends | Jan 4, 2025 | document understanding, Question Answering | Unverified 0
Accounting for Focus Ambiguity in Visual Questions | Jan 4, 2025 | Question Answering, Visual Question Answering | Unverified 0
A Survey on Large Language Models with some Insights on their Capabilities and Limitations | Jan 3, 2025 | Code Generation, Question Answering | Unverified 0
The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters | Jan 3, 2025 | Question Answering | Unverified 0
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | Unverified 0
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding | Jan 3, 2025 | Question Answering, Video Understanding | Code Available 0
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | Jan 3, 2025 | Benchmarking, Question Answering | Unverified 0
MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | Jan 3, 2025 | Diagnostic, General Knowledge | Unverified 0
(WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Jan 3, 2025 | Multiple-choice, Question Answering | Code Available 0
Predicting the Performance of Black-box LLMs through Self-Queries | Jan 2, 2025 | Question Answering | Code Available 1
Citations and Trust in LLM Generated Responses | Jan 2, 2025 | Chatbot, Question Answering | Unverified 0
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Jan 2, 2025 | Multiple-choice, Question Answering | Unverified 0
Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models | Jan 2, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 0
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation | Jan 1, 2025 | Image Captioning, Question Answering | Unverified 0
Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering | Jan 1, 2025 | Multiple-choice, Question Answering | Unverified 0
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified 0
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering | Jan 1, 2025 | Contrastive Learning, Medical Visual Question Answering | Unverified 0
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | Unverified 0
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation | Jan 1, 2025 | Benchmarking, Diagnostic | Unverified 0
Font-Agent: Enhancing Font Understanding with Large Language Models | Jan 1, 2025 | Font Generation, Question Answering | Unverified 0
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding | Jan 1, 2025 | Question Answering, Video Understanding | Unverified 0
Efficient Motion-Aware Video MLLM | Jan 1, 2025 | Question Answering, Video Question Answering | Unverified 0
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Jan 1, 2025 | MM-Vet, Multimodal Reasoning | Unverified 0
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning | Jan 1, 2025 | Audio-visual Question Answering, Continual Learning | Code Available 0
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Jan 1, 2025 | Large Language Model, Multimodal Large Language Model | Code Available 1
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems | Jan 1, 2025 | Question Answering, Visual Question Answering | Unverified 0
Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression | Jan 1, 2025 | Question Answering | Unverified 0
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output | Jan 1, 2025 | Instruction Following, Language Modeling | Code Available 0
Flexible Frame Selection for Efficient Video Reasoning | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified 0
Seeing More with Less: Human-like Representations in Vision Models | Jan 1, 2025 | object-detection, Object Detection | Unverified 0