VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models | May 28, 2025 | Decision Making, Question Answering | Code Available | 0
Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems | May 28, 2025 | Large Language Model, Question Answering | Unverified | 0
ER-REASON: A Benchmark Dataset for LLM-Based Clinical Reasoning in the Emergency Room | May 28, 2025 | Medical Question Answering, Question Answering | Unverified | 0
NegVQA: Can Vision Language Models Understand Negation? | May 28, 2025 | Negation, Question Answering | Unverified | 0
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model | May 28, 2025 | Language Modeling, Language Modelling | Unverified | 0
Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data | May 28, 2025 | Machine Translation, Paraphrase Generation | Code Available | 0
StressTest: Can YOUR Speech LM Handle the Stress? | May 28, 2025 | Question Answering, Sentence | Unverified | 0
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding | May 27, 2025 | Benchmarking, Change Detection | Unverified | 0
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking, Question Answering | Code Available | 0
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | May 27, 2025 | Benchmarking, Multiple-choice | Unverified | 0
Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | May 27, 2025 | Audio-visual Question Answering, Question Answering | Code Available | 0
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | May 27, 2025 | Decision Making, Diagnostic | Unverified | 0
DriveRX: A Vision-Language Reasoning Model for Cross-Task Autonomous Driving | May 27, 2025 | Autonomous Driving, Decision Making | Unverified | 0
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | May 27, 2025 | Question Answering, Visual Reasoning | Unverified | 0
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective | May 27, 2025 | Language Modeling, Language Modelling | Unverified | 0
Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering | May 26, 2025 | Knowledge Graphs, Question Answering | Unverified | 0
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback | May 26, 2025 | Prompt Learning, Question Answering | Unverified | 0
WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models | May 26, 2025 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | Code Available | 0
CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis | May 26, 2025 | Diversity, Open-Ended Question Answering | Code Available | 0
GenKI: Enhancing Open-Domain Question Answering with Knowledge Integration and Controllable Generation in Large Language Models | May 26, 2025 | Open-Domain Question Answering, Passage Retrieval | Code Available | 0
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | May 26, 2025 | cross-modal alignment, Emotion Recognition | Unverified | 0
Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs | May 26, 2025 | Hallucination, Question Answering | Unverified | 0
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs | May 26, 2025 | Question Answering | Code Available | 0
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering | May 26, 2025 | Continual Learning, Question Answering | Code Available | 0
Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical Reasoning, Math | Unverified | 0
Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat | May 26, 2025 | Benchmarking, Question Answering | Unverified | 0
ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models | May 26, 2025 | Prediction, Question Answering | Code Available | 0
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | image-classification, Image Classification | Code Available | 0
Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | May 26, 2025 | Benchmarking, Question Answering | Code Available | 0
It's High Time: A Survey of Temporal Information Retrieval and Question Answering | May 26, 2025 | Articles, Information Retrieval | Unverified | 0
CP-Router: An Uncertainty-Aware Router Between LLM and LRM | May 26, 2025 | Conformal Prediction, Logical Reasoning | Unverified | 0
S2LPP: Small-to-Large Prompt Prediction across LLMs | May 26, 2025 | Natural Language Inference, Prediction | Unverified | 0
DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems | May 26, 2025 | Answer Generation, Knowledge Graphs | Unverified | 0
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | May 26, 2025 | Benchmarking, Medical Diagnosis | Code Available | 0
SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs | May 25, 2025 | Benchmarking, Diversity | Unverified | 0
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | May 25, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0
Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering | May 25, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 0
SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking | May 25, 2025 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 0
Weaver: Interweaving SQL and LLM for Table Reasoning | May 25, 2025 | Question Answering, Table-based Question Answering | Unverified | 0
Medical Large Vision Language Models with Multi-Image Visual Ability | May 25, 2025 | Question Answering, Visual Question Answering (VQA) | Code Available | 0
When Two LLMs Debate, Both Think They'll Win | May 25, 2025 | Question Answering | Unverified | 0
UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models | May 25, 2025 | Machine Translation, Question Answering | Unverified | 0
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | Unverified | 0
Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering | May 25, 2025 | Question Answering, RAG | Code Available | 0
Climate-Eval: A Comprehensive Benchmark for NLP Tasks Related to Climate Change | May 24, 2025 | News Classification, Question Answering | Unverified | 0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph | May 24, 2025 | Image to text, Question Answering | Unverified | 0
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | May 24, 2025 | Benchmarking, Question Answering | Unverified | 0
The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems | May 24, 2025 | Answer Generation, Question Answering | Unverified | 0
Multilingual Question Answering in Low-Resource Settings: A Dzongkha-English Benchmark for Foundation Models | May 24, 2025 | Question Answering | Code Available | 0
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | May 23, 2025 | Form, Question Answering | Unverified | 0