Paper | Date | Tasks | Code status
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | May 23, 2025 | Diagnostic Question Answering | Code Available 0
PPT: A Process-based Preference Learning Framework for Self Improving Table Question Answering Models | May 23, 2025 | Code Generation Mathematical Reasoning | Unverified 0
FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain | May 23, 2025 | Question Answering RAG | Unverified 0
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | May 23, 2025 | Form Question Answering | Unverified 0
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | May 23, 2025 | Question Answering Reinforcement Learning (RL) | Code Available 4
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models | May 23, 2025 | Question Answering Visual Question Answering | Code Available 1
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding | May 23, 2025 | Language Modeling Language Modelling | Code Available 2
MetaGen Blended RAG: Higher Accuracy for Domain-Specific Q&A Without Fine-Tuning | May 23, 2025 | Few-Shot Learning Question Answering | Code Available 1
PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language | May 23, 2025 | Benchmarking Question Answering | Unverified 0
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | May 23, 2025 | Medical Question Answering Quantization | Unverified 0
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models | May 23, 2025 | Continual Learning Question Answering | Unverified 0
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models | May 22, 2025 | Question Answering Specificity | Unverified 0
VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering | May 22, 2025 | Question Answering RAG | Unverified 0
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | May 22, 2025 | Hallucination Image Description | Code Available 1
CUB: Benchmarking Context Utilisation Techniques for Language Models | May 22, 2025 | Benchmarking Fact Checking | Unverified 0
O^2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering | May 22, 2025 | Answer Generation Open-Ended Question Answering | Code Available 1
Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA | May 22, 2025 | Multi-hop Question Answering Question Answering | Unverified 0
Tools in the Loop: Quantifying Uncertainty of LLM Question Answering Systems That Use Tools | May 22, 2025 | Information Retrieval Question Answering | Unverified 0
Are the Hidden States Hiding Something? Testing the Limits of Factuality-Encoding Capabilities in LLMs | May 22, 2025 | Question Answering | Unverified 0
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering | May 22, 2025 | Global Facts Language Modeling | Code Available 0
Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering | May 22, 2025 | Benchmarking Evidence Selection | Code Available 1
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning | May 22, 2025 | Form Question Answering | Code Available 1
Continually Self-Improving Language Models for Bariatric Surgery Question-Answering | May 22, 2025 | Large Language Model Misinformation | Unverified 0
CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering | May 22, 2025 | Computed Tomography (CT) Question Answering | Unverified 0
Collaboration among Multiple Large Language Models for Medical Question Answering | May 22, 2025 | Medical Question Answering Multiple-choice | Unverified 0
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding | May 22, 2025 | Causal Inference Hallucination | Unverified 0
Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation | May 22, 2025 | Hallucination Image Captioning | Unverified 0
A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering | May 22, 2025 | counterfactual Medical Visual Question Answering | Unverified 0
UNCLE: Uncertainty Expressions in Long-Form Generation | May 22, 2025 | 4k Form | Unverified 0
Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge | May 22, 2025 | Anomaly Detection Question Answering | Unverified 0
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation Question Answering | Unverified 0
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | May 22, 2025 | Motion Estimation Question Answering | Code Available 2
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | May 21, 2025 | Data Augmentation Large Language Model | Unverified 0
RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language | May 21, 2025 | Question Answering | Code Available 0
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation | May 21, 2025 | Decision Making Language Modeling | Code Available 0
BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law | May 21, 2025 | Answer Generation Question Answering | Unverified 0
CRAFT: Training-Free Cascaded Retrieval for Tabular QA | May 21, 2025 | Natural Language Queries Natural Questions | Unverified 0
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | May 21, 2025 | Benchmarking Math | Unverified 0
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model | May 21, 2025 | Language Modeling Language Modelling | Code Available 0
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | May 21, 2025 | Multiple-choice Multiple Choice Question Answering (MCQA) | Unverified 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | May 21, 2025 | Benchmarking Language Modeling | Code Available 0
HopWeaver: Synthesizing Authentic Multi-Hop Questions Across Text Corpora | May 21, 2025 | Multi-hop Question Answering Question Answering | Code Available 1
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking | May 21, 2025 | Benchmarking Claim Verification | Code Available 0
KaFT: Knowledge-aware Fine-tuning for Boosting LLMs' Domain-specific Question-Answering Performance | May 21, 2025 | Hallucination Question Answering | Unverified 0
SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks | May 21, 2025 | image-classification Image Classification | Code Available 0
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation | May 21, 2025 | Answer Generation In-Context Learning | Code Available 1
ChartCards: A Chart-Metadata Generation Framework for Multi-Task Chart Understanding | May 21, 2025 | Chart Question Answering Chart Understanding | Code Available 0
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | May 21, 2025 | Benchmarking Question Answering | Code Available 0
Set-LLM: A Permutation-Invariant LLM | May 21, 2025 | Multiple-choice Question Answering | Unverified 0
From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning | May 21, 2025 | Question Answering Reinforcement Learning (RL) | Code Available 1