ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | Jun 4, 2025 | Negation Negation Detection | - | Unverified | 0
Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model | Jun 4, 2025 | Language Modeling Language Modelling | - | Unverified | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Jun 3, 2025 | EgoSchema Question Answering | Code | Code Available | 0
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes | Jun 3, 2025 | Benchmarking Feature Engineering | Code | Code Available | 0
A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems | Jun 3, 2025 | Question Answering | - | Unverified | 0
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | Jun 2, 2025 | Multiple-choice Question Answering | - | Unverified | 0
iQUEST: An Iterative Question-Guided Framework for Knowledge Base Question Answering | Jun 2, 2025 | Graph Neural Network Knowledge Base Question Answering | - | Unverified | 0
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists | Jun 2, 2025 | Benchmarking Form | - | Unverified | 0
Learning Sparsity for Effective and Efficient Music Performance Question Answering | Jun 2, 2025 | Audio-visual Question Answering Question Answering | - | Unverified | 0
Parameter Efficient Fine Tuning Llama 3.1 for Answering Arabic Legal Questions: A Case Study on Jordanian Laws | Jun 2, 2025 | Language Modeling Language Modelling | Code | Code Available | 0
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering | Jun 1, 2025 | All MME | - | Unverified | 0
anyECG-chat: A Generalist ECG-MLLM for Flexible ECG Input and Multi-Task Understanding | Jun 1, 2025 | Open-Ended Question Answering Question Answering | - | Unverified | 0
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models | Jun 1, 2025 | Chunking Multi-hop Question Answering | Code | Code Available | 0
A Graph-Retrieval-Augmented Generation Framework Enhances Decision-Making in the Circular Economy | Jun 1, 2025 | Decision Making Multi-hop Question Answering | - | Unverified | 0
Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks | Jun 1, 2025 | In-Context Learning Negation | Code | Code Available | 0
MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility | May 30, 2025 | Decision Making Medical Diagnosis | - | Unverified | 0
A Simple Linear Patch Revives Layer-Pruned Large Language Models | May 30, 2025 | Knowledge Distillation Question Answering | - | Unverified | 0
ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases | May 30, 2025 | Medical Question Answering Multiple-choice | - | Unverified | 0
Exploring the Impact of Occupational Personas on Domain-Specific QA | May 30, 2025 | Question Answering | - | Unverified | 0
Drop Dropout on Single-Epoch Language Model Pretraining | May 30, 2025 | Language Modeling Language Modelling | Code | Code Available | 0
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty? | May 30, 2025 | Question Answering | Code | Code Available | 0
Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | May 30, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs | May 30, 2025 | Fact Checking Hallucination | - | Unverified | 0
Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering | May 30, 2025 | Language Modeling Language Modelling | - | Unverified | 0
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models | May 30, 2025 | Image Captioning Question Answering | - | Unverified | 0
Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning | May 30, 2025 | Question Answering Reinforcement Learning (RL) | - | Unverified | 0
LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews | May 30, 2025 | Binary Classification Question Answering | Code | Code Available | 0
LaMP-QA: A Benchmark for Personalized Long-form Question Answering | May 30, 2025 | Answer Generation Form | - | Unverified | 0
VUDG: A Dataset for Video Understanding Domain Generalization | May 30, 2025 | Domain Generalization Multiple-choice | - | Unverified | 0
Differential Information: An Information-Theoretic Perspective on Preference Optimization | May 29, 2025 | Inductive Bias Instruction Following | - | Unverified | 0
Spoken question answering for visual queries | May 29, 2025 | Question Answering Visual Question Answering (VQA) | - | Unverified | 0
ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering | May 29, 2025 | Chart Question Answering Chart Understanding | - | Unverified | 0
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering | May 29, 2025 | Medical Question Answering Question Answering | - | Unverified | 0
Synthetic Document Question Answering in Hungarian | May 29, 2025 | Optical Character Recognition (OCR) Question Answering | Code | Code Available | 0
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine | May 29, 2025 | Diagnostic Multiple-choice | - | Unverified | 0
Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models | May 29, 2025 | Question Answering Reinforcement Learning (RL) | - | Unverified | 0
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining | May 29, 2025 | Question Answering Representation Learning | Code | Code Available | 0
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation | May 29, 2025 | Question Answering RAG | - | Unverified | 0
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking | May 29, 2025 | Benchmarking Graph Question Answering | - | Unverified | 0
Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question Answering Visual Question Answering | Code | Code Available | 0
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | Math Mathematical Reasoning | - | Unverified | 0
Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs | May 29, 2025 | Dimensionality Reduction Hallucination | - | Unverified | 0
From Chat Logs to Collective Insights: Aggregative Question Answering | May 29, 2025 | Chatbot Question Answering | - | Unverified | 0
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | May 29, 2025 | Question Answering Video Generation | Code | Code Available | 0
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation | May 29, 2025 | Form Hallucination | - | Unverified | 0
Climate Finance Bench | May 28, 2025 | Logical Reasoning Quantization | Code | Code Available | 0
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs | May 28, 2025 | Question Answering | - | Unverified | 0
Improving QA Efficiency with DistilBERT: Fine-Tuning and Inference on mobile Intel CPUs | May 28, 2025 | Computational Efficiency CPU | - | Unverified | 0
Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems | May 28, 2025 | Large Language Model Question Answering | - | Unverified | 0
StressTest: Can YOUR Speech LM Handle the Stress? | May 28, 2025 | Question Answering Sentence | - | Unverified | 0