MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks | May 18, 2025 | Benchmarking Medical Visual Question Answering | Code Available | 1
Table-R1: Region-based Reinforcement Learning for Table Understanding | May 18, 2025 | Question Answering reinforcement-learning | Unverified | 0
Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering | May 17, 2025 | Document Ranking Large Language Model | Unverified | 0
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation | May 17, 2025 | Question Answering | Unverified | 0
BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering | May 17, 2025 | Multi-hop Question Answering Question Answering | Unverified | 0
Recursive Question Understanding for Complex Question Answering over Heterogeneous Personal Data | May 17, 2025 | Language Modeling Language Modelling | Unverified | 0
Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation | May 17, 2025 | Open-Domain Question Answering Question Answering | Unverified | 0
CCNU at SemEval-2025 Task 3: Leveraging Internal and External Knowledge of Large Language Models for Multilingual Hallucination Annotation | May 17, 2025 | Hallucination Question Answering | Unverified | 0
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | May 17, 2025 | Language Modeling Language Modelling | Unverified | 0
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | Benchmarking Question Answering | Code Available | 1
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs | May 16, 2025 | Question Answering Reinforcement Learning (RL) | Code Available | 0
Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation | May 16, 2025 | Decoder Multi-hop Question Answering | Code Available | 0
THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering | May 16, 2025 | Language Modeling Language Modelling | Unverified | 0
MatTools: Benchmarking Large Language Models for Materials Science Tools | May 16, 2025 | Benchmarking Question Answering | Code Available | 1
Ranked Voting based Self-Consistency of Large Language Models | May 16, 2025 | Multiple-choice Open-Ended Question Answering | Code Available | 1
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs | May 16, 2025 | Benchmarking Question Answering | Code Available | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | May 16, 2025 | Benchmarking Ethics | Code Available | 0
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | May 16, 2025 | Cross-Modal Retrieval Diagnostic | Code Available | 2
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models | May 16, 2025 | Image Captioning Question Answering | Code Available | 0
Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models | May 16, 2025 | Question Answering Retrieval | Unverified | 0
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | May 16, 2025 | Information Retrieval Knowledge Graphs | Code Available | 1
Scaling Reasoning can Improve Factuality in Large Language Models | May 16, 2025 | Knowledge Graphs Large Language Model | Code Available | 0
ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection | May 16, 2025 | Audio Deepfake Detection Audio Question Answering | Unverified | 0
A Dataset for Spatiotemporal-Sensitive POI Question Answering | May 16, 2025 | Question Answering RAG | Code Available | 0
DO-RAG: A Domain-Specific QA Framework Using Knowledge Graph-Enhanced Retrieval-Augmented Generation | May 15, 2025 | graph construction Hallucination | Code Available | 0
Enhancing Multi-Image Question Answering via Submodular Subset Selection | May 15, 2025 | Question Answering Retrieval | Unverified | 0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs | May 15, 2025 | Benchmarking Fairness | Unverified | 0
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs | May 15, 2025 | All Benchmarking | Unverified | 0
CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability | May 15, 2025 | Question Answering RAG | Unverified | 0
Leveraging Graph Retrieval-Augmented Generation to Support Learners' Understanding of Knowledge Concepts in MOOCs | May 15, 2025 | Knowledge Graphs Question Answering | Unverified | 0
End-to-End Vision Tokenizer Tuning | May 15, 2025 | Image Generation Question Answering | Unverified | 0
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence | May 15, 2025 | Computational Efficiency Continual Learning | Code Available | 0
The Impact of Large Language Models on Task Automation in Manufacturing Services | May 14, 2025 | Hallucination Question Answering | Unverified | 0
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM? | May 14, 2025 | Audio Question Answering Question Answering | Unverified | 0
Variational Visual Question Answering | May 14, 2025 | Question Answering Visual Question Answering | Unverified | 0
Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases | May 14, 2025 | Knowledge Graphs Multi-hop Question Answering | Code Available | 0
Atomic Consistency Preference Optimization for Long-Form Question Answering | May 14, 2025 | Form Long Form Question Answering | Code Available | 0
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation | May 14, 2025 | Autonomous Driving Autonomous Navigation | Unverified | 0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math Mathematical Problem-Solving | Code Available | 0
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging | May 14, 2025 | Question Answering Retrieval | Code Available | 0
WixQA: A Multi-Dataset Benchmark for Enterprise Retrieval-Augmented Generation | May 13, 2025 | Question Answering RAG | Unverified | 0
Fusing Bidirectional Chains of Thought and Reward Mechanisms A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage | May 13, 2025 | Knowledge Distillation Large Language Model | Unverified | 0
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning? | May 13, 2025 | Chart Question Answering Fact Checking | Code Available | 0
Efficient and Reproducible Biomedical Question Answering using Retrieval Augmented Generation | May 12, 2025 | Question Answering RAG | Code Available | 1
Visually Interpretable Subtask Reasoning for Visual Question Answering | May 12, 2025 | Attribute Object Recognition | Code Available | 0
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge | May 12, 2025 | Audio Question Answering Question Answering | Unverified | 0
ReCDAP: Relation-Based Conditional Diffusion with Attention Pooling for Few-Shot Knowledge Graph Completion | May 12, 2025 | Information Retrieval Knowledge Graph Completion | Code Available | 1
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning | May 12, 2025 | Language Modeling Language Modelling | Code Available | 1
Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption | May 12, 2025 | GPU Knowledge Base Question Answering | Unverified | 0
DocVXQA: Context-Aware Visual Explanations for Document Question Answering | May 12, 2025 | Question Answering | Code Available | 1