SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | May 27, 2025 | Benchmarking, Multiple-choice | Unverified | 0
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration | May 27, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 1
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | May 26, 2025 | Benchmarking, Medical Diagnosis | Code Available | 0
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | May 26, 2025 | Benchmarking, Minecraft | Code Available | 1
ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models | May 26, 2025 | Prediction, Question Answering | Code Available | 0
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding | May 26, 2025 | Question Answering, Visual Question Answering | Code Available | 1
Visualized Text-to-Image Retrieval | May 26, 2025 | Image Retrieval, Question Answering | Code Available | 1
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | May 26, 2025 | Diagnostic, Question Answering | Code Available | 2
GenKI: Enhancing Open-Domain Question Answering with Knowledge Integration and Controllable Generation in Large Language Models | May 26, 2025 | Open-Domain Question Answering, Passage Retrieval | Code Available | 0
CulFiT: A Fine-grained Cultural-aware LLM Training Paradigm via Multilingual Critique Data Synthesis | May 26, 2025 | Diversity, Open-Ended Question Answering | Code Available | 0
Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical Reasoning, Math | Unverified | 0
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering | May 26, 2025 | Continual Learning, Question Answering | Code Available | 0
NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | May 26, 2025 | Chunking, Large Language Model | Code Available | 1
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | Image Classification | Code Available | 0
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback | May 26, 2025 | Prompt Learning, Question Answering | Unverified | 0
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities | May 26, 2025 | Knowledge Graphs, Natural Language Understanding | Code Available | 2
Self-Reflective Planning with Knowledge Graphs: Enhancing LLM Reasoning Reliability for Question Answering | May 26, 2025 | Knowledge Graphs, Question Answering | Unverified | 0
Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs | May 26, 2025 | Hallucination, Question Answering | Unverified | 0
Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | May 26, 2025 | Benchmarking, Question Answering | Code Available | 0
It's High Time: A Survey of Temporal Information Retrieval and Question Answering | May 26, 2025 | Articles, Information Retrieval | Unverified | 0
Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat | May 26, 2025 | Benchmarking, Question Answering | Unverified | 0
WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models | May 26, 2025 | Multi-Label Classification | Code Available | 0
S2LPP: Small-to-Large Prompt Prediction across LLMs | May 26, 2025 | Natural Language Inference, Prediction | Unverified | 0
CP-Router: An Uncertainty-Aware Router Between LLM and LRM | May 26, 2025 | Conformal Prediction, Logical Reasoning | Unverified | 0
KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing | May 26, 2025 | Knowledge Tracing, Multi-hop Question Answering | Code Available | 1
DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems | May 26, 2025 | Answer Generation, Knowledge Graphs | Unverified | 0
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs | May 26, 2025 | Question Answering | Code Available | 0
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs | May 26, 2025 | cross-modal alignment, Emotion Recognition | Unverified | 0
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability | May 26, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 2
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | May 25, 2025 | Multimodal Reasoning, Question Answering | Code Available | 2
Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering | May 25, 2025 | Question Answering, RAG | Code Available | 0
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | May 25, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0
Weaver: Interweaving SQL and LLM for Table Reasoning | May 25, 2025 | Question Answering, Table-based Question Answering | Unverified | 0
SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs | May 25, 2025 | Benchmarking, Diversity | Unverified | 0
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts | May 25, 2025 | Chart Understanding, Question Answering | Code Available | 3
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering | May 25, 2025 | Anatomy, Benchmarking | Code Available | 1
Medical Large Vision Language Models with Multi-Image Visual Ability | May 25, 2025 | Question Answering, Visual Question Answering (VQA) | Code Available | 0
UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models | May 25, 2025 | Machine Translation, Question Answering | Unverified | 0
Self-Critique Guided Iterative Reasoning for Multi-hop Question Answering | May 25, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 0
GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | Unverified | 0
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | May 25, 2025 | Image Captioning, Multimodal Reasoning | Code Available | 1
SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking | May 25, 2025 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 0
When Two LLMs Debate, Both Think They'll Win | May 25, 2025 | Question Answering | Unverified | 0
The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems | May 24, 2025 | Answer Generation, Question Answering | Unverified | 0
Climate-Eval: A Comprehensive Benchmark for NLP Tasks Related to Climate Change | May 24, 2025 | News Classification, Question Answering | Unverified | 0
Multilingual Question Answering in Low-Resource Settings: A Dzongkha-English Benchmark for Foundation Models | May 24, 2025 | Question Answering | Code Available | 0
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation | May 24, 2025 | Benchmarking, Question Answering | Unverified | 0
BRIT: Bidirectional Retrieval over Unified Image-Text Graph | May 24, 2025 | Image to text, Question Answering | Unverified | 0
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | Decoder, Image Captioning | Code Available | 4