From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | Jun 18, 2025 | Language Modeling, Language Modelling | Unverified | 0
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | Jun 18, 2025 | document understanding, Multiple-choice | Unverified | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Jun 18, 2025 | Audio captioning, Large Language Model | Code Available | 2
MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Jun 18, 2025 | Multimodal Reasoning, Question Answering | Unverified | 0
GenerationPrograms: Fine-grained Attribution with Executable Programs | Jun 17, 2025 | Document Summarization, Long Form Question Answering | Code Available | 0
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering | Jun 17, 2025 | Diagnostic, Question Answering | Code Available | 0
Re-Initialization Token Learning for Tool-Augmented Large Language Models | Jun 17, 2025 | GSM8K, Question Answering | Code Available | 0
Enhancing Omics Cohort Discovery for Research on Neurodegeneration through Ontology-Augmented Embedding Models | Jun 16, 2025 | Question Answering | Code Available | 0
SeqPE: Transformer with Sequential Position Encoding | Jun 16, 2025 | image-classification, Image Classification | Code Available | 1
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement | Jun 16, 2025 | document understanding, Question Answering | Code Available | 1
CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | Jun 15, 2025 | Answer Generation, Decision Making | Unverified | 0
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval | Jun 14, 2025 | Instruction Following, Multimodal Reasoning | Code Available | 0
Training-free LLM Merging for Multi-task Learning | Jun 14, 2025 | Multiple-choice, Multi-Task Learning | Code Available | 0
AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making | Jun 14, 2025 | Decision Making, Question Answering | Unverified | 0
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables | Jun 13, 2025 | Benchmarking, Descriptive | Unverified | 0
A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions | Jun 13, 2025 | Conformal Prediction, Question Answering | Unverified | 0
MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space | Jun 13, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | Jun 13, 2025 | Medical Question Answering, MedQA | Unverified | 0
CogStream: Context-guided Streaming Video Question Answering | Jun 12, 2025 | Question Answering, Video Question Answering | Unverified | 0
HalLoc: Token-level Localization of Hallucinations for Vision Language Models | Jun 12, 2025 | Hallucination, Image Captioning | Code Available | 0
Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications | Jun 12, 2025 | Code Generation, Question Answering | Code Available | 0
EQA-RM: A Generative Embodied Reward Model with Test-time Scaling | Jun 12, 2025 | Embodied Question Answering, Question Answering | Code Available | 0
Think before You Simulate: Symbolic Reasoning to Orchestrate Neural Computation for Counterfactual Question Answering | Jun 12, 2025 | counterfactual, Counterfactual Reasoning | Code Available | 0
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors | Jun 12, 2025 | Question Answering, Safety Alignment | Code Available | 0
SlotPi: Physics-informed Object-centric Reasoning Models | Jun 12, 2025 | Object, Question Answering | Code Available | 0
MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models | Jun 12, 2025 | Image Segmentation, Medical Diagnosis | Unverified | 0
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos | Jun 12, 2025 | Question Answering | Unverified | 0
Can We Infer Confidential Properties of Training Data from LLMs? | Jun 12, 2025 | image-classification, Image Classification | Unverified | 0
Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering | Jun 12, 2025 | Answer Generation, Question Answering | Unverified | 0
Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | Jun 12, 2025 | Multiple-choice, Question Answering | Unverified | 0
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation | Jun 12, 2025 | AudioCaps, Audio captioning | Unverified | 0
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Jun 12, 2025 | Answer Generation, Chunking | Code Available | 2
Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection | Jun 11, 2025 | Medical Question Answering, MedQA | Code Available | 0
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | Jun 11, 2025 | Medical Question Answering, Question Answering | Code Available | 2
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation | Jun 11, 2025 | Common Sense Reasoning, Question Answering | Unverified | 0
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering | Jun 11, 2025 | Graph Question Answering, Knowledge Graphs | Code Available | 0
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning | Jun 11, 2025 | In-Context Learning, Question Answering | Unverified | 0
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment | Jun 11, 2025 | cross-modal alignment, Question Answering | Code Available | 0
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models | Jun 11, 2025 | counterfactual, Descriptive | Code Available | 2
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Jun 11, 2025 | Medical Visual Question Answering, Question Answering | Code Available | 0
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Jun 11, 2025 | Question Answering, Visual Question Answering | Code Available | 0
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Jun 11, 2025 | Action Anticipation, Large Language Model | Code Available | 7
Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA | Jun 11, 2025 | Code Generation, Language Modeling | Code Available | 0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | Unverified | 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | Jun 10, 2025 | Multiple-choice, Open-Ended Question Answering | Unverified | 0
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | Jun 10, 2025 | Action Generation, Image Captioning | Unverified | 0
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2
Improved LLM Agents for Financial Document Question Answering | Jun 10, 2025 | Question Answering | Unverified | 0
PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question Answering, Scene Understanding | Unverified | 0
Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-k | Jun 10, 2025 | Open-Domain Question Answering, Question Answering | Unverified | 0