AGI-Elo: How Far Are We From Mastering A Task? May 19, 2025 Code Generation Image Classification
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks May 18, 2025 Benchmarking Medical Visual Question Answering
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation May 17, 2025 Benchmarking Question Answering
MatTools: Benchmarking Large Language Models for Materials Science Tools May 16, 2025 Benchmarking Question Answering
Ranked Voting based Self-Consistency of Large Language Models May 16, 2025 Multiple-choice Open-Ended Question Answering
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs May 16, 2025 Information Retrieval Knowledge Graphs
Efficient and Reproducible Biomedical Question Answering using Retrieval Augmented Generation May 12, 2025 Question Answering RAG
ReCDAP: Relation-Based Conditional Diffusion with Attention Pooling for Few-Shot Knowledge Graph Completion May 12, 2025 Information Retrieval Knowledge Graph Completion
Kalman Filter Enhanced GRPO for Reinforcement Learning-Based Language Model Reasoning May 12, 2025 Language Modeling Language Modelling
DocVXQA: Context-Aware Visual Explanations for Document Question Answering May 12, 2025 Question Answering
BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning May 11, 2025 Question Answering
SmartPilot: A Multiagent CoPilot for Adaptive and Intelligent Manufacturing May 10, 2025 Decision Making Production Forecasting
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks May 9, 2025 Diagnostic Instruction Following
IndicSQuAD: A Comprehensive Multilingual Question Answering Dataset for Indic Languages May 6, 2025 Question Answering
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering May 5, 2025 Hallucination Question Answering
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video May 4, 2025 Benchmarking Question Answering
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation Apr 30, 2025 Diagnostic Large Language Model
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification Apr 29, 2025 Diagnostic Question Answering
TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering Apr 28, 2025 Multi-hop Question Answering Question Answering
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering Apr 25, 2025 Caption Generation EgoSchema
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency Apr 24, 2025 Benchmarking Math
Survey of Video Diffusion Models: Foundations, Implementations, and Applications Apr 22, 2025 Computational Efficiency Denoising
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations Apr 19, 2025 Language Modeling Language Modelling
Learning to Attribute with Attention Apr 18, 2025 Attribute Language Modeling
ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models Apr 14, 2025 Autonomous Driving Autonomous Vehicles
A Survey on Efficient Vision-Language Models Apr 13, 2025 Image Captioning Question Answering
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs Apr 11, 2025 Benchmarking Image Generation
MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval-Augmented Generation Apr 10, 2025 Diagnostic Medical Diagnosis
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration Apr 7, 2025 Language Modeling Language Modelling
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Apr 7, 2025 Chart Question Answering Chart Understanding
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding Apr 4, 2025 Language Modeling Language Modelling
Single-Pass Document Scanning for Question Answering Apr 4, 2025 Question Answering
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection Apr 3, 2025 Instruction Following Language Modeling
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning Apr 2, 2025 Decision Making Diagnostic
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning Mar 29, 2025 Chart Question Answering Chart Understanding
EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos Mar 28, 2025 Benchmarking Question Answering
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs Mar 27, 2025 Attribute Benchmarking
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving Mar 27, 2025 Attribute Autonomous Driving
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation Mar 27, 2025 Question Answering RAG
PAVE: Patching and Adapting Video Large Language Models Mar 25, 2025 Audio-visual Question Answering Multi-Task Learning
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models Mar 25, 2025 Benchmarking Image Captioning
MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering Mar 21, 2025 Question Answering Time Series
Agentic Keyframe Search for Video Question Answering Mar 20, 2025 EgoSchema Question Answering
Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems Mar 19, 2025 Question Answering RAG
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Mar 17, 2025 Articles Benchmarking
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos Mar 17, 2025 Benchmarking Question Answering
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models Mar 17, 2025 Question Answering Scene Understanding
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game Mar 13, 2025 Multimodal Reasoning Question Answering
Question-Aware Gaussian Experts for Audio-Visual Question Answering Mar 6, 2025 Audio-visual Question Answering Audio-Visual Question Answering (AVQA)
Cross-modal Causal Relation Alignment for Video Question Grounding Mar 5, 2025 Contrastive Learning cross-modal alignment