BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Oct 11, 2018 Citation Intent Classification Common Sense Reasoning
Attention Is All You Need Jun 12, 2017 Abstractive Text Summarization All
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Jun 27, 2025 Question Answering Video Question Answering
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models Jun 18, 2025 Audio captioning Large Language Model
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning Jun 12, 2025 Answer Generation Chunking
CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models Jun 11, 2025 counterfactual Descriptive
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Jun 11, 2025 Medical Question Answering Question Answering
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Jun 10, 2025 Image-text Retrieval Question Answering
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning Jun 2, 2025 Fact Verification Language Modeling
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning May 29, 2025 Anomaly Detection Descriptive
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities May 26, 2025 Knowledge Graphs Natural Language Understanding
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability May 26, 2025 Multi-hop Question Answering Question Answering
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue May 26, 2025 Diagnostic Question Answering
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use May 25, 2025 Multimodal Reasoning Question Answering
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding May 23, 2025 Language Modeling Language Modelling
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding May 22, 2025 Motion Estimation Question Answering
Learnware of Language Models: Specialized Small Language Models Can Do Big May 19, 2025 Privacy Preserving Question Answering
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner May 16, 2025 Cross-Modal Retrieval Diagnostic
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning May 7, 2025 Multiple-choice Question Answering
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities Apr 29, 2025 Question Answering RAG
FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models Apr 24, 2025 Answer Selection Information Retrieval
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Apr 13, 2025 Question Answering reinforcement-learning
MedM-VL: What Makes a Good Medical LVLM? Apr 6, 2025 Medical Image Analysis Question Answering
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Apr 1, 2025 Audio-visual Question Answering Audio-Visual Question Answering (AVQA)
Unified Multimodal Discrete Diffusion Mar 26, 2025 Image Captioning Image Generation
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis Mar 25, 2025 Contrastive Learning Image-text Retrieval
MC-LLaVA: Multi-Concept Personalized Vision-Language Model Mar 24, 2025 Language Modeling Language Modelling
LLaVAction: evaluating and training multi-modal large language models for action recognition Mar 24, 2025 Action Recognition Action Understanding
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Mar 21, 2025 GSM8K Question Answering
Where do Large Vision-Language Models Look at when Answering Questions? Mar 18, 2025 Question Answering Visual Question Answering
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding Mar 13, 2025 4k Autonomous Driving
Teaching LMMs for Image Quality Scoring and Interpreting Mar 12, 2025 Descriptive Image Quality Assessment
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis Mar 10, 2025 Question Answering
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Mar 10, 2025 Benchmarking Medical Question Answering
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Mar 6, 2025 General Knowledge Image Captioning
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM Mar 6, 2025 Anomaly Detection Language Modeling
SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking Mar 2, 2025 Fact Checking Fact Verification
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval Mar 1, 2025 GPU Question Answering
LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers Feb 25, 2025 Multi-hop Question Answering Question Answering
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts Feb 24, 2025 Benchmarking Fact Verification
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models Feb 20, 2025 Question Answering Visual Question Answering
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Feb 18, 2025 Image Retrieval Question Answering
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems Feb 16, 2025 Open-Domain Question Answering Question Answering
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding Feb 15, 2025 Question Answering Streaming video understanding
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG Feb 13, 2025 Knowledge Graphs Large Language Model
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization Feb 6, 2025 Language Modeling Language Modelling
LUCY: Linguistic Understanding and Control Yielding Early Stage of Her Jan 27, 2025 Question Answering
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning Jan 25, 2025 Answer Generation Multi-agent Reinforcement Learning
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models Jan 25, 2025 Attribute Contrastive Learning
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Jan 21, 2025 Attribute Question Answering