Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Mar 21, 2025 GSM8K Question Answering
Code Code Available 25 DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue May 26, 2025 Diagnostic Question Answering
Code Code Available 25 Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering Apr 23, 2024 Graph Question Answering Hallucination
Code Code Available 25 Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs Jun 13, 2024 Arithmetic Reasoning Fact Verification
Code Code Available 25 Frozen Transformers in Language Models Are Effective Visual Encoder Layers Oct 19, 2023 Action Recognition Image-text Retrieval
Code Code Available 25 GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Jun 17, 2024 Audio Question Answering Instruction Following
Code Code Available 25 From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models Oct 13, 2023 Hallucination Image Captioning
Code Code Available 25 Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs Oct 14, 2024 Computational Efficiency Question Answering
Code Code Available 25 From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks Jun 4, 2024 Image Captioning Language Modelling
Code Code Available 25 GeoChat: Grounded Large Vision-Language Model for Remote Sensing Nov 24, 2023 Instruction Following Language Modeling
Code Code Available 25 CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models Jun 11, 2025 counterfactual Descriptive
Code Code Available 25 FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Jun 10, 2025 Image-text Retrieval Question Answering
Code Code Available 25 F-LMM: Grounding Frozen Large Multimodal Models Jun 9, 2024 General Knowledge Instruction Following
Code Code Available 25 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training Jun 2, 2023 Language Modeling Language Modelling
Code Code Available 25 Atlas: Few-shot Learning with Retrieval Augmented Language Models Aug 5, 2022 Fact Checking Few-Shot Learning
Code Code Available 25 Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering Sep 29, 2023 Image to text Passage Retrieval
Code Code Available 25 FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Apr 1, 2025 Audio-visual Question Answering Audio-Visual Question Answering (AVQA)
Code Code Available 25 EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis Sep 10, 2024 Contrastive Learning Cross-Modal Retrieval
Code Code Available 25 Evaluating LLM Reasoning in the Operations Research Domain with ORQA Dec 22, 2024 Question Answering
Code Code Available 25 FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models Apr 20, 2024 Binary Classification Fake Image Detection
Code Code Available 25 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Sep 26, 2024 Question Answering Video Understanding
Code Code Available 25 ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis Mar 11, 2024 Question Answering
Code Code Available 25 Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement May 24, 2024 Hallucination Image Comprehension
Code Code Available 25 FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models Feb 21, 2024 Question Answering
Code Code Available 25 FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models Dec 30, 2024 Question Answering Token Reduction
Code Code Available 25 EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records Jan 13, 2024 Code Generation Few-Shot Learning
Code Code Available 25 Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction Mar 27, 2024 Image Captioning Language Modeling
Code Code Available 25 EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Jan 21, 2025 Attribute Question Answering
Code Code Available 25 Egocentric Video-Language Pretraining Jun 3, 2022 Action Recognition Contrastive Learning
Code Code Available 25 Retrieval with Learned Similarities Jul 22, 2024 Question Answering Recommendation Systems
Code Code Available 25 Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning May 27, 2024 Question Answering RAG
Code Code Available 25 Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework Jun 20, 2024 Hallucination Question Answering
Code Code Available 25 Explore the Limits of Omni-modal Pretraining at Scale Jun 13, 2024 Language Modeling Language Modelling
Code Code Available 25 Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Oct 23, 2019 Answer Generation Common Sense Reasoning
Code Code Available 25 CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios Mar 7, 2024 Audio-visual Question Answering Audio-Visual Question Answering (AVQA)
Code Code Available 25 An Embodied Generalist Agent in 3D World Nov 18, 2023 3D dense captioning 3D Question Answering (3D-QA)
Code Code Available 25 Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models Jan 25, 2025 Attribute Contrastive Learning
Code Code Available 25 FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models Apr 24, 2025 Answer Selection Information Retrieval
Code Code Available 25 Efficient One-Pass End-to-End Entity Linking for Questions Oct 6, 2020 CPU Entity Linking
Code Code Available 25 FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design Nov 23, 2023 Decision Making Language Modelling
Code Code Available 25 EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education Aug 5, 2023 Chatbot Language Modeling
Code Code Available 25 EfficientRAG: Efficient Retriever for Multi-Hop Question Answering Aug 8, 2024 Multi-hop Question Answering Question Answering
Code Code Available 25 End-To-End Memory Networks Mar 31, 2015 Language Modeling Language Modelling
Code Code Available 25 ANAH: Analytical Annotation of Hallucinations in Large Language Models May 30, 2024 Generative Question Answering Hallucination
Code Code Available 25 ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models Jul 5, 2024 Hallucination Long Form Question Answering
Code Code Available 25 Dual Diffusion for Unified Image Generation and Understanding Dec 31, 2024 Image Generation Language Modeling
Code Code Available 25 DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding Mar 13, 2025 4k Autonomous Driving
Code Code Available 25 Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Jan 9, 2024 Fact Verification In-Context Learning
Code Code Available 25 Can AI Assistants Know What They Don't Know? Jan 24, 2024 Math Open-Domain Question Answering
Code Code Available 25 An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM Mar 27, 2024 Language Modeling Language Modelling
Code Code Available 25