TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Mar 7, 2024 | document understanding, Key Information Extraction | Code Available | 5
MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments | Feb 1, 2024 | Embodied Question Answering, Language Modeling | Code Available | 5
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | Jan 31, 2024 | Question Answering, Retrieval | Code Available | 5
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | Nov 28, 2023 | Electrical Engineering, Experimental Design | Code Available | 5
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering, FS-MEVQA | Code Available | 5
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | May 17, 2023 | Arithmetic Reasoning, Decision Making | Code Available | 5
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation | May 26, 2025 | Question Answering, Synthetic Data Generation | Code Available | 4
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | May 23, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 4
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | Decoder, Image Captioning | Code Available | 4
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model | Mar 30, 2025 | Autonomous Driving, Decision Making | Code Available | 4
Retrieval-Augmented Generation with Hierarchical Knowledge | Mar 13, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 4
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | Feb 25, 2025 | Question Answering, RAG | Code Available | 4
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Feb 13, 2025 | Question Answering, RAG | Code Available | 4
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Feb 4, 2025 | Benchmarking, Information Retrieval | Code Available | 4
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Jan 14, 2025 | Embodied Question Answering, Hallucination | Code Available | 4
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking, Logical Reasoning | Code Available | 4
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Dec 18, 2024 | Question Answering, Spatial Reasoning | Code Available | 4
EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations | Oct 14, 2024 | Answer Generation, Question Answering | Code Available | 4
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | Sep 4, 2024 | Question Answering, Sentence | Code Available | 4
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Aug 1, 2024 | Medical Question Answering, MedQA | Code Available | 4
The Llama 3 Herd of Models | Jul 31, 2024 | answerability prediction, Language Modeling | Code Available | 4
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | Jun 13, 2024 | 3D visual grounding, Attribute | Code Available | 4
A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code Available | 4
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning | May 2, 2024 | Autonomous Driving, counterfactual | Code Available | 4
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Apr 26, 2024 | 2k, Question Answering | Code Available | 4
Sailor: Open Language Models for South-East Asia | Apr 4, 2024 | Language Modeling, Language Modelling | Code Available | 4
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | Mar 27, 2024 | Articles, Language Modeling | Code Available | 4
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Mar 14, 2024 | GSM8K, Language Modelling | Code Available | 4
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation | Feb 28, 2024 | Attribute, Extractive Question-Answering | Code Available | 4
Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | Feb 26, 2024 | Evidence Selection, Open-Ended Question Answering | Code Available | 4
FinBen: A Holistic Financial Benchmark for Large Language Models | Feb 20, 2024 | Question Answering, RAG | Code Available | 4
Benchmarking Retrieval-Augmented Generation for Medicine | Feb 20, 2024 | Benchmarking, Information Retrieval | Code Available | 4
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM | Feb 14, 2024 | Medical Visual Question Answering, Question Answering | Code Available | 4
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Feb 12, 2024 | Common Sense Reasoning, Graph Classification | Code Available | 4
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Feb 12, 2024 | Hallucination, Object Localization | Code Available | 4
Knowledge Fusion of Large Language Models | Jan 19, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4
Mixtral of Experts | Jan 8, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4
GPT-4V(ision) is a Generalist Web Agent, if Grounded | Jan 3, 2024 | Image Captioning, Question Answering | Code Available | 4
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | Nov 27, 2023 | Articles, Conditional Text Generation | Code Available | 4
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Nov 16, 2023 | Language Modeling, Language Modelling | Code Available | 4
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | Nov 13, 2023 | Described Object Detection, Language Modeling | Code Available | 4
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Oct 17, 2023 | Fact Verification, Question Answering | Code Available | 4
Retrieval-Generation Synergy Augmented Large Language Models | Oct 8, 2023 | Question Answering, Retrieval | Code Available | 4
Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese | Sep 8, 2023 | Domain Adaptation, Hallucination | Code Available | 4
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Jun 1, 2023 | Image Classification, Instruction Following | Code Available | 4
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function | May 26, 2023 | Fact Verification, Information Retrieval | Code Available | 4
VideoChat: Chat-Centric Video Understanding | May 10, 2023 | Question Answering, Video-based Generative Performance Benchmarking | Code Available | 4
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | Jan 2, 2023 | Common Sense Reasoning, Language Modelling | Code Available | 4
Holistic Evaluation of Language Models | Nov 16, 2022 | Fairness, Question Answering | Code Available | 4