Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | Feb 26, 2024 | Evidence Selection Open-Ended Question Answering | Code Available | 4
A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning Instruction Following | Code Available | 4
Knowledge Fusion of Large Language Models | Jan 19, 2024 | Code Generation Common Sense Reasoning | Code Available | 4
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | Decoder Image Captioning | Code Available | 4
N-Grammer: Augmenting Transformers with latent n-grams | Jul 13, 2022 | Common Sense Reasoning Coreference Resolution | Code Available | 4
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning | Dec 31, 2024 | Benchmarking Logical Reasoning | Code Available | 4
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | Feb 13, 2025 | Question Answering RAG | Code Available | 4
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition Automatic Speech Recognition (ASR) | Code Available | 4
A Survey of Large Language Models in Finance (FinLLMs) | Feb 4, 2024 | Named Entity Recognition (NER) Question Answering | Code Available | 3
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Mar 23, 2025 | 3D Generation Medical Report Generation | Code Available | 3
REPLUG: Retrieval-Augmented Black-Box Language Models | Jan 30, 2023 | Language Modeling Language Modelling | Code Available | 3
EgoLife: Towards Egocentric Life Assistant | Mar 5, 2025 | Question Answering Video Understanding | Code Available | 3
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal | Dec 12, 2022 | General Knowledge Graph Embedding | Code Available | 3
SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning Automatic Speech Recognition | Code Available | 3
Efficient Multimodal Large Language Models: A Survey | May 17, 2024 | Edge-computing Question Answering | Code Available | 3
Prompting Is Programming: A Query Language for Large Language Models | Dec 12, 2022 | Code Generation Language Modeling | Code Available | 3
Q-Bench+: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs | Feb 11, 2024 | Image Quality Assessment Question Answering | Code Available | 3
A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | Nov 4, 2024 | Question Answering Text Generation | Code Available | 3
Generating Long Sequences with Sparse Transformers | Apr 23, 2019 | Diversity Image Generation | Code Available | 3
ERNIE: Enhanced Representation through Knowledge Integration | Apr 19, 2019 | Chinese Named Entity Recognition Chinese Sentence Pair Classification | Code Available | 3
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering | Mar 14, 2025 | Audio Question Answering Question Answering | Code Available | 3
ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation | Sep 20, 2024 | Descriptive Question Answering | Code Available | 3
Scaling Instruction-Finetuned Language Models | Oct 20, 2022 | Coreference Resolution Cross-Lingual Question Answering | Code Available | 3
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | Jul 29, 2019 | Chinese Named Entity Recognition Chinese Reading Comprehension | Code Available | 3
PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models | Mar 26, 2024 | Code Completion Few-Shot Learning | Code Available | 3
DriveLM: Driving with Graph Visual Question Answering | Dec 21, 2023 | Autonomous Driving Question Answering | Code Available | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | 1 Image, 2*2 Stitchi Action Classification | Code Available | 3
Odyssey: Empowering Minecraft Agents with Open-World Skills | Jul 22, 2024 | Language Modelling Large Language Model | Code Available | 3
Detecting hallucinations in large language models using semantic entropy | Jun 19, 2024 | Large Language Model Question Answering | Code Available | 3
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Feb 24, 2025 | Question Answering Visual Question Answering | Code Available | 3
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners | Dec 16, 2024 | Large Language Model Multi-Task Learning | Code Available | 3
Evaluating Hallucinations in Chinese Large Language Models | Oct 5, 2023 | Hallucination Question Answering | Code Available | 3
ST-MoE: Designing Stable and Transferable Sparse Expert Models | Feb 17, 2022 | ARC Common Sense Reasoning | Code Available | 3
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models | Jan 30, 2024 | Knowledge Base Construction Question Answering | Code Available | 3
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Apr 22, 2024 | Common Sense Reasoning GPU | Code Available | 3
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Nov 11, 2023 | Image Captioning MMR total | Code Available | 3
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | Feb 13, 2024 | Question Answering Retrieval | Code Available | 3
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding | Mar 18, 2025 | document understanding Question Answering | Code Available | 3
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Mar 31, 2024 | Image-text Retrieval Language Modeling | Code Available | 3
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | Oct 14, 2024 | Benchmarking Large Language Model | Code Available | 3
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU Multiple-choice | Code Available | 3
Longformer: The Long-Document Transformer | Apr 10, 2020 | Decoder Language Modeling | Code Available | 3
CRAG -- Comprehensive RAG Benchmark | Jun 7, 2024 | Hallucination Language Modelling | Code Available | 3
3D-LLM: Injecting the 3D World into Large Language Models | Jul 24, 2023 | 3D Object Captioning 3D Question Answering (3D-QA) | Code Available | 3
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Jan 1, 2024 | 3D dense captioning Dense Captioning | Code Available | 3
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought | Oct 3, 2022 | Mathematical Reasoning Question Answering | Code Available | 3
Language Models are Few-Shot Learners | May 28, 2020 | answerability prediction Articles | Code Available | 3
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot Decoder | Code Available | 3
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Mar 19, 2024 | Benchmarking Financial Analysis | Code Available | 3
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction | May 29, 2025 | Question Answering | Code Available | 3