ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions (Mar 12, 2023). Tasks: Image Captioning, Question Answering
ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models (Oct 13, 2023). Tasks: Knowledge Base Question Answering, Knowledge Graphs
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities (Jun 17, 2024). Tasks: Audio Question Answering, Instruction Following
Frozen Transformers in Language Models Are Effective Visual Encoder Layers (Oct 19, 2023). Tasks: Action Recognition, Image-text Retrieval
Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering (Apr 23, 2024). Tasks: Graph Question Answering, Hallucination
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering (Nov 25, 2024). Tasks: Question Answering, Visual Question Answering
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs (Oct 14, 2024). Tasks: Computational Efficiency, Question Answering
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models (Apr 16, 2024). Tasks: image-classification, Image Classification
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding (Jul 31, 2023). Tasks: Multiple-choice, Question Answering
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models (Oct 13, 2023). Tasks: Hallucination, Image Captioning
MuGER²: Multi-Granularity Evidence Retrieval and Reasoning for Hybrid Question Answering (Oct 19, 2022). Tasks: Navigate, Question Answering
Multi-Agent Large Language Models for Conversational Task-Solving (Oct 30, 2024). Tasks: Fairness, Question Answering
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks (Jun 4, 2024). Tasks: Image Captioning, Language Modelling
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models (Feb 20, 2025). Tasks: Question Answering, Visual Question Answering
F-LMM: Grounding Frozen Large Multimodal Models (Jun 9, 2024). Tasks: General Knowledge, Instruction Following
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation (Jun 10, 2025). Tasks: Image-text Retrieval, Question Answering
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design (Nov 23, 2023). Tasks: Decision Making, Language Modelling
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning (Apr 1, 2025). Tasks: Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)
A Survey on Benchmarks of Multimodal Large Language Models (Aug 16, 2024). Tasks: Question Answering, Survey
One missing piece in Vision and Language: A Survey on Comics Understanding (Sep 14, 2024). Tasks: document understanding, image-classification
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering (Sep 29, 2023). Tasks: Image to text, Passage Retrieval
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models (May 13, 2023). Tasks: Key Information Extraction, Nutrition
FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models (Apr 24, 2025). Tasks: Answer Selection, Information Retrieval
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training (Jun 2, 2023). Tasks: Language Modeling, Language Modelling
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence (Feb 17, 2024). Tasks: Benchmarking, Form
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (Jan 5, 2024). Tasks: Arithmetic Reasoning, Code Generation
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling (Oct 8, 2024). Tasks: document understanding, Language Modeling
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models (Dec 30, 2024). Tasks: Question Answering, Token Reduction
Pengi: An Audio Language Model for Audio Tasks (May 19, 2023). Tasks: Audio captioning, Audio Question Answering
Perception Test: A Diagnostic Benchmark for Multimodal Models (Oct 19, 2022). Tasks: Diagnostic, Multiple-choice
Ask Me Anything: A simple strategy for prompting language models (Oct 5, 2022). Tasks: Coreference Resolution, Natural Language Inference
FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models (Apr 20, 2024). Tasks: Binary Classification, Fake Image Detection
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models (Feb 21, 2024). Tasks: Question Answering
Compressing Context to Enhance Inference Efficiency of Large Language Models (Oct 9, 2023). Tasks: Articles, Question Answering
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents (Feb 21, 2024). Tasks: Active Learning, Position
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator (Feb 15, 2024). Tasks: Benchmarking, Diagnostic
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Oct 23, 2019). Tasks: Answer Generation, Common Sense Reasoning
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Sep 26, 2019). Tasks: Common Sense Reasoning, GPU
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis (Sep 10, 2024). Tasks: Contrastive Learning, Cross-Modal Retrieval
Atlas: Few-shot Learning with Retrieval Augmented Language Models (Aug 5, 2022). Tasks: Fact Checking, Few-Shot Learning
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding (May 21, 2024). Tasks: Property Prediction, Question Answering
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework (Jun 20, 2024). Tasks: Hallucination, Question Answering
A Simple Aerial Detection Baseline of Multimodal Language Models (Jan 16, 2025). Tasks: object-detection, Object Detection
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization (Mar 11, 2022). Tasks: image-classification, Image Classification
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model (Aug 2, 2022). Tasks: Causal Language Modeling, Common Sense Reasoning
Evaluating LLM Reasoning in the Operations Research Domain with ORQA (Dec 22, 2024). Tasks: Question Answering
Explore the Limits of Omni-modal Pretraining at Scale (Jun 13, 2024). Tasks: Language Modeling, Language Modelling
FreeVA: Offline MLLM as Training-Free Video Assistant (May 13, 2024). Tasks: Fairness, Question Answering
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis (Mar 11, 2024). Tasks: Question Answering
A Replication Study of Dense Passage Retriever (Apr 12, 2021). Tasks: Open-Domain Question Answering, Question Answering