GIT: A Generative Image-to-text Transformer for Vision and Language May 27, 2022 Decoder Image Captioning
Code Code Available 2GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Aug 6, 2024 Question Answering Visual Question Answering
Code Code Available 2GeoChat: Grounded Large Vision-Language Model for Remote Sensing Nov 24, 2023 Instruction Following Language Modeling
Code Code Available 2GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering Feb 4, 2024 Language Modeling Language Modelling
Code Code Available 2GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI Nov 21, 2024 Decision Making Language Modeling
Code Code Available 2Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering Apr 23, 2024 Graph Question Answering Hallucination
Code Code Available 2GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Jun 17, 2024 Audio Question Answering Instruction Following
Code Code Available 2From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks Jun 4, 2024 Image Captioning Language Modelling
Code Code Available 2Frozen Transformers in Language Models Are Effective Visual Encoder Layers Oct 19, 2023 Action Recognition Image-text Retrieval
Code Code Available 2A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis Mar 10, 2025 Question Answering
Code Code Available 2A Survey on Benchmarks of Multimodal Large Language Models Aug 16, 2024 Question Answering Survey
Code Code Available 2VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis Mar 29, 2024 Hallucination Image Captioning
Code Code Available 2FreeVA: Offline MLLM as Training-Free Video Assistant May 13, 2024 Fairness Question Answering
Code Code Available 2F-LMM: Grounding Frozen Large Multimodal Models Jun 9, 2024 General Knowledge Instruction Following
Code Code Available 2Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs Oct 14, 2024 Computational Efficiency Question Answering
Code Code Available 2FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Jun 10, 2025 Image-text Retrieval Question Answering
Code Code Available 2Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering Sep 29, 2023 Image to text Passage Retrieval
Code Code Available 2Ask Me Anything: A simple strategy for prompting language models Oct 5, 2022 Coreference Resolution Natural Language Inference
Code Code Available 2From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models Oct 13, 2023 Hallucination Image Captioning
Code Code Available 2Atlas: Few-shot Learning with Retrieval Augmented Language Models Aug 5, 2022 Fact Checking Few-Shot Learning
Code Code Available 2FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models Feb 21, 2024 Question Answering
Code Code Available 2FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models Apr 24, 2025 Answer Selection Information Retrieval
Code Code Available 2Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework Jun 20, 2024 Hallucination Question Answering
Code Code Available 2Explore the Limits of Omni-modal Pretraining at Scale Jun 13, 2024 Language Modeling Language Modelling
Code Code Available 2A Replication Study of Dense Passage Retriever Apr 12, 2021 Open-Domain Question Answering Question Answering
Code Code Available 2Evaluating LLM Reasoning in the Operations Research Domain with ORQA Dec 22, 2024 Question Answering
Code Code Available 2Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Oct 23, 2019 Answer Generation Common Sense Reasoning
Code Code Available 2Fine-Grained Human Feedback Gives Better Rewards for Language Model Training Jun 2, 2023 Language Modeling Language Modelling
Code Code Available 2AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM Mar 6, 2025 Anomaly Detection Language Modeling
Code Code Available 2ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis Mar 11, 2024 Question Answering
Code Code Available 2EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis Sep 10, 2024 Contrastive Learning Cross-Modal Retrieval
Code Code Available 2FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models Apr 20, 2024 Binary Classification Fake Image Detection
Code Code Available 2End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering Nov 8, 2024 Language Modeling Language Modelling
Code Code Available 2Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning Mar 6, 2024 Multimodal Reasoning Question Answering
Code Code Available 2End-To-End Memory Networks Mar 31, 2015 Language Modeling Language Modelling
Code Code Available 2Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement May 24, 2024 Hallucination Image Comprehension
Code Code Available 2AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents Jul 5, 2024 Decision Making Multi-hop Question Answering
Code Code Available 2FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design Nov 23, 2023 Decision Making Language Modelling
Code Code Available 2A Simple Aerial Detection Baseline of Multimodal Language Models Jan 16, 2025 object-detection Object Detection
Code Code Available 2EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records Jan 13, 2024 Code Generation Few-Shot Learning
Code Code Available 2FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Apr 1, 2025 Audio-visual Question Answering Audio-Visual Question Answering (AVQA)
Code Code Available 2FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models Dec 30, 2024 Question Answering Token Reduction
Code Code Available 2EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Jan 21, 2025 Attribute Question Answering
Code Code Available 2A Pilot Study for Chinese SQL Semantic Parsing Sep 29, 2019 Cross-Lingual Word Embeddings Question Answering
Code Code Available 2Egocentric Video-Language Pretraining Jun 3, 2022 Action Recognition Contrastive Learning
Code Code Available 2Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning May 27, 2024 Question Answering RAG
Code Code Available 2E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Sep 26, 2024 Question Answering Video Understanding
Code Code Available 2EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education Aug 5, 2023 Chatbot Language Modeling
Code Code Available 2Efficient One-Pass End-to-End Entity Linking for Questions Oct 6, 2020 CPU Entity Linking
Code Code Available 2Dual Diffusion for Unified Image Generation and Understanding Dec 31, 2024 Image Generation Language Modeling
Code Code Available 2