From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks Jun 4, 2024 Image Captioning Language Modelling
Code Code Available 25 Revealing Single Frame Bias for Video-and-Language Learning Jun 7, 2022 Action Recognition Fine-grained Action Recognition
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Jun 10, 2025 Image-text Retrieval Question Answering
An Embodied Generalist Agent in 3D World Nov 18, 2023 3D dense captioning 3D Question Answering (3D-QA)
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design Nov 23, 2023 Decision Making Language Modelling
F-LMM: Grounding Frozen Large Multimodal Models Jun 9, 2024 General Knowledge Instruction Following
FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models Apr 24, 2025 Answer Selection Information Retrieval
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts Feb 24, 2025 Benchmarking Fact Verification
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training Jun 2, 2023 Language Modeling Language Modelling
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction Mar 27, 2024 Image Captioning Language Modeling
Atlas: Few-shot Learning with Retrieval Augmented Language Models Aug 5, 2022 Fact Checking Few-Shot Learning
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering Sep 29, 2023 Image to text Passage Retrieval
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Apr 1, 2025 Audio-visual Question Answering Audio-Visual Question Answering (AVQA)
DeBERTa: Decoding-enhanced BERT with Disentangled Attention Jun 5, 2020 Common Sense Reasoning Coreference Resolution
SensorLLM: Human-Intuitive Alignment of Multivariate Sensor Data with LLMs for Activity Recognition Oct 14, 2024 Activity Recognition Descriptive
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Sep 26, 2019 Common Sense Reasoning GPU
Can AI Assistants Know What They Don't Know? Jan 24, 2024 Math Open-Domain Question Answering
Explore the Limits of Omni-modal Pretraining at Scale Jun 13, 2024 Language Modeling Language Modelling
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model Aug 2, 2022 Causal Language Modeling Common Sense Reasoning
Evaluating LLM Reasoning in the Operations Research Domain with ORQA Dec 22, 2024 Question Answering
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework Jun 20, 2024 Hallucination Question Answering
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Oct 23, 2019 Answer Generation Common Sense Reasoning
DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature May 8, 2024 Question Answering
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding May 23, 2025 Language Modeling Language Modelling
STaR: Bootstrapping Reasoning With Reasoning Mar 28, 2022 Common Sense Reasoning Language Modeling
Beyond Accuracy: Behavioral Testing of NLP models with CheckList May 8, 2020 Question Answering Sentiment Analysis
Data Science with LLMs and Interpretable Models Feb 22, 2024 Additive models Question Answering
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval Mar 1, 2025 GPU Question Answering
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding Sep 26, 2024 Question Answering Video Understanding
Deep Bidirectional Language-Knowledge Graph Pretraining Oct 17, 2022 Common Sense Reasoning Knowledge Graphs
Debiasing Multimodal Large Language Models Mar 8, 2024 Fairness Question Answering
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models Jan 25, 2025 Attribute Contrastive Learning
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models Jul 30, 2024 Caption Generation Question Answering
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension Mar 12, 2024 Deblurring Decoder
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning Jun 12, 2025 Answer Generation Chunking
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy Jun 3, 2024 Language Modelling Question Answering
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Dec 10, 2024 Medical Visual Question Answering Question Answering
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner May 16, 2025 Cross-Modal Retrieval Diagnostic
Task Me Anything Jun 17, 2024 2k Attribute
TCMBench: A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine Jun 3, 2024 Benchmarking Question Answering
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis Sep 10, 2024 Contrastive Learning Cross-Modal Retrieval
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models Jul 5, 2024 Hallucination Long Form Question Answering
End-To-End Memory Networks Mar 31, 2015 Language Modeling Language Modelling
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering Nov 8, 2024 Language Modeling Language Modelling
ANAH: Analytical Annotation of Hallucinations in Large Language Models May 30, 2024 Generative Question Answering Hallucination
Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning May 27, 2024 Question Answering RAG
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement May 24, 2024 Hallucination Image Comprehension
Egocentric Video-Language Pretraining Jun 3, 2022 Action Recognition Contrastive Learning
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records Jan 13, 2024 Code Generation Few-Shot Learning
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering Aug 8, 2024 Multi-hop Question Answering Question Answering