A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Jan 21, 2025 | Tags: RAG; Retrieval | Code available (7)
h2oGPT: Democratizing Large Language Models | Jun 13, 2023 | Tags: Chatbot; Fairness | Code available (6)
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Jul 16, 2024 | Tags: Question Answering; Retrieval | Code available (5)
BM25S: Orders of magnitude faster lexical search via eager sparse scoring | Jul 4, 2024 | Tags: Passage Retrieval; Retrieval | Code available (5)
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Jan 28, 2022 | Tags: Image Captioning; Image-text matching | Code available (5)
FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Tags: Image-text Retrieval; object-detection | Code available (4)
Multi-label Cluster Discrimination for Visual Representation Learning | Jul 24, 2024 | Tags: Contrastive Learning; Image-text Retrieval | Code available (4)
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Tags: Retrieval; Text Retrieval | Code available (4)
RETSim: Resilient and Efficient Text Similarity | Nov 28, 2023 | Tags: Adversarial Text; Clustering | Code available (4)
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | Oct 3, 2023 | Tags: Audio Classification; Contrastive Learning | Code available (4)
MTEB: Massive Text Embedding Benchmark | Oct 13, 2022 | Tags: Benchmarking; Information Retrieval | Code available (4)
Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Jul 14, 2022 | Tags: Retrieval; Text Retrieval | Code available (4)
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Feb 9, 2025 | Tags: Image Captioning; Image-text Retrieval | Code available (3)
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Mar 31, 2024 | Tags: Image-text Retrieval; Language Modeling | Code available (3)
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | May 18, 2023 | Tags: 1 Image, 2*2 Stitchi; Action Classification | Code available (3)
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation | Apr 4, 2023 | Tags: Cross-Modal Retrieval; Image-text Retrieval | Code available (3)
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Oct 17, 2022 | Tags: Few-Shot Learning; Image Captioning | Code available (3)
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Feb 8, 2022 | Tags: Diagnostic; Image Captioning | Code available (3)
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Jun 12, 2025 | Tags: Answer Generation; Chunking | Code available (2)
GLAP: General contrastive audio-text pretraining across domains and languages | Jun 12, 2025 | Tags: AudioCaps; Keyword Spotting | Code available (2)
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Tags: Image-text Retrieval; Question Answering | Code available (2)
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Tags: Contrastive Learning; Text Retrieval | Code available (2)
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Tags: Contrastive Learning; Image-text Retrieval | Code available (2)
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Feb 6, 2025 | Tags: image-classification; Image Classification | Code available (2)
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | Tags: Articles; Image-text Retrieval | Code available (2)
Where am I? Cross-View Geo-localization with Natural Language Descriptions | Dec 22, 2024 | Tags: geo-localization; Image Retrieval | Code available (2)
Gramian Multimodal Representation Learning and Alignment | Dec 16, 2024 | Tags: Contrastive Learning; Representation Learning | Code available (2)
AudioSetCaps: An Enriched Audio-Caption Dataset using Automated Generation Pipeline with Large Audio and Language Models | Nov 28, 2024 | Tags: Audio captioning; Audio to Text Retrieval | Code available (2)
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Oct 29, 2024 | Tags: Image Retrieval; RAG | Code available (2)
Towards Vision-Language Geo-Foundation Model: A Survey | Jun 13, 2024 | Tags: Earth Observation; Image Captioning | Code available (2)
RWKV-CLIP: A Robust Vision-Language Representation Learner | Jun 11, 2024 | Tags: Image-text Retrieval; Representation Learning | Code available (2)
Accelerating Transformers with Spectrum-Preserving Token Merging | May 25, 2024 | Tags: image-classification; Image Classification | Code available (2)
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding | May 21, 2024 | Tags: Property Prediction; Question Answering | Code available (2)
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | Apr 29, 2024 | Tags: Retrieval; Text Retrieval | Code available (2)
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Apr 28, 2024 | Tags: Cross-Modal Retrieval; Image Retrieval | Code available (2)
DreamLIP: Language-Image Pre-training with Long Captions | Mar 25, 2024 | Tags: Contrastive Learning; Image-text Retrieval | Code available (2)
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Mar 22, 2024 | Tags: Information Retrieval; Retrieval | Code available (2)
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Mar 20, 2024 | Tags: Action Recognition; Computational Efficiency | Code available (2)
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Mar 8, 2024 | Tags: Image-text Retrieval; Retrieval | Code available (2)
Distillation Enhanced Generative Retrieval | Feb 16, 2024 | Tags: Retrieval; Text Retrieval | Code available (2)
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | Jan 31, 2024 | Tags: Retrieval; Text Retrieval | Code available (2)
Towards 3D Molecule-Text Interpretation in Language Models | Jan 25, 2024 | Tags: Instruction Following; Language Modeling | Code available (2)
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Oct 19, 2023 | Tags: Action Recognition; Image-text Retrieval | Code available (2)
VeCLIP: Improving CLIP Training via Visual-enriched Captions | Oct 11, 2023 | Tags: Image-text Retrieval; Retrieval | Code available (2)
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World | Aug 3, 2023 | Tags: All; Question Answering | Code available (2)
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing | Jun 20, 2023 | Tags: Cross-Modal Retrieval; Image Retrieval | Code available (2)
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Jun 19, 2023 | Tags: Classification; Cross-Modal Retrieval | Code available (2)
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Mar 13, 2023 | Tags: image-classification; Image Classification | Code available (2)
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing | Dec 21, 2022 | Tags: Contrastive Learning; Drug Design | Code available (2)
Dense Text Retrieval based on Pretrained Language Models: A Survey | Nov 27, 2022 | Tags: Retrieval; Survey | Code available (2)