| CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models | Oct 17, 2024 | Contrastive Learning, Diversity | Code Available | 2 |
| Omnizart: A General Toolbox for Automatic Music Transcription | Jun 1, 2021 | Chord Recognition, Downbeat Tracking | Code Available | 2 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| OpenNRE: An Open and Extensible Toolkit for Neural Relation Extraction | Sep 28, 2019 | Information Retrieval, Question Answering | Code Available | 2 |
| CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs | Aug 29, 2023 | CPU, GPU | Code Available | 2 |
| Multilingual Search with Subword TF-IDF | Sep 28, 2022 | Information Retrieval, Retrieval | Code Available | 2 |
| Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations | Feb 19, 2021 | Cultural Vocal Bursts Intensity Prediction, Information Retrieval | Code Available | 2 |
| Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions | Mar 1, 2025 | Information Retrieval, RAG | Code Available | 2 |
| Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval | Mar 7, 2022 | Information Retrieval, Passage Retrieval | Code Available | 2 |
| CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Jul 3, 2024 | Benchmarking, Code Search | Code Available | 2 |
| Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | Mar 22, 2024 | Information Retrieval | Code Available | 2 |
| RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! | Dec 5, 2023 | Information Retrieval, Reranking | Code Available | 2 |
| Multi-Interest Network with Dynamic Routing for Recommendation at Tmall | Apr 17, 2019 | Clustering, Information Retrieval | Code Available | 2 |
| Mustango: Toward Controllable Text-to-Music Generation | Nov 14, 2023 | Data Augmentation, Denoising | Code Available | 2 |
| PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Jul 12, 2024 | Information Retrieval, Question Answering | Code Available | 2 |
| Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages | Oct 18, 2022 | Information Retrieval, Retrieval | Code Available | 2 |
| SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking | Mar 2, 2025 | Fact Checking, Fact Verification | Code Available | 2 |
| BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Apr 17, 2021 | Argument Retrieval, Benchmarking | Code Available | 2 |
| Language Model Powered Digital Biology with BRAD | Sep 4, 2024 | Chatbot, Code Generation | Code Available | 2 |
| Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | Nov 7, 2024 | Information Retrieval, Re-Ranking | Code Available | 2 |
| Melody transcription via generative pre-training | Dec 4, 2022 | Chord Recognition, Information Retrieval | Code Available | 2 |
| Knowledge Representation Learning: A Quantitative Review | Dec 28, 2018 | General Classification, Information Retrieval | Code Available | 2 |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Apr 19, 2023 | Information Retrieval, Passage Ranking | Code Available | 2 |
| LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant | Dec 2, 2024 | Contrastive Learning, Information Retrieval | Code Available | 2 |
| InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval | Jul 10, 2023 | GPU, Information Retrieval | Code Available | 2 |
| BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics | Dec 12, 2023 | Information Retrieval, Representation Learning | Code Available | 2 |
| Autonomous GIS: the next-generation AI-powered GIS | May 10, 2023 | Code Generation, Information Retrieval | Code Available | 2 |
| Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion | May 4, 2022 | Information Retrieval, Knowledge Graph Completion | Code Available | 2 |
| InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval | Jan 4, 2023 | Information Retrieval, Retrieval | Code Available | 2 |
| Large Language Models for Information Retrieval: A Survey | Aug 14, 2023 | Information Retrieval, Question Answering | Code Available | 2 |
| Infinite Recommendation Networks: A Data-Centric Approach | Jun 3, 2022 | Information Retrieval, Recommendation Systems | Code Available | 2 |
| InPars: Data Augmentation for Information Retrieval using Large Language Models | Feb 10, 2022 | Data Augmentation, Diversity | Code Available | 2 |
| MemLong: Memory-Augmented Retrieval for Long Text Modeling | Aug 30, 2024 | 4k, Decoder | Code Available | 2 |
| FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search | May 20, 2021 | Information Retrieval, Retrieval | Code Available | 2 |
| FIRST: Faster Improved Listwise Reranking with Single Token Decoding | Jun 21, 2024 | Information Retrieval, Language Modeling | Code Available | 2 |
| A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Mar 7, 2025 | Information Retrieval, Language Modeling | Code Available | 2 |
| Autoregressive Search Engines: Generating Substrings as Document Identifiers | Apr 22, 2022 | Information Retrieval, Retrieval | Code Available | 2 |
| Backtracing: Retrieving the Cause of the Query | Mar 6, 2024 | Information Retrieval, Language Modeling | Code Available | 2 |
| FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Mar 22, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| GENIUS: A Generative Framework for Universal Multimodal Search | Mar 25, 2025 | Information Retrieval, Quantization | Code Available | 2 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | Aug 5, 2022 | Fact Checking, Few-Shot Learning | Code Available | 2 |
| A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | Nov 9, 2023 | Hallucination, Information Retrieval | Code Available | 2 |
| FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models | Apr 24, 2025 | Answer Selection, Information Retrieval | Code Available | 2 |
| MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval | Jul 2, 2023 | Biomedical Information Retrieval, Contrastive Learning | Code Available | 2 |
| GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music | Oct 11, 2020 | Information Retrieval, Music Information Retrieval | Code Available | 2 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 |
| All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio | Jul 31, 2023 | All, Downbeat Tracking | Code Available | 2 |
| AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval | Apr 9, 2024 | All, Information Retrieval | Code Available | 2 |
| AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Dec 17, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| Eureka: Evaluating and Understanding Large Foundation Models | Sep 13, 2024 | Information Retrieval | Code Available | 2 |