| The GigaMIDI Dataset with Features for Expressive Music Performance Detection | Feb 24, 2025 | Information Retrieval, Music Information Retrieval | Code Available | 2 |
| VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection | Feb 18, 2025 | Anomaly Detection, Information Retrieval | Code Available | 2 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Feb 7, 2025 | 8k, Information Retrieval | Code Available | 2 |
| AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | Dec 17, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant | Dec 2, 2024 | Contrastive Learning, Information Retrieval | Code Available | 2 |
| Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | Nov 7, 2024 | Information Retrieval, Re-Ranking | Code Available | 2 |
| DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning | Oct 28, 2024 | Binary Classification, Contrastive Learning | Code Available | 2 |
| CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models | Oct 17, 2024 | Contrastive Learning, Diversity | Code Available | 2 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 |
| Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval | Oct 6, 2024 | Community Detection, Information Retrieval | Code Available | 2 |
| Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | Sep 17, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| Eureka: Evaluating and Understanding Large Foundation Models | Sep 13, 2024 | Information Retrieval | Code Available | 2 |
| Language Model Powered Digital Biology with BRAD | Sep 4, 2024 | Chatbot, Code Generation | Code Available | 2 |
| MemLong: Memory-Augmented Retrieval for Long Text Modeling | Aug 30, 2024 | 4k, Decoder | Code Available | 2 |
| Scientific QA System with Verifiable Answers | Jul 16, 2024 | Articles, Information Retrieval | Code Available | 2 |
| Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation | Jul 15, 2024 | Information Retrieval, Knowledge Graphs | Code Available | 2 |
| PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Jul 12, 2024 | Information Retrieval, Question Answering | Code Available | 2 |
| CoIR: A Comprehensive Benchmark for Code Information Retrieval Models | Jul 3, 2024 | Benchmarking, Code Search | Code Available | 2 |
| FIRST: Faster Improved Listwise Reranking with Single Token Decoding | Jun 21, 2024 | Information Retrieval, Language Modeling | Code Available | 2 |
| Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available | 2 |
| MidiCaps: A large-scale MIDI dataset with text captions | Jun 4, 2024 | Information Retrieval, Music Information Retrieval | Code Available | 2 |
| Evaluation of Retrieval-Augmented Generation: A Survey | May 13, 2024 | Information Retrieval, RAG | Code Available | 2 |
| Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records | May 4, 2024 | Information Retrieval, Question Answering | Code Available | 2 |
| Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era | Apr 17, 2024 | Fairness, Information Retrieval | Code Available | 2 |
| AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval | Apr 9, 2024 | All, Information Retrieval | Code Available | 2 |
| FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Mar 22, 2024 | Information Retrieval, Retrieval | Code Available | 2 |
| Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | Mar 22, 2024 | Information Retrieval | Code Available | 2 |
| Backtracing: Retrieving the Cause of the Query | Mar 6, 2024 | Information Retrieval, Language Modeling | Code Available | 2 |
| Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers | Feb 9, 2024 | Generative Question Answering, Information Retrieval | Code Available | 2 |
| The Power of Noise: Redefining Retrieval for RAG Systems | Jan 26, 2024 | Information Retrieval, RAG | Code Available | 2 |
| Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis | Jan 22, 2024 | Document Layout Analysis, Document Summarization | Code Available | 2 |
| BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics | Dec 12, 2023 | Information Retrieval, Representation Learning | Code Available | 2 |
| RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! | Dec 5, 2023 | Information Retrieval, Reranking | Code Available | 2 |
| SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models | Nov 16, 2023 | Conversational Search, In-Context Learning | Code Available | 2 |
| Mustango: Toward Controllable Text-to-Music Generation | Nov 14, 2023 | Data Augmentation, Denoising | Code Available | 2 |
| A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | Nov 9, 2023 | Hallucination, Information Retrieval | Code Available | 2 |
| A Foundation Model for Music Informatics | Nov 6, 2023 | Information Retrieval, model | Code Available | 2 |
| RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models | Sep 26, 2023 | Information Retrieval, Reranking | Code Available | 2 |
| CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs | Aug 29, 2023 | CPU, GPU | Code Available | 2 |
| Large Language Models for Information Retrieval: A Survey | Aug 14, 2023 | Information Retrieval, Question Answering | Code Available | 2 |
| All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio | Jul 31, 2023 | All, Downbeat Tracking | Code Available | 2 |
| InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval | Jul 10, 2023 | GPU, Information Retrieval | Code Available | 2 |
| MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval | Jul 2, 2023 | Biomedical Information Retrieval, Contrastive Learning | Code Available | 2 |
| RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit | Jun 8, 2023 | Answer Generation, Fact Checking | Code Available | 2 |
| WebCPM: Interactive Web Search for Chinese Long-form Question Answering | May 11, 2023 | Form, Information Retrieval | Code Available | 2 |
| Autonomous GIS: the next-generation AI-powered GIS | May 10, 2023 | Code Generation, Information Retrieval | Code Available | 2 |
| RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models | May 4, 2023 | Information Retrieval, Open-Domain Question Answering | Code Available | 2 |
| Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents | Apr 19, 2023 | Information Retrieval, Passage Ranking | Code Available | 2 |
| UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers | Mar 1, 2023 | Domain Adaptation, Information Retrieval | Code Available | 2 |
| InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval | Jan 4, 2023 | Information Retrieval, Retrieval | Code Available | 2 |