| LightRAG: Simple and Fast Retrieval-Augmented Generation | Oct 8, 2024 | Information Retrieval, RAG | Code Available | 14 |
| Language agents achieve superhuman synthesis of scientific knowledge | Sep 10, 2024 | Articles, Information Retrieval | Code Available | 9 |
| MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Jul 29, 2024 | 2D Semantic Segmentation task 1 (8 classes), graph construction | Code Available | 9 |
| PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods | Jul 9, 2024 | Information Retrieval, LEMMA | Code Available | 7 |
| Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | May 20, 2025 | Benchmarking, Information Retrieval | Code Available | 5 |
| Make Your LLM Fully Utilize the Context | Apr 25, 2024 | 4k, Information Retrieval | Code Available | 5 |
| Retrieval-Augmented Generation for AI-Generated Content: A Survey | Feb 29, 2024 | Information Retrieval, Large Language Model | Code Available | 5 |
| Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | CPU, GPU | Code Available | 5 |
| From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | Jun 23, 2025 | Information Retrieval, Retrieval | Code Available | 4 |
| DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents | Jun 13, 2025 | Information Retrieval, Retrieval | Code Available | 4 |
| SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis | May 22, 2025 | Diversity, Information Retrieval | Code Available | 4 |
| Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models | Mar 11, 2025 | Form, Information Retrieval | Code Available | 4 |
| DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Feb 28, 2025 | Information Retrieval, reinforcement-learning | Code Available | 4 |
| Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Feb 4, 2025 | Benchmarking, Information Retrieval | Code Available | 4 |
| iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models | Sep 5, 2024 | Few-Shot Learning, Information Retrieval | Code Available | 4 |
| Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Aug 8, 2024 | Chunking, Fact Checking | Code Available | 4 |
| COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval | Jun 2, 2024 | Information Retrieval, RAG | Code Available | 4 |
| Benchmarking Retrieval-Augmented Generation for Medicine | Feb 20, 2024 | Benchmarking, Information Retrieval | Code Available | 4 |
| Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard | Jun 13, 2023 | Information Retrieval, Representation Learning | Code Available | 4 |
| AlignScore: Evaluating Factual Consistency with a Unified Alignment Function | May 26, 2023 | Fact Verification, Information Retrieval | Code Available | 4 |
| AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators | Mar 29, 2023 | Information Retrieval, Retrieval | Code Available | 4 |
| ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge | Mar 24, 2023 | Information Retrieval, Language Modeling | Code Available | 4 |
| SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes | Feb 13, 2023 | Information Retrieval, Retrieval | Code Available | 4 |
| One Embedder, Any Task: Instruction-Finetuned Text Embeddings | Dec 19, 2022 | Information Retrieval, Learning Word Embeddings | Code Available | 4 |
| MTEB: Massive Text Embedding Benchmark | Oct 13, 2022 | Benchmarking, Information Retrieval | Code Available | 4 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPU, GPU | Code Available | 4 |
| A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Jun 14, 2025 | Information Retrieval, Survey | Code Available | 3 |
| Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers | May 26, 2025 | Information Retrieval | Code Available | 3 |
| Distance Adaptive Beam Search for Provably Accurate Graph-Based Nearest Neighbor Search | May 21, 2025 | Information Retrieval | Code Available | 3 |
| ReasonIR: Training Retrievers for Reasoning Tasks | Apr 29, 2025 | Information Retrieval, MMLU | Code Available | 3 |
| REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites | Apr 15, 2025 | Autonomous Web Navigation, Benchmarking | Code Available | 3 |
| Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval | Feb 17, 2025 | Information Retrieval, Retrieval | Code Available | 3 |
| BMX: Entropy-weighted Similarity and Semantic-enhanced Lexical Search | Aug 13, 2024 | Information Retrieval, Retrieval | Code Available | 3 |
| Music2Latent: Consistency Autoencoders for Latent Audio Compression | Aug 12, 2024 | Audio Compression, Information Retrieval | Code Available | 3 |
| Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective | Jul 9, 2024 | Information Retrieval, Retrieval | Code Available | 3 |
| MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels | May 13, 2024 | Information Retrieval, Retrieval | Code Available | 3 |
| From Matching to Generation: A Survey on Generative Information Retrieval | Apr 23, 2024 | Incremental Learning, Information Retrieval | Code Available | 3 |
| RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Mar 8, 2024 | Code Generation, Hallucination | Code Available | 3 |
| When Large Language Models Meet Vector Databases: A Survey | Jan 30, 2024 | Hallucination, Information Retrieval | Code Available | 3 |
| INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Jan 12, 2024 | Diversity, document understanding | Code Available | 3 |
| WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | May 23, 2023 | Chatbot, Hallucination | Code Available | 3 |
| Dataset and Baseline System for Multi-lingual Extraction and Normalization of Temporal and Numerical Expressions | Mar 31, 2023 | Date Understanding, Information Retrieval | Code Available | 3 |
| FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models | Apr 24, 2025 | Answer Selection, Information Retrieval | Code Available | 2 |
| Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey | Apr 21, 2025 | Computational Efficiency, Information Retrieval | Code Available | 2 |
| GENIUS: A Generative Framework for Universal Multimodal Search | Mar 25, 2025 | Information Retrieval, Quantization | Code Available | 2 |
| UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis | Mar 20, 2025 | Document Layout Analysis, Document Summarization | Code Available | 2 |
| A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Mar 7, 2025 | Information Retrieval, Language Modeling | Code Available | 2 |
| SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking | Mar 2, 2025 | Fact Checking, Fact Verification | Code Available | 2 |
| Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions | Mar 1, 2025 | Information Retrieval, RAG | Code Available | 2 |
| Rank1: Test-Time Compute for Reranking in Information Retrieval | Feb 25, 2025 | Information Retrieval, Instruction Following | Code Available | 2 |