| Medical large language models are easily distracted | Apr 1, 2025 | RAG, Retrieval-augmented Generation | Code Available | 0 | 5 |
| MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation | Feb 24, 2025 | RAG, Retrieval | Code Available | 0 | 5 |
| Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science | Jun 4, 2025 | Articles, Code Generation | Code Available | 0 | 5 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 | 5 |
| ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Jun 14, 2024 | Answer Generation, Benchmarking | Code Available | 0 | 5 |
| Climate Finance Bench | May 28, 2025 | Logical Reasoning, Quantization | Code Available | 0 | 5 |
| MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification | Oct 19, 2024 | Code Generation, RAG | Code Available | 0 | 5 |
| LSRP: A Leader-Subordinate Retrieval Framework for Privacy-Preserving Cloud-Device Collaboration | May 8, 2025 | Privacy Preserving, RAG | Code Available | 0 | 5 |
| LTRR: Learning To Rank Retrievers for LLMs | Jun 16, 2025 | Learning-To-Rank, RAG | Code Available | 0 | 5 |
| ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | Feb 21, 2024 | MMLU, Retrieval | Code Available | 0 | 5 |