| Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories | Sep 8, 2021 | Form, Language Modeling | Code Available | 0 | 5 |
| MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge | Dec 22, 2024 | Multi-hop Question Answering, Question Answering | Code Available | 0 | 5 |
| MedMobile: A mobile-sized language model with expert-level clinical capabilities | Oct 11, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging | Oct 9, 2024 | Age Estimation, Fairness | Code Available | 0 | 5 |
| MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation | Feb 24, 2025 | RAG, Retrieval | Code Available | 0 | 5 |
| Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Feb 7, 2025 | Decision Making, Language Modeling | Code Available | 0 | 5 |
| Medical large language models are easily distracted | Apr 1, 2025 | RAG, Retrieval-augmented Generation | Code Available | 0 | 5 |
| Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science | Jun 4, 2025 | Articles, Code Generation | Code Available | 0 | 5 |
| Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature Summarization | May 22, 2025 | Document Summarization, Multi-Document Summarization | Code Available | 0 | 5 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 | 5 |