| BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | Jun 29, 2024 | AI Agent, Claim Verification | Code Available | 0 | 5 |
| Improving RAG for Personalization with Author Features and Contrastive Examples | Mar 24, 2025 | RAG, Retrieval-augmented Generation | Code Available | 0 | 5 |
| Awakening Augmented Generation: Learning to Awaken Internal Knowledge of Large Language Models for Question Answering | Mar 22, 2024 | Open-Domain Question Answering, Out-of-Distribution Generalization | Code Available | 0 | 5 |
| Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems | Sep 29, 2024 | Fairness, Open-Domain Question Answering | Code Available | 0 | 5 |
| Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings | Mar 19, 2025 | Instruction Following, Large Language Model | Code Available | 0 | 5 |
| AI-University: An LLM-based platform for instructional alignment to scientific classrooms | Apr 11, 2025 | Large Language Model, RAG | Code Available | 0 | 5 |
| Improving In-Context Learning with Small Language Model Ensembles | Oct 29, 2024 | Domain Labelling, In-Context Learning | Code Available | 0 | 5 |
| Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering | May 25, 2025 | Question Answering, RAG | Code Available | 0 | 5 |
| IITK at SemEval-2024 Task 2: Exploring the Capabilities of LLMs for Safe Biomedical Natural Language Inference for Clinical Trials | Apr 6, 2024 | Natural Language Inference, RAG | Code Available | 0 | 5 |
| Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents | Nov 23, 2024 | Question Answering, RAG | Code Available | 0 | 5 |