| Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments | Dec 16, 2024 | Clinical KnowledgeCollege Medicine | CodeCode Available | 1 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | DiagnosticMath | CodeCode Available | 1 |
| Large Language Models Encode Clinical Knowledge | Dec 26, 2022 | Clinical KnowledgeMedQA | CodeCode Available | 1 |
| LOGO -- Long cOntext aliGnment via efficient preference Optimization | Oct 24, 2024 | GPULanguage Modeling | CodeCode Available | 1 |
| A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration | Oct 3, 2023 | Arithmetic ReasoningCode Generation | CodeCode Available | 1 |
| From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning | Apr 17, 2023 | MMLUZero-shot Generalization | CodeCode Available | 1 |
| An Open Source Data Contamination Report for Large Language Models | Oct 26, 2023 | HellaSwagLanguage Modeling | CodeCode Available | 1 |
| Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models | May 19, 2025 | BenchmarkingChatbot | CodeCode Available | 1 |
| LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain | Apr 2, 2024 | Argument MiningDecision Making | CodeCode Available | 1 |
| Gemini: A Family of Highly Capable Multimodal Models | Dec 19, 2023 | 1 Image, 2*2 StitchingArithmetic Reasoning | CodeCode Available | 1 |