| RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation | Aug 15, 2024 | Diagnostic, RAG | Code Available | 5 |
| Molecular-driven Foundation Model for Oncologic Pathology | Jan 28, 2025 | Benchmarking, Diagnostic | Code Available | 4 |
| RaTEScore: A Metric for Radiology Report Generation | Jun 24, 2024 | Diagnostic, Entity Embeddings | Code Available | 4 |
| Segment Anything in Medical Images | Apr 24, 2023 | Diagnostic, Image Segmentation | Code Available | 4 |
| sbi reloaded: a toolkit for simulation-based inference workflows | Nov 26, 2024 | Bayesian Inference, Diagnostic | Code Available | 4 |
| GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images | Mar 8, 2025 | cross-modal alignment, Diagnostic | Code Available | 3 |
| A Practical Probabilistic Benchmark for AI Weather Models | Jan 27, 2024 | Diagnostic, Weather Forecasting | Code Available | 3 |
| Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models | May 29, 2025 | Autonomous Driving, Diagnostic | Code Available | 3 |
| A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Jan 22, 2024 | Benchmarking, Diagnostic | Code Available | 3 |
| DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models | Feb 8, 2022 | Diagnostic, Image Captioning | Code Available | 3 |