| Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | May 21, 2024 | Benchmarking, Keypoint Detection | Code Available | 1 |
| CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | May 20, 2024 | Benchmarking, Diversity | —Unverified | 0 |
| EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods | May 20, 2024 | Benchmarking, Explainable artificial intelligence | —Unverified | 0 |
| Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning | May 20, 2024 | Benchmarking, MRI segmentation | Code Available | 2 |
| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | Benchmarking, Bias Detection | Code Available | 0 |
| MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | May 20, 2024 | Benchmarking, Question Answering | Code Available | 2 |
| EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | May 18, 2024 | Benchmarking, Specificity | —Unverified | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | May 17, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 |
| SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | May 17, 2024 | Benchmarking, Social Media Popularity Prediction | —Unverified | 0 |
| BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | May 17, 2024 | Benchmarking, Prognosis | —Unverified | 0 |