| Paper | Date | Tasks | Code | Stars |
| --- | --- | --- | --- | --- |
| CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | May 20, 2024 | Benchmarking, Diversity | Unverified | 0 |
| DispaRisk: Auditing Fairness Through Usable Information | May 20, 2024 | Benchmarking, Bias Detection | Code Available | 0 |
| EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | May 18, 2024 | Benchmarking, Specificity | Unverified | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | May 17, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | May 17, 2024 | Benchmarking, Social Media Popularity Prediction | Unverified | 0 |
| BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | May 17, 2024 | Benchmarking, Prognosis | Unverified | 0 |
| An Integrated Framework for Multi-Granular Explanation of Video Summarization | May 16, 2024 | Benchmarking, Panoptic Segmentation | Code Available | 0 |
| Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | May 16, 2024 | Benchmarking, Reinforcement Learning (RL) | Code Available | 0 |
| A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text | May 16, 2024 | Anomaly Detection, Benchmarking | Unverified | 0 |
| SpeechVerse: A Large-scale Generalizable Audio Language Model | May 14, 2024 | Automatic Speech Recognition, Benchmarking | Unverified | 0 |