| Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Oct 9, 2023 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 3 |
| HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Jun 23, 2024 | Benchmarking, Representation Learning | Code Available | 3 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 |
| Advancing LLM Reasoning Generalists with Preference Trees | Apr 2, 2024 | Benchmarking, Code Generation | Code Available | 3 |
| Benchmarking Automatic Machine Learning Frameworks | Aug 17, 2018 | Automated Feature Engineering, AutoML | Code Available | 3 |
| ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | Sep 2, 2024 | Benchmarking, Instruction Following | Code Available | 3 |
| DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection | Apr 19, 2024 | Benchmarking, DeepFake Detection | Code Available | 3 |
| CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving | Oct 11, 2023 | Autonomous Driving, Benchmarking | Code Available | 3 |
| AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking | Jul 23, 2024 | Benchmarking, Transfer Learning | Code Available | 3 |
| Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Jan 26, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 3 |