| Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy | Aug 9, 2024 | BenchmarkingMedical Image Analysis | CodeCode Available | 0 | 5 |
| Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Dec 23, 2024 | Benchmarking | CodeCode Available | 0 | 5 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Sep 22, 2024 | AutoMLBenchmarking | CodeCode Available | 0 | 5 |
| An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science | Feb 23, 2025 | BenchmarkingCode Generation | CodeCode Available | 0 | 5 |
| InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | BenchmarkingVisual Grounding | CodeCode Available | 0 | 5 |
| Learn How to Query from Unlabeled Data Streams in Federated Learning | Dec 11, 2024 | BenchmarkingDecision Making | CodeCode Available | 0 | 5 |
| Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Oct 8, 2014 | Benchmarking | CodeCode Available | 0 | 5 |
| Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Feb 11, 2024 | Anomaly DetectionBenchmarking | CodeCode Available | 0 | 5 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | BenchmarkingDecision Making | CodeCode Available | 0 | 5 |
| Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench | Apr 1, 2025 | Benchmarking | CodeCode Available | 0 | 5 |