| Paper | Date | Tasks | Code | Stars |
|---|---|---|---|---|
| CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | Jan 2, 2025 | Benchmarking, Code Generation | Unverified | 0 |
| MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception | Jan 2, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Jan 2, 2025 | Benchmarking, Experimental Design | Code Available | 0 |
| State-of-the-art AI-based Learning Approaches for Deepfake Generation and Detection, Analyzing Opportunities, Threading through Pros, Cons, and Future Prospects | Jan 2, 2025 | Benchmarking, Face Swapping | Unverified | 0 |
| TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer | Jan 2, 2025 | Benchmarking, Quantization | Unverified | 0 |
| Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models | Jan 1, 2025 | Benchmarking | Unverified | 0 |
| Segmenting Maxillofacial Structures in CBCT Volumes | Jan 1, 2025 | Anatomy, Benchmarking | Unverified | 0 |
| CroCoDL: Cross-device Collaborative Dataset for Localization | Jan 1, 2025 | Benchmarking, Pose Estimation | Unverified | 0 |
| CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools | Jan 1, 2025 | Benchmarking | Unverified | 0 |
| CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices | Jan 1, 2025 | Benchmarking | Unverified | 0 |