| A Universal Protocol to Benchmark Camera Calibration for Sports | Apr 15, 2024 | Benchmarking, Camera Calibration | Unverified | 0 |
| A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Apr 15, 2024 | Benchmarking | Code Available | 0 |
| nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation | Apr 15, 2024 | Benchmarking, Image Segmentation | Code Available | 1 |
| A Large-Scale Evaluation of Speech Foundation Models | Apr 15, 2024 | Benchmarking | Unverified | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | Apr 15, 2024 | Benchmarking | Unverified | 0 |
| MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems | Apr 15, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations | Apr 15, 2024 | Benchmarking, Bias Detection | Code Available | 1 |
| A Review and Efficient Implementation of Scene Graph Generation Metrics | Apr 15, 2024 | Benchmarking, Graph Generation | Code Available | 1 |
| AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Apr 15, 2024 | Benchmarking, Protein Language Model | Code Available | 0 |
| RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion | Apr 14, 2024 | Benchmarking, Data Augmentation | Code Available | 1 |