| Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Dec 29, 2023 | AnatomyBenchmarking | CodeCode Available | 1 |
| APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Dec 25, 2023 | Animal Pose EstimationBenchmarking | CodeCode Available | 1 |
| Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Dec 21, 2023 | Benchmarking | CodeCode Available | 1 |
| RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation | Dec 21, 2023 | BenchmarkingProduct Recommendation | CodeCode Available | 1 |
| FiFAR: A Fraud Detection Dataset for Learning to Defer | Dec 20, 2023 | BenchmarkingDecision Making | CodeCode Available | 1 |
| TAO-Amodal: A Benchmark for Tracking Any Object Amodally | Dec 19, 2023 | Amodal TrackingAutonomous Driving | CodeCode Available | 1 |
| How to Train Neural Field Representations: A Comprehensive Study and Benchmark | Dec 16, 2023 | Benchmarking | CodeCode Available | 1 |
| Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Dec 15, 2023 | BenchmarkingCode Summarization | CodeCode Available | 1 |
| How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Dec 12, 2023 | Anomaly DetectionAutonomous Driving | CodeCode Available | 1 |
| EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Dec 11, 2023 | BenchmarkingHuman-Object Interaction Detection | CodeCode Available | 1 |