| Benchmarking deep inverse models over time, and the neural-adjoint method | Sep 27, 2020 | Benchmarking | Code Available | 1 | 5 |
| A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification | Nov 28, 2022 | Benchmarking, image-classification | Code Available | 1 | 5 |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Jul 28, 2023 | Benchmarking, reinforcement-learning | Code Available | 1 | 5 |
| AnomalyHop: An SSL-based Image Anomaly Localization Method | May 8, 2021 | Anomaly Localization, Benchmarking | Code Available | 1 | 5 |
| Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Apr 4, 2020 | Benchmarking, Image Captioning | Code Available | 1 | 5 |
| Evaluation of large language models for discovery of gene set function | Sep 7, 2023 | Benchmarking, Language Modelling | Code Available | 1 | 5 |
| Benchmarking Natural Language Understanding Services for building Conversational Agents | Mar 13, 2019 | Benchmarking, General Classification | Code Available | 1 | 5 |
| Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes | Nov 22, 2021 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Deep Learning Interpretability in Time Series Predictions | Oct 26, 2020 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Sep 7, 2022 | Benchmarking | Code Available | 1 | 5 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Aug 2, 2024 | Adversarial Attack, Adversarial Purification | Code Available | 1 | 5 |
| An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition | Oct 17, 2022 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Deep Models for Salient Object Detection | Feb 7, 2022 | Benchmarking, Object | Code Available | 1 | 5 |
| Benchmarking Multi-Scene Fire and Smoke Detection | Oct 22, 2024 | Benchmarking | Code Available | 1 | 5 |
| Evaluating Attribution for Graph Neural Networks | Dec 1, 2020 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Oct 18, 2024 | Autonomous Navigation, Benchmarking | Code Available | 1 | 5 |
| CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Dec 2, 2021 | Benchmarking, Ordinal Classification | Code Available | 1 | 5 |
| Benchmarking Neural Network Generalization for Grammar Induction | Aug 16, 2023 | Benchmarking | Code Available | 1 | 5 |
| Data-Driven Denoising of Stationary Accelerometer Signals | Jun 13, 2022 | Benchmarking, Denoising | Code Available | 1 | 5 |
| Curious Hierarchical Actor-Critic Reinforcement Learning | May 7, 2020 | Benchmarking, Hierarchical Reinforcement Learning | Code Available | 1 | 5 |
| Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Nov 22, 2021 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | May 26, 2025 | Benchmarking, RAG | Code Available | 1 | 5 |
| Benchmarking Detection Transfer Learning with Vision Transformers | Nov 22, 2021 | Benchmarking, object-detection | Code Available | 1 | 5 |
| 3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Oct 16, 2023 | Action Recognition, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Mar 24, 2025 | Benchmarking, Semantic Segmentation | Code Available | 1 | 5 |