| ClearPose: Large-scale Transparent Object Dataset and Benchmark | Mar 8, 2022 | Benchmarking, Depth Completion | Code Available | 1 |
| Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials | Nov 6, 2021 | Benchmarking, Neural Network simulation | Code Available | 1 |
| Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Oct 6, 2024 | Benchmarking, Diagnostic | Code Available | 1 |
| BeHonest: Benchmarking Honesty in Large Language Models | Jun 19, 2024 | Benchmarking, Misinformation | Code Available | 1 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Jun 1, 2022 | Benchmarking, Emotion Recognition | Code Available | 1 |
| AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling | Nov 1, 2021 | Benchmarking, object-detection | Code Available | 1 |
| EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Oct 31, 2024 | Benchmarking, Electromyography (EMG) | Code Available | 1 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Jun 20, 2023 | Benchmarking, Deep Learning | Code Available | 1 |
| A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Dec 20, 2022 | 3D Object Detection, Benchmarking | Code Available | 1 |
| Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Nov 29, 2024 | Benchmarking, DeepFake Detection | Code Available | 1 |
| A multi-schematic classifier-independent oversampling approach for imbalanced datasets | Jul 15, 2021 | Benchmarking | Code Available | 1 |
| End-to-end Knowledge Retrieval with Multi-modal Queries | Jun 1, 2023 | Benchmarking, Cross-Modal Retrieval | Code Available | 1 |
| Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers | Jul 9, 2020 | Benchmarking | Code Available | 1 |
| Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization | May 27, 2025 | Benchmarking | Code Available | 1 |
| Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Oct 8, 2024 | Attribute, Benchmarking | Code Available | 1 |
| BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models | Dec 5, 2023 | Benchmarking, Visual Question Answering | Code Available | 1 |
| Coarse-to-Fine Q-attention with Learned Path Ranking | Apr 4, 2022 | Benchmarking | Code Available | 1 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Aug 12, 2021 | Benchmarking, Medical Image Analysis | Code Available | 1 |
| Anabranch Network for Camouflaged Object Segmentation | May 20, 2021 | Benchmarking, Camouflaged Object Segmentation | Code Available | 1 |
| Evaluating Attribution for Graph Neural Networks | Dec 1, 2020 | Benchmarking | Code Available | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Jul 15, 2024 | Benchmarking | Code Available | 1 |
| Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Apr 4, 2020 | Benchmarking, Image Captioning | Code Available | 1 |
| Evaluation of large language models for discovery of gene set function | Sep 7, 2023 | Benchmarking, Language Modelling | Code Available | 1 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | Code Available | 1 |
| Benchmarking deep inverse models over time, and the neural-adjoint method | Sep 27, 2020 | Benchmarking | Code Available | 1 |
| A Comprehensive Overview of Large Language Models | Jul 12, 2023 | Benchmarking | Code Available | 1 |
| Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Jul 17, 2023 | Benchmarking, Graph Clustering | Code Available | 1 |
| Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization | Dec 27, 2021 | Bayesian Optimization, Benchmarking | Code Available | 1 |
| Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | May 23, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Feb 9, 2021 | Benchmarking, Q-Learning | Code Available | 1 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classification, Benchmarking | Code Available | 1 |
| Exploring Large Language Models for Classical Philology | May 23, 2023 | Benchmarking, Decoder | Code Available | 1 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1 |
| AirSim Drone Racing Lab | Mar 12, 2020 | Benchmarking, Optical Flow Estimation | Code Available | 1 |
| FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Mar 27, 2025 | Attribute, Benchmarking | Code Available | 1 |
| Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Apr 18, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 |
| Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Mar 26, 2022 | Benchmarking, Question Answering | Code Available | 1 |
| Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Aug 25, 2017 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 |
| A SWAT-based Reinforcement Learning Framework for Crop Management | Feb 10, 2023 | Benchmarking, Decision Making | Code Available | 1 |
| featsel: A framework for benchmarking of feature selection algorithms and cost functions | Jul 19, 2017 | Benchmarking, Computational Efficiency | Code Available | 1 |
| Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Mar 21, 2024 | Benchmarking, Memorization | Code Available | 1 |
| Benchmarking Adversarial Patch Against Aerial Detection | Oct 30, 2022 | Benchmarking | Code Available | 1 |
| Benchmarking Data Science Agents | Feb 27, 2024 | Benchmarking, Code Generation | Code Available | 1 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Oct 1, 2023 | Benchmarking, Math | Code Available | 1 |
| CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Jan 21, 2024 | Benchmarking | Code Available | 1 |
| Benchmarking Adversarial Robustness on Image Classification | Jun 1, 2020 | Adversarial Attack, Adversarial Robustness | Code Available | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Aug 2, 2022 | Benchmarking, Causal Discovery | Code Available | 1 |
| FineSurE: Fine-grained Summarization Evaluation using LLMs | Jul 1, 2024 | Benchmarking, Hallucination | Code Available | 1 |
| CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Apr 6, 2025 | Benchmarking, Combinatorial Optimization | Code Available | 1 |
| CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Jun 5, 2024 | Benchmarking, energy management | Code Available | 1 |