| A SWAT-based Reinforcement Learning Framework for Crop Management | Feb 10, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Mar 24, 2025 | Benchmarking, Semantic Segmentation | Code Available | 1 | 5 |
| Benchmarks for Deep Off-Policy Evaluation | Mar 30, 2021 | Benchmarking, continuous-control | Code Available | 1 | 5 |
| Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | May 27, 2022 | Benchmarking, Few-Shot Image Classification | Code Available | 1 | 5 |
| Recent Advances on Neural Network Pruning at Initialization | Mar 11, 2021 | Benchmarking, Network Pruning | Code Available | 1 | 5 |
| Boosting Neural Image Compression for Machines Using Latent Space Masking | Dec 15, 2021 | Benchmarking, Image Compression | Code Available | 1 | 5 |
| Enhancing Biomedical Relation Extraction with Directionality | Jan 23, 2025 | Benchmarking, Document-level Relation Extraction | Code Available | 1 | 5 |
| Benchmarking Algorithms for Federated Domain Generalization | Jul 11, 2023 | Benchmarking, Diversity | Code Available | 1 | 5 |
| Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Feb 2, 2023 | Benchmarking, Evolutionary Algorithms | Code Available | 1 | 5 |
| BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text | Apr 28, 2025 | Benchmarking | Code Available | 1 | 5 |
| Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes | Nov 22, 2021 | Benchmarking | Code Available | 1 | 5 |
| Federated Learning Under Intermittent Client Availability and Time-Varying Communication Constraints | May 13, 2022 | Benchmarking, Federated Learning | Code Available | 1 | 5 |
| Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images | Dec 8, 2023 | Benchmarking, Object | Code Available | 1 | 5 |
| Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Jun 21, 2023 | 3D-Aware Image Synthesis, Benchmarking | Code Available | 1 | 5 |
| Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms | Sep 21, 2022 | 3D human pose and shape estimation, Benchmarking | Code Available | 1 | 5 |
| A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data | Jun 20, 2024 | Benchmarking, Kolmogorov-Arnold Networks | Code Available | 1 | 5 |
| Benchmarking and scaling of deep learning models for land cover image classification | Nov 18, 2021 | Benchmarking, Classification | Code Available | 1 | 5 |
| Benchmarking and Analyzing Point Cloud Classification under Corruptions | Feb 7, 2022 | Benchmarking, Classification | Code Available | 1 | 5 |
| Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples | Jul 31, 2023 | Adversarial Robustness, Benchmarking | Code Available | 1 | 5 |
| 4D Panoptic LiDAR Segmentation | Feb 24, 2021 | 4D Panoptic Segmentation, Benchmarking | Code Available | 1 | 5 |
| Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding | Jul 17, 2023 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Dec 9, 2022 | Benchmarking, Disentanglement | Code Available | 1 | 5 |
| Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Mar 8, 2024 | Action Recognition, Benchmarking | Code Available | 1 | 5 |
| Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Dec 21, 2023 | Benchmarking | Code Available | 1 | 5 |
| A Closer Look at Mortality Risk Prediction from Electrocardiograms | Jun 24, 2024 | Benchmarking, Prediction | Code Available | 1 | 5 |
| EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios | May 22, 2025 | Benchmarking | Code Available | 1 | 5 |
| CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling | Oct 14, 2022 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| ByzFL: Research Framework for Robust Federated Learning | May 30, 2025 | Benchmarking, Federated Learning | Code Available | 1 | 5 |
| Benchmarking of DL Libraries and Models on Mobile Devices | Feb 14, 2022 | Benchmarking, GPU | Code Available | 1 | 5 |
| Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach | Oct 10, 2023 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Benchmarking Meta-embeddings: What Works and What Does Not | Nov 1, 2021 | Benchmarking, Embeddings Evaluation | Code Available | 1 | 5 |
| EgoNormia: Benchmarking Physical Social Norm Understanding | Feb 27, 2025 | Answer Generation, Benchmarking | Code Available | 1 | 5 |
| A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges | Oct 21, 2022 | Benchmarking, Community Detection | Code Available | 1 | 5 |
| COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Jan 15, 2021 | Benchmarking, Misinformation | Code Available | 1 | 5 |
| AIPerf: Automated machine learning as an AI-HPC benchmark | Aug 17, 2020 | AutoML, Benchmarking | Code Available | 1 | 5 |
| Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Jul 2, 2022 | Benchmarking, Machine Translation | Code Available | 1 | 5 |
| Benchmarking machine learning models on multi-centre eICU critical care dataset | Oct 2, 2019 | Benchmarking, BIG-bench Machine Learning | Code Available | 1 | 5 |
| Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym | Dec 6, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Apr 21, 2023 | Benchmarking | Code Available | 1 | 5 |
| CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Mar 12, 2025 | Benchmarking, Code Classification | Code Available | 1 | 5 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Jun 1, 2023 | Attribute, Benchmarking | Code Available | 1 | 5 |
| IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds | Apr 25, 2023 | Benchmarking, Pose Estimation | Code Available | 1 | 5 |
| 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Apr 28, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking and Survey of Explanation Methods for Black Box Models | Feb 25, 2021 | Benchmarking, Survey | Code Available | 1 | 5 |
| An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders | Jun 4, 2024 | Benchmarking, Clustering | Code Available | 1 | 5 |
| ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning | Feb 8, 2022 | Benchmarking, Language Modelling | Code Available | 1 | 5 |
| Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition | Sep 25, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 | 5 |
| AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | May 7, 2024 | Benchmarking, Cancer Classification | Code Available | 1 | 5 |
| CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Jun 5, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Meaning Representations in Neural Semantic Parsing | Nov 1, 2020 | Benchmarking, Semantic Parsing | Code Available | 1 | 5 |