| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Jun 5, 2023 | Benchmarking, Multiple-choice | Code Available | 1 | 5 |
| DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Oct 31, 2024 | Benchmarking, LLM-generated Text Detection | Code Available | 1 | 5 |
| Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers | Jul 3, 2020 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Jan 1, 2021 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Oct 17, 2023 | Benchmarking, Language Modelling | Code Available | 1 | 5 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Aug 31, 2023 | Benchmarking, Dataset Generation | Code Available | 1 | 5 |
| Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Nov 22, 2017 | Benchmarking, feature selection | Code Available | 1 | 5 |
| Evaluating Attribution for Graph Neural Networks | Dec 1, 2020 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Large Language Models on Controllable Generation under Diversified Instructions | Jan 1, 2024 | Benchmarking, Instruction Following | Code Available | 1 | 5 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Jun 1, 2022 | Benchmarking, Emotion Recognition | Code Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 | 5 |
| Geometric Deep Learning for Structure-Based Drug Design: A Survey | Jun 20, 2023 | Benchmarking, Deep Learning | Code Available | 1 | 5 |
| A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Dec 20, 2022 | 3D Object Detection, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Feb 28, 2024 | Benchmarking, Multiple-choice | Code Available | 1 | 5 |
| Benchmarking Robustness of 3D Object Detection to Common Corruptions | Jan 1, 2023 | 3D Object Detection, Autonomous Driving | Code Available | 1 | 5 |
| DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects | May 9, 2023 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs | Nov 5, 2022 | Attribute, Benchmarking | Code Available | 1 | 5 |
| A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis | Aug 12, 2021 | Benchmarking, Medical Image Analysis | Code Available | 1 | 5 |
| Benchmarking saliency methods for chest X-ray interpretation | Oct 10, 2022 | Benchmarking, Decision Making | Code Available | 1 | 5 |
| Benchmarking Robustness to Adversarial Image Obfuscations | Jan 30, 2023 | Benchmarking | Code Available | 1 | 5 |
| Beacon, a lightweight deep reinforcement learning benchmark library for flow control | Feb 27, 2024 | Benchmarking, CPU | Code Available | 1 | 5 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Apr 22, 2024 | Benchmarking | Code Available | 1 | 5 |
| Explainable Benchmarking for Iterative Optimization Heuristics | Jan 31, 2024 | Benchmarking, Evolutionary Algorithms | Code Available | 1 | 5 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Jun 14, 2024 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Large Language Models for News Summarization | Jan 31, 2023 | Benchmarking, News Summarization | Code Available | 1 | 5 |