| Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments | May 8, 2025 | Benchmarking, Prompt Engineering | Code Available | 1 | 5 |
| Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | May 15, 2025 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 | 5 |
| FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow | May 23, 2025 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning | Dec 11, 2024 | Attribute, Benchmarking | Code Available | 1 | 5 |
| Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Nov 5, 2024 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study | Feb 26, 2025 | Benchmarking, Blood pressure estimation | Code Available | 1 | 5 |
| Generating a Doppelganger Graph: Resembling but Distinct | Jan 23, 2021 | Benchmarking, Graph Representation Learning | Code Available | 1 | 5 |
| 4D Panoptic LiDAR Segmentation | Feb 24, 2021 | 4D Panoptic Segmentation, Benchmarking | Code Available | 1 | 5 |
| Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | May 13, 2021 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Benchmark on Drug Target Interaction Modeling from a Structure Perspective | Jul 4, 2024 | Benchmarking, Drug Discovery | Code Available | 1 | 5 |
| Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Apr 18, 2023 | Benchmarking, Deep Reinforcement Learning | Code Available | 1 | 5 |
| DocuMint: Docstring Generation for Python using Small Language Models | May 16, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Sep 6, 2021 | Benchmarking | Code Available | 1 | 5 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Jun 5, 2023 | Benchmarking, Multiple-choice | Code Available | 1 | 5 |
| BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Nov 21, 2023 | Benchmarking, Language Modeling | Code Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 | 5 |
| Benchmarking Adversarial Patch Against Aerial Detection | Oct 30, 2022 | Benchmarking | Code Available | 1 | 5 |
| dMelodies: A Music Dataset for Disentanglement Learning | Jul 29, 2020 | Benchmarking, Disentanglement | Code Available | 1 | 5 |
| GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Mar 23, 2025 | Benchmarking, Hallucination | Code Available | 1 | 5 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jul 16, 2024 | Benchmarking, Code Generation | Code Available | 1 | 5 |
| Benchmarking Adversarial Robustness on Image Classification | Jun 1, 2020 | Adversarial Attack, Adversarial Robustness | Code Available | 1 | 5 |
| Benchmarking of DL Libraries and Models on Mobile Devices | Feb 14, 2022 | Benchmarking, GPU | Code Available | 1 | 5 |
| GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras | Jun 11, 2025 | Benchmarking | Code Available | 1 | 5 |
| DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Mar 13, 2020 | Benchmarking, Quantization | Code Available | 1 | 5 |
| Does your model understand genes? A benchmark of gene properties for biological and text models | Dec 5, 2024 | Benchmarking, Multi-class Classification | Code Available | 1 | 5 |