| Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding | Nov 6, 2023 | BenchmarkingData Compression | CodeCode Available | 1 | 5 |
| Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels | Jan 30, 2024 | Benchmarkingimage-classification | CodeCode Available | 1 | 5 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Mar 24, 2025 | BenchmarkingHumanitarian | CodeCode Available | 1 | 5 |
| Benchmarking Object Detectors with COCO: A New Path Forward | Mar 27, 2024 | BenchmarkingObject | CodeCode Available | 1 | 5 |
| Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Sep 18, 2021 | BenchmarkingComplex Query Answering | CodeCode Available | 1 | 5 |
| CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms | Aug 2, 2021 | Benchmarkingcounterfactual | CodeCode Available | 1 | 5 |
| Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA | Dec 29, 2023 | AnatomyBenchmarking | CodeCode Available | 1 | 5 |
| Benchmarking the Generation of Fact Checking Explanations | Aug 29, 2023 | Abstractive Text SummarizationArticles | CodeCode Available | 1 | 5 |
| CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery | Oct 3, 2023 | BenchmarkingCausal Discovery | CodeCode Available | 1 | 5 |
| Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework | Jun 12, 2024 | BenchmarkingCausal Inference | CodeCode Available | 1 | 5 |
| Working Memory Capacity of ChatGPT: An Empirical Study | Apr 30, 2023 | BenchmarkingLanguage Modeling | CodeCode Available | 1 | 5 |
| CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Apr 5, 2021 | BenchmarkingKnowledge Graphs | CodeCode Available | 1 | 5 |
| How to Train Neural Field Representations: A Comprehensive Study and Benchmark | Dec 16, 2023 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Jul 6, 2023 | BenchmarkingDomain Adaptation | CodeCode Available | 1 | 5 |
| Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Jun 16, 2023 | BenchmarkingEvidence Selection | CodeCode Available | 1 | 5 |
| 3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Oct 16, 2023 | Action RecognitionBenchmarking | CodeCode Available | 1 | 5 |
| HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims | Feb 17, 2025 | BenchmarkingFact Checking | CodeCode Available | 1 | 5 |
| CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Sep 16, 2020 | BenchmarkingKnowledge Graph Completion | CodeCode Available | 1 | 5 |
| On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Jun 7, 2023 | BenchmarkingPrompt Engineering | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Apr 3, 2024 | BenchmarkingGeneral Knowledge | CodeCode Available | 1 | 5 |
| Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? | Sep 29, 2023 | BenchmarkingKnowledge Graph Completion | CodeCode Available | 1 | 5 |
| LEMUR Neural Network Dataset: Towards Seamless AutoML | Apr 14, 2025 | AutoMLBenchmarking | CodeCode Available | 1 | 5 |
| Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Aug 2, 2024 | Adversarial AttackAdversarial Purification | CodeCode Available | 1 | 5 |
| CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness | Jul 13, 2020 | Benchmarking | CodeCode Available | 1 | 5 |
| Benchmarking Omni-Vision Representation through the Lens of Visual Realms | Jul 14, 2022 | BenchmarkingContrastive Learning | CodeCode Available | 1 | 5 |
| CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Feb 20, 2024 | Atomic number classificationBenchmarking | CodeCode Available | 1 | 5 |
| HINT3: Raising the bar for Intent Detection in the Wild | Sep 29, 2020 | BenchmarkingIntent Detection | CodeCode Available | 1 | 5 |
| CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action RecognitionAttribute | CodeCode Available | 1 | 5 |
| Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images | Mar 15, 2024 | BenchmarkingKnowledge Distillation | CodeCode Available | 1 | 5 |
| Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Oct 6, 2024 | BenchmarkingDiagnostic | CodeCode Available | 1 | 5 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Dec 13, 2022 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| Hierarchical graph neural nets can capture long-range interactions | Jul 15, 2021 | BenchmarkingMolecular Property Prediction | CodeCode Available | 1 | 5 |
| Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Jun 28, 2023 | BenchmarkingData Augmentation | CodeCode Available | 1 | 5 |
| A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | May 11, 2021 | BenchmarkingEdge-computing | CodeCode Available | 1 | 5 |
| How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Dec 12, 2023 | Anomaly DetectionAutonomous Driving | CodeCode Available | 1 | 5 |
| Benchmarking Language Models for Code Syntax Understanding | Oct 26, 2022 | Benchmarking | CodeCode Available | 1 | 5 |
| LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild | May 30, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Nov 16, 2023 | BenchmarkingEvent Extraction | CodeCode Available | 1 | 5 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Nov 26, 2024 | BenchmarkingText-to-Video Generation | CodeCode Available | 1 | 5 |
| ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Nov 29, 2021 | BenchmarkingPhysical Simulations | CodeCode Available | 1 | 5 |
| A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Sep 16, 2022 | BenchmarkingDeep Learning | CodeCode Available | 1 | 5 |
| HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis | Feb 13, 2021 | BenchmarkingClustering | CodeCode Available | 1 | 5 |
| Benchmarking Language Model Creativity: A Case Study on Code Generation | Jul 12, 2024 | BenchmarkingCode Generation | CodeCode Available | 1 | 5 |
| CLoG: Benchmarking Continual Learning of Image Generation Models | Jun 7, 2024 | BenchmarkingContinual Learning | CodeCode Available | 1 | 5 |
| Clinical Prompt Learning with Frozen Language Models | May 11, 2022 | BenchmarkingGPU | CodeCode Available | 1 | 5 |
| Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters | Jul 5, 2024 | Benchmarkingvalid | CodeCode Available | 1 | 5 |
| HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Sep 25, 2024 | BenchmarkingImage Dehazing | CodeCode Available | 1 | 5 |
| HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Jul 22, 2024 | BenchmarkingHallucination | CodeCode Available | 1 | 5 |
| Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Jun 14, 2024 | Benchmarking | CodeCode Available | 1 | 5 |
| HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Jan 28, 2025 | Adversarial AttackBenchmarking | CodeCode Available | 1 | 5 |