| PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding | Dec 6, 2018 | 3D Instance Segmentation, 3D Semantic Segmentation | Code Available | 0 |
| CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models | Jun 2, 2025 | Benchmarking | Code Available | 0 |
| Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022 | Jan 31, 2023 | Action Detection, Benchmarking | Code Available | 0 |
| PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics | Apr 2, 2024 | Benchmarking | Code Available | 0 |
| Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | Jul 23, 2024 | Benchmarking, Segmentation | Code Available | 0 |
| A Position Paper on the Automatic Generation of Machine Learning Leaderboards | May 23, 2025 | Benchmarking, Position | Code Available | 0 |
| Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Jan 14, 2025 | Benchmarking, Graph Representation Learning | Code Available | 0 |
| ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Apr 24, 2024 | Benchmarking, Molecular Property Prediction | Code Available | 0 |
| PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset | May 30, 2025 | Benchmarking, Multiple Instance Learning | Code Available | 0 |
| Attribution of Predictive Uncertainties in Classification Models | Jul 19, 2021 | Benchmarking, Classification | Code Available | 0 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Sep 26, 2024 | Benchmarking, Conformal Prediction | Code Available | 0 |
| Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Dec 2, 2024 | Benchmarking, High-Level Synthesis | Code Available | 0 |
| Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level | Apr 8, 2024 | Benchmarking | Code Available | 0 |
| Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Jul 22, 2024 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity | Oct 12, 2018 | Activity Recognition, Benchmarking | Code Available | 0 |
| CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems | Jun 9, 2025 | Attribute, Benchmarking | Code Available | 0 |
| PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Dec 9, 2024 | Benchmarking, Instruction Following | Code Available | 0 |
| CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Oct 28, 2024 | Benchmarking | Code Available | 0 |
| CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection | Jun 21, 2021 | Benchmarking, Domain Adaptation | Code Available | 0 |
| Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels | Nov 21, 2024 | Benchmarking, Machine Translation | Code Available | 0 |
| Ants can orienteer a thief in their robbery | Apr 15, 2020 | Benchmarking, Combinatorial Optimization | Code Available | 0 |
| 3DOS: Towards 3D Open Set Learning -- Benchmarking and Understanding Semantic Novelty Detection on Point Clouds | Jul 23, 2022 | Benchmarking, Novelty Detection | Code Available | 0 |
| Benchmarking Generative Latent Variable Models for Speech | Feb 22, 2022 | Benchmarking, Image Generation | Code Available | 0 |
| Benchmarking Generative AI Models for Deep Learning Test Input Generation | Dec 23, 2024 | Benchmarking, Deep Learning | Code Available | 0 |
| Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding | Sep 26, 2022 | Benchmarking, Natural Language Queries | Code Available | 0 |
| C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation | Jun 16, 2025 | Benchmarking, Recommendation Systems | Code Available | 0 |
| Performance Evaluation of Real-Time Object Detection for Electric Scooters | May 5, 2024 | Autonomous Vehicles, Benchmarking | Code Available | 0 |
| Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis | Feb 14, 2018 | Benchmarking, Causal Inference | Code Available | 0 |
| A General Benchmarking Framework for Text Generation | Dec 1, 2020 | Benchmarking, Knowledge Graphs | Code Available | 0 |
| Performance Modeling of Data Storage Systems using Generative Models | Jul 5, 2023 | Benchmarking | Code Available | 0 |
| Zero-Shot Hyperspectral Pansharpening Using Hysteresis-Based Tuning for Spectral Quality Control | May 22, 2025 | Benchmarking, Pansharpening | Code Available | 0 |
| Vector-Based Data Improves Left-Right Eye-Tracking Classifier Performance After a Covariate Distributional Shift | Jul 31, 2022 | Benchmarking, EEG | Code Available | 0 |
| AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | Dec 18, 2024 | Benchmarking, World Knowledge | Code Available | 0 |
| Periodic Extrapolative Generalisation in Neural Networks | Sep 21, 2022 | Benchmarking | Code Available | 0 |
| Standardizing Structural Causal Models | Jun 17, 2024 | Benchmarking, Causal Inference | Code Available | 0 |
| Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform | Jan 5, 2022 | Benchmarking | Code Available | 0 |
| StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis | May 28, 2025 | Benchmarking | Code Available | 0 |
| Benchmarking framework for machine learning classification from fNIRS data | Mar 3, 2023 | Benchmarking, Brain Computer Interface | Code Available | 0 |
| PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Oct 4, 2024 | Benchmarking, Dialogue Generation | Code Available | 0 |
| STA: Self-controlled Text Augmentation for Improving Text Classifications | Feb 24, 2023 | Benchmarking, Text Augmentation | Code Available | 0 |
| Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation | Feb 5, 2024 | Benchmarking, Image Segmentation | Code Available | 0 |
| XCompress: LLM assisted Python-based text compression toolkit | Aug 12, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| A Framework for Generating Informative Benchmark Instances | May 29, 2022 | Benchmarking | Code Available | 0 |
| What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility? | Oct 26, 2022 | Benchmarking, Question Answering | Code Available | 0 |
| Towards Robust Metrics for Concept Representation Evaluation | Jan 25, 2023 | Benchmarking, Disentanglement | Code Available | 0 |
| Statistical Multicriteria Evaluation of LLM-Generated Text | Jun 22, 2025 | Benchmarking, Diversity | Code Available | 0 |
| ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation | Jan 3, 2025 | Benchmarking, Crowd Counting | Code Available | 0 |
| Answer Consolidation: Formulation and Benchmarking | Apr 29, 2022 | Benchmarking, Question Answering | Code Available | 0 |
| A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | May 22, 2023 | Benchmarking, Classification | Code Available | 0 |
| A novel evaluation methodology for supervised Feature Ranking algorithms | Jul 9, 2022 | Benchmarking, Feature Importance | Code Available | 0 |