| HATE-ITA: New Baselines for Hate Speech Detection in Italian | Jul 1, 2022 | Benchmarking, Hate Speech Detection | Code Available | 0 | 5 |
| A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models | Apr 28, 2022 | Benchmarking, Diversity | Code Available | 0 | 5 |
| HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Dec 21, 2024 | Benchmarking | Code Available | 0 | 5 |
| An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder | Sep 20, 2023 | Benchmarking, Clustering | Code Available | 0 | 5 |
| A Review of Testing Object-Based Environment Perception for Safe Automated Driving | Feb 16, 2021 | Benchmarking, Sensor Modeling | Code Available | 0 | 5 |
| Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces | May 31, 2023 | Benchmarking, Recommendation Systems | Code Available | 0 | 5 |
| Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique | Dec 6, 2023 | Benchmarking, Knowledge Graphs | Code Available | 0 | 5 |
| DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning | Mar 9, 2025 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Hard-Label Cryptanalytic Extraction of Neural Network Models | Sep 18, 2024 | Benchmarking | Code Available | 0 | 5 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Oct 31, 2024 | Benchmarking, scientific discovery | Code Available | 0 | 5 |
| DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Jun 8, 2023 | Benchmarking, Descriptive | Code Available | 0 | 5 |
| Effective Stabilized Self-Training on Few-Labeled Graph Data | Oct 7, 2019 | Benchmarking, Model Selection | Code Available | 0 | 5 |
| Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | May 13, 2025 | Benchmarking, Diagnostic | Code Available | 0 | 5 |
| A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market | Dec 24, 2024 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Mar 18, 2022 | Benchmarking, Object Recognition | Code Available | 0 | 5 |
| GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | Mar 7, 2019 | Benchmarking, Clustering | Code Available | 0 | 5 |
| Grounded Intuition of GPT-Vision's Abilities with Scientific Images | Nov 3, 2023 | Benchmarking, counterfactual | Code Available | 0 | 5 |
| Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time Apps | Jan 8, 2019 | Benchmarking, CPU | Code Available | 0 | 5 |
| Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective | Dec 10, 2024 | Benchmarking | Code Available | 0 | 5 |
| Learning Conjoint Attentions for Graph Neural Nets | Feb 5, 2021 | Benchmarking, Graph Attention | Code Available | 0 | 5 |
| Benchmarking LLM-based Relevance Judgment Methods | Apr 17, 2025 | Benchmarking, Open-Domain Question Answering | Code Available | 0 | 5 |
| Graph Convolutional Networks Meet with High Dimensionality Reduction | Nov 7, 2019 | Benchmarking, Dimensionality Reduction | Code Available | 0 | 5 |
| Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Jul 13, 2021 | Benchmarking, Decision Making | Code Available | 0 | 5 |
| Graph-theoretical approach to robust 3D normal extraction of LiDAR data | May 23, 2022 | Benchmarking | Code Available | 0 | 5 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Jan 31, 2024 | Benchmarking, Change Detection | Code Available | 0 | 5 |
| DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs | Oct 1, 2019 | Benchmarking, Dialogue Generation | Code Available | 0 | 5 |
| Benchmarking Linguistic Diversity of Large Language Models | Dec 13, 2024 | Benchmarking, Diversity | Code Available | 0 | 5 |
| GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Jul 18, 2022 | Benchmarking | Code Available | 0 | 5 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | May 24, 2023 | Benchmarking, Graph Mining | Code Available | 0 | 5 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Jan 8, 2025 | Benchmarking | Code Available | 0 | 5 |
| DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Jul 16, 2025 | Benchmarking, Knowledge Distillation | Code Available | 0 | 5 |
| Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation | Sep 24, 2024 | Benchmarking, Movie Recommendation | Code Available | 0 | 5 |
| GNNMerge: Merging of GNN Models Without Accessing Training Data | Mar 5, 2025 | Benchmarking, Computational Efficiency | Code Available | 0 | 5 |
| Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions? | May 7, 2025 | Benchmarking, Semantic Segmentation | Code Available | 0 | 5 |
| GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Jan 26, 2025 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Jan 7, 2024 | Benchmarking, Graph Neural Network | Code Available | 0 | 5 |
| DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations | Jan 24, 2022 | Benchmarking, Drug Discovery | Code Available | 0 | 5 |
| Benchmarking Learning Efficiency in Deep Reservoir Computing | Sep 29, 2022 | Benchmarking | Code Available | 0 | 5 |
| Geological Inference from Textual Data using Word Embeddings | Apr 10, 2025 | Benchmarking, Word Embeddings | Code Available | 0 | 5 |
| Flexible Generation of Preference Data for Recommendation Analysis | Jul 23, 2024 | Benchmarking, Recommendation Systems | Code Available | 0 | 5 |
| DQI: Measuring Data Quality in NLP | May 2, 2020 | Active Learning, Benchmarking | Code Available | 0 | 5 |
| Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation | Apr 21, 2025 | Benchmarking | Code Available | 0 | 5 |
| A General Benchmarking Framework for Text Generation | Dec 1, 2020 | Benchmarking, Knowledge Graphs | Code Available | 0 | 5 |
| A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric | Jan 22, 2021 | Benchmarking, Sentence | Code Available | 0 | 5 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | Benchmarking, GPU | Code Available | 0 | 5 |
| Benchmarking Large Language Model Uncertainty for Prompt Optimization | Sep 16, 2024 | Benchmarking, Diversity | Code Available | 0 | 5 |
| Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems | Oct 8, 2023 | Benchmarking | Code Available | 0 | 5 |
| Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | May 23, 2023 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 0 | 5 |
| Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments | Feb 20, 2023 | Benchmarking, Robot Navigation | Code Available | 0 | 5 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | May 17, 2025 | Benchmarking | Code Available | 0 | 5 |