Benchmarking

Papers

Showing 5051–5100 of 5548 papers

Title	Date	Tasks	Status
Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients	Jul 17, 2023	Benchmarking, GPU	Code Available
SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data	Apr 8, 2023	Benchmarking, Data Augmentation	Code Available
NSINA: A News Corpus for Sinhala	Mar 25, 2024	Articles, Benchmarking	Code Available
Improving Sequential Recommendation Models with an Enhanced Loss Function	Jan 3, 2023	Benchmarking, Recommendation Systems	Code Available
Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks	Sep 8, 2019	Benchmarking, Classification	Code Available
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models	Feb 28, 2024	Benchmarking, Hallucination	Code Available
SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins	Aug 21, 2024	Benchmarking	Code Available
A Seq2Seq approach to Symbolic Regression	Oct 17, 2020	Benchmarking, regression	Code Available
A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models	Apr 28, 2022	Benchmarking, Diversity	Code Available
Simitate: A Hybrid Imitation Learning Benchmark	May 15, 2019	Benchmarking, Imitation Learning	Code Available
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere	Mar 27, 2019	Benchmarking	Code Available
ECBD: Evidence-Centered Benchmark Design for NLP	Jun 13, 2024	Benchmarking	Code Available
A Continuous Optimisation Benchmark Suite from Neural Network Regression	Sep 12, 2021	Benchmarking, Evolutionary Algorithms	Code Available
An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder	Sep 20, 2023	Benchmarking, Clustering	Code Available
Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique	Dec 6, 2023	Benchmarking, Knowledge Graphs	Code Available
DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning	Mar 9, 2025	Benchmarking, Decision Making	Code Available
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems	Jun 8, 2023	Benchmarking, Descriptive	Code Available
Simple GNNs with Low Rank Non-parametric Aggregators	Oct 8, 2023	Benchmarking, Node Classification	Code Available
Effective Stabilized Self-Training on Few-Labeled Graph Data	Oct 7, 2019	Benchmarking, Model Selection	Code Available
Simulated Contextual Bandits for Personalization Tasks from Recommendation Datasets	Oct 12, 2022	Benchmarking, Multi-Armed Bandits	Code Available
A Deep Reinforcement Learning Framework for Dynamic Portfolio Optimization: Evidence from China's Stock Market	Dec 24, 2024	Benchmarking, Decision Making	Code Available
DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs	Oct 1, 2019	Benchmarking, Dialogue Generation	Code Available
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition	Jul 16, 2025	Benchmarking, Knowledge Distillation	Code Available
Referenced Thermodynamic Integration for Bayesian Model Selection: Application to COVID-19 Model Selection	Sep 8, 2020	Benchmarking, Epidemiology	Code Available
Simulation-based Benchmarking for Causal Structure Learning in Gene Perturbation Experiments	Jul 8, 2024	Benchmarking, Decision Making	Code Available
Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation	Sep 24, 2024	Benchmarking, Movie Recommendation	Code Available
OG-SPACE: Optimized Stochastic Simulation of Spatial Models of Cancer Evolution	Oct 13, 2021	Benchmarking	Code Available
Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions	May 16, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available
Okapi: Generalising Better by Making Statistical Matches Match	Nov 7, 2022	Benchmarking, Binary Classification	Code Available
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations	Jan 24, 2022	Benchmarking, Drug Discovery	Code Available
DQI: Measuring Data Quality in NLP	May 2, 2020	Active Learning, Benchmarking	Code Available
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks	Jan 29, 2024	Benchmarking, Cross-Lingual Transfer	Code Available
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction	May 23, 2023	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Code Available
WebSuite: Systematically Evaluating Why Web Agents Fail	Jun 1, 2024	Benchmarking, Diagnostic	Code Available
Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation	Jul 17, 2020	Benchmarking, Disentanglement	Code Available
Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification	Jul 18, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks	Nov 15, 2023	Benchmarking, Network Pruning	Code Available
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M	May 15, 2025	Benchmarking, Memorization	Code Available
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs	Apr 7, 2025	Benchmarking, Fairness	Code Available
A Review of Testing Object-Based Environment Perception for Safe Automated Driving	Feb 16, 2021	Benchmarking, Sensor Modeling	Code Available
Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo	May 3, 2024	Benchmarking, Multi-hop Question Answering	Code Available
Benchmarking machine learning for bowel sound pattern classification from tabular features to pretrained models	Feb 21, 2025	Benchmarking, Diagnostic	Code Available
On dataset transferability in medical image classification	Dec 28, 2024	Benchmarking, Classification	Code Available
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?	May 7, 2025	Benchmarking, Semantic Segmentation	Code Available
Do LLM Evaluators Prefer Themselves for a Reason?	Apr 4, 2025	Benchmarking, Code Generation	Code Available
YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems	Jul 26, 2023	Benchmarking, CPU	Code Available
Benchmarking Long-tail Generalization with Likelihood Splits	Oct 13, 2022	Benchmarking, Language Modeling	Code Available
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking	May 21, 2025	Benchmarking, Claim Verification	Code Available
On Empirical Comparisons of Optimizers for Deep Learning	Oct 11, 2019	Benchmarking, Deep Learning	Code Available
Benchmarking LLMs' Judgments with No Gold Standard	Nov 11, 2024	Benchmarking, Machine Translation	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified