Benchmarking

Papers


Showing 1351–1400 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	Code Available	1	5
ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks	Jul 26, 2024	Benchmarking, Model Selection	Code Available	1	5
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics	Oct 11, 2018	Benchmarking	Code Available	1	5
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification	Jul 6, 2023	Benchmarking, Domain Adaptation	Code Available	1	5
A framework for benchmarking clustering algorithms	Sep 20, 2022	Benchmarking, Clustering	Code Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1	5
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset	Jun 14, 2022	Benchmarking, Ischemic Stroke Lesion Segmentation	Code Available	1	5
Open Radar Initiative: Large Scale Dataset for Benchmarking of micro-Doppler Recognition Algorithms	May 7, 2021	Benchmarking	Code Available	1	5
Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification	Nov 11, 2024	Benchmarking, Image Segmentation	Code Available	1	5
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training	Mar 13, 2020	Benchmarking, Quantization	Code Available	1	5
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models	Apr 22, 2024	Benchmarking, World Knowledge	Code Available	1	5
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	Benchmarking, Complex Query Answering	Code Available	1	5
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing	Apr 2, 2025	3D Reconstruction, Benchmarking	Code Available	1	5
OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets	Nov 1, 2021	Benchmarking	Code Available	1	5
Does your model understand genes? A benchmark of gene properties for biological and text models	Dec 5, 2024	Benchmarking, Multi-class Classification	Code Available	1	5
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication	Sep 16, 2021	3D Object Detection, Benchmarking	Code Available	1	5
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1	5
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet	Feb 23, 2023	Benchmarking, Knowledge Distillation	Code Available	1	5
Don’t be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System	Nov 1, 2021	Benchmarking, Response Generation	Code Available	1	5
IOHexperimenter: Benchmarking Platform for Iterative Optimization Heuristics	Nov 7, 2021	Bayesian Optimization, Benchmarking	Code Available	1	5
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment	Feb 21, 2024	Adversarial Robustness, Benchmarking	Code Available	1	5
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning	Jul 21, 2023	Benchmarking, Combinatorial Optimization	Code Available	1	5
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy	Oct 23, 2020	Benchmarking, Diagnostic	Code Available	1	5
DomainLab: A modular Python package for domain generalization in deep learning	Mar 21, 2024	Benchmarking, Domain Generalization	Code Available	1	5
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks	May 13, 2021	Benchmarking, Drug Discovery	Code Available	1	5
Introducing Milabench: Benchmarking Accelerators for AI	Nov 18, 2024	Benchmarking, Deep Learning	Code Available	1	5
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms	Jul 8, 2021	Benchmarking	Code Available	1	5
BEND: Benchmarking DNA Language Models on biologically meaningful tasks	Nov 21, 2023	Benchmarking, Language Modeling	Code Available	1	5
Introducing the VoicePrivacy Initiative	May 4, 2020	Benchmarking	Code Available	1	5
BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale	Dec 4, 2021	Benchmarking, Hyperparameter Optimization	Code Available	1	5
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology	Jun 30, 2022	Benchmarking, Diagnostic	Code Available	1	5
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM	Mar 28, 2024	Benchmarking	Code Available	1	5
Benchmark on Drug Target Interaction Modeling from a Structure Perspective	Jul 4, 2024	Benchmarking, Drug Discovery	Code Available	1	5
Benchmarks for Deep Off-Policy Evaluation	Mar 30, 2021	Benchmarking, continuous-control	Code Available	1	5
Intrinsic Image Harmonization	Jun 19, 2021	Benchmarking, Image Harmonization	Code Available	1	5
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets	Oct 22, 2020	Articles, Benchmarking	Code Available	1	5
Align and Distill: Unifying and Improving Domain Adaptive Object Detection	Mar 18, 2024	Benchmarking, object-detection	Code Available	1	5
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation	Mar 7, 2024	Benchmarking, Multimodal Recommendation	Code Available	1	5
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions	Oct 13, 2021	Benchmarking, Computational Efficiency	Code Available	1	5
Benchmarking Image Retrieval for Visual Localization	Nov 24, 2020	Autonomous Driving, Benchmarking	Code Available	1	5
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering	Mar 26, 2024	Benchmarking, Machine Reading Comprehension	Code Available	1	5
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1	5
Interpretable statistical representations of neural population dynamics and geometry	Apr 6, 2023	Benchmarking, Decision Making	Code Available	1	5
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1	5
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks	Apr 5, 2022	Benchmarking	Code Available	1	5
Physiology-based simulation of the retinal vasculature enables annotation-free segmentation of OCT angiographs	Jul 22, 2022	Benchmarking, Retinal Vessel Segmentation	Code Available	1	5
PIC4rl-gym: a ROS2 modular framework for Robots Autonomous Navigation with Deep Reinforcement Learning	Nov 19, 2022	Autonomous Navigation, Benchmarking	Code Available	1	5
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous Driving, Benchmarking	Code Available	1	5
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation	Jul 13, 2023	Benchmarking, Graph Embedding	Code Available	1	5



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified