Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1001–1050 of 5548 papers

Title	Date	Tasks	Status	Hype
Data Generating Process to Evaluate Causal Discovery Techniques for Time Series Data	Apr 16, 2021	BenchmarkingCausal Discovery	CodeCode Available	1
OceanBench: The Sea Surface Height Edition	Sep 27, 2023	BenchmarkingSensor Fusion	CodeCode Available	1
Delving into Out-of-Distribution Detection with Medical Vision-Language Models	Mar 2, 2025	Benchmarkingimage-classification	CodeCode Available	1
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1
Contemporary Symbolic Regression Methods and their Relative Performance	Jul 29, 2021	Benchmarkingparameter estimation	CodeCode Available	1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarkingobject-detection	CodeCode Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	CodeCode Available	1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices	Jun 21, 2025	BenchmarkingCPU	CodeCode Available	1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms	Nov 30, 2023	BenchmarkingOpenAI Gym	CodeCode Available	1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	Jun 14, 2020	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	BenchmarkingRetrieval	CodeCode Available	1
Benchmarking Simulation-Based Inference	Jan 12, 2021	Benchmarking	CodeCode Available	1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK	Aug 8, 2023	BenchmarkingGPU	CodeCode Available	1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	BenchmarkingKnowledge Base Question Answering	CodeCode Available	1
A Comparison of Image Denoising Methods	Apr 18, 2023	BenchmarkingDenoising	CodeCode Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Jun 15, 2025	Benchmarking	CodeCode Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1
AI Agents That Matter	Jul 1, 2024	Benchmarking	CodeCode Available	1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	BenchmarkingFeature Engineering	CodeCode Available	1
OpenDataVal: a Unified Benchmark for Data Valuation	Jun 18, 2023	BenchmarkingData Valuation	CodeCode Available	1
Combinatorial Optimization with Policy Adaptation using Latent Space Search	Nov 13, 2023	BenchmarkingCombinatorial Optimization	CodeCode Available	1
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	BenchmarkingDiversity	CodeCode Available	1
AI Accelerator Survey and Trends	Sep 18, 2021	BenchmarkingComputational Efficiency	CodeCode Available	1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs	Nov 2, 2020	Benchmarking	CodeCode Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	BenchmarkingObject	CodeCode Available	1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	BenchmarkingKnowledge Graph Completion	CodeCode Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarkingknowledge editing	CodeCode Available	1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarkingobject-detection	CodeCode Available	1
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses	Mar 3, 2025	Benchmarking	CodeCode Available	1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	CodeCode Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	BenchmarkingDeep Learning	CodeCode Available	1
Benchmarking Visual Localization for Autonomous Navigation	Mar 24, 2022	Autonomous NavigationBenchmarking	CodeCode Available	1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	CodeCode Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	CodeCode Available	1
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking	Jun 15, 2024	BenchmarkingGPU	CodeCode Available	1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	CodeCode Available	1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization	Aug 10, 2023	BenchmarkingDecision Making	CodeCode Available	1
Coarse-to-Fine Q-attention with Learned Path Ranking	Apr 4, 2022	Benchmarking	CodeCode Available	1
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification	Nov 20, 2023	Benchmarkingimage-classification	CodeCode Available	1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	BenchmarkingCloud Computing	CodeCode Available	1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	BenchmarkingCombinatorial Optimization	CodeCode Available	1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	BenchmarkingCode Generation	CodeCode Available	1
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction	Jul 25, 2024	BenchmarkingDeep Learning	CodeCode Available	1
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging	Apr 26, 2020	BenchmarkingLeft Atrium Segmentation	CodeCode Available	1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models	Mar 31, 2023	BenchmarkingCausal Discovery	CodeCode Available	1
A global analysis of metrics used for measuring performance in natural language processing	Apr 25, 2022	BenchmarkingMachine Translation	CodeCode Available	1
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces	May 23, 2023	Benchmarking	CodeCode Available	1
Clinical Prompt Learning with Frozen Language Models	May 11, 2022	BenchmarkingGPU	CodeCode Available	1

Show:10 25 50

← PrevPage 21 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified