Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 601–650 of 5548 papers

Title	Date	Tasks	Status	Hype
Contemporary Symbolic Regression Methods and their Relative Performance	Jul 29, 2021	Benchmarkingparameter estimation	CodeCode Available	1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling	Jun 10, 2025	Benchmarking	CodeCode Available	1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	Jun 14, 2020	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	BenchmarkingRetrieval	CodeCode Available	1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Jun 15, 2025	Benchmarking	CodeCode Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1
Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods	Sep 17, 2022	BenchmarkingStereo Matching	CodeCode Available	1
Combinatorial Optimization with Policy Adaptation using Latent Space Search	Nov 13, 2023	BenchmarkingCombinatorial Optimization	CodeCode Available	1
MatTools: Benchmarking Large Language Models for Materials Science Tools	May 16, 2025	BenchmarkingQuestion Answering	CodeCode Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	BenchmarkingObject	CodeCode Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	CodeCode Available	1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation	Dec 26, 2019	BenchmarkingDomain Adaptation	CodeCode Available	1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	BenchmarkingKnowledge Graph Completion	CodeCode Available	1
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages	Mar 11, 2024	BenchmarkingData Augmentation	CodeCode Available	1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	CodeCode Available	1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarkingobject-detection	CodeCode Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarkingknowledge editing	CodeCode Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1
ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization	Jul 19, 2022	BenchmarkingImage Registration	CodeCode Available	1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	BenchmarkingCode Generation	CodeCode Available	1
Bag of Tricks for Adversarial Training	Oct 1, 2020	Adversarial RobustnessBenchmarking	CodeCode Available	1
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	CodeCode Available	1
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	BenchmarkingDiversity	CodeCode Available	1
A Critical Assessment of State-of-the-Art in Entity Alignment	Oct 30, 2020	BenchmarkingEntity Alignment	CodeCode Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	CodeCode Available	1
Coarse-to-Fine Q-attention with Learned Path Ranking	Apr 4, 2022	Benchmarking	CodeCode Available	1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	BenchmarkingCombinatorial Optimization	CodeCode Available	1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	CodeCode Available	1
Clinical Prompt Learning with Frozen Language Models	May 11, 2022	BenchmarkingGPU	CodeCode Available	1
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	BenchmarkingContinual Learning	CodeCode Available	1
ClearPose: Large-scale Transparent Object Dataset and Benchmark	Mar 8, 2022	BenchmarkingDepth Completion	CodeCode Available	1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models	Nov 29, 2021	BenchmarkingPhysical Simulations	CodeCode Available	1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	BenchmarkingCloud Computing	CodeCode Available	1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	CodeCode Available	1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs	Nov 2, 2020	Benchmarking	CodeCode Available	1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions	Jun 26, 2025	BenchmarkingDrug Design	CodeCode Available	1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	BenchmarkingCloud Removal	CodeCode Available	1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	CodeCode Available	1
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action RecognitionAttribute	CodeCode Available	1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	CodeCode Available	1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classificationBenchmarking	CodeCode Available	1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods	Aug 2, 2022	BenchmarkingCausal Discovery	CodeCode Available	1
Align and Distill: Unifying and Improving Domain Adaptive Object Detection	Mar 18, 2024	Benchmarkingobject-detection	CodeCode Available	1
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing	Jun 7, 2023	BenchmarkingPrompt Engineering	CodeCode Available	1
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models	Jun 2, 2023	BenchmarkingLanguage Acquisition	CodeCode Available	1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	CodeCode Available	1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	BenchmarkingDeepFake Detection	CodeCode Available	1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images	Mar 19, 2023	Benchmarkingobject-detection	CodeCode Available	1
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?	Jun 15, 2023	Autonomous DrivingAutonomous Vehicles	CodeCode Available	1

Show:10 25 50

← PrevPage 13 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified