Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1251–1300 of 5548 papers

Title	Date	Tasks	Status	Hype
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction	Nov 16, 2023	BenchmarkingEvent Extraction	CodeCode Available	1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks	Feb 4, 2023	Adversarial AttackAdversarial Robustness	CodeCode Available	1
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	BenchmarkingCode Generation	CodeCode Available	1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records	Nov 22, 2021	Benchmarking	CodeCode Available	1
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	CodeCode Available	1
GuacaMol: Benchmarking Models for De Novo Molecular Design	Nov 22, 2018	BenchmarkingDrug Discovery	CodeCode Available	1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling	Jun 10, 2025	Benchmarking	CodeCode Available	1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms	Nov 30, 2023	BenchmarkingOpenAI Gym	CodeCode Available	1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices	Jun 21, 2025	BenchmarkingCPU	CodeCode Available	1
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate	Aug 12, 2021	Benchmarking	CodeCode Available	1
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	BenchmarkingCommunity Detection	CodeCode Available	1
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing	Sep 25, 2024	BenchmarkingImage Dehazing	CodeCode Available	1
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation	Feb 18, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1
HINT3: Raising the bar for Intent Detection in the Wild	Sep 29, 2020	BenchmarkingIntent Detection	CodeCode Available	1
Contemporary Symbolic Regression Methods and their Relative Performance	Jul 29, 2021	Benchmarkingparameter estimation	CodeCode Available	1
Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images	Mar 15, 2024	BenchmarkingKnowledge Distillation	CodeCode Available	1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	BenchmarkingEarth Observation	CodeCode Available	1
Benchmarking Quantized Neural Networks on FPGAs with FINN	Feb 2, 2021	BenchmarkingQuantization	CodeCode Available	1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation	Dec 26, 2019	BenchmarkingDomain Adaptation	CodeCode Available	1
How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models	Aug 29, 2024	BenchmarkingGeneral Knowledge	CodeCode Available	1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	Jun 14, 2020	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Jun 15, 2025	Benchmarking	CodeCode Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1
HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media	Oct 14, 2021	3D Pose EstimationBenchmarking	CodeCode Available	1
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic SegmentationBenchmarking	CodeCode Available	1
A framework for benchmarking clustering algorithms	Sep 20, 2022	BenchmarkingClustering	CodeCode Available	1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	BenchmarkingRetrieval	CodeCode Available	1
Hyperparameter optimization in deep multi-target prediction	Nov 8, 2022	AutoMLBenchmarking	CodeCode Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	CodeCode Available	1
Combinatorial Optimization with Policy Adaptation using Latent Space Search	Nov 13, 2023	BenchmarkingCombinatorial Optimization	CodeCode Available	1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining	Nov 22, 2017	Benchmarkingfeature selection	CodeCode Available	1
Image Colorization: A Survey and Dataset	Aug 25, 2020	BenchmarkingColorization	CodeCode Available	1
Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification	Nov 11, 2024	BenchmarkingImage Segmentation	CodeCode Available	1
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	BenchmarkingDecision Making	CodeCode Available	1
AirSim Drone Racing Lab	Mar 12, 2020	BenchmarkingOptical Flow Estimation	CodeCode Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	BenchmarkingObject	CodeCode Available	1
Benchmarking Simulation-Based Inference	Jan 12, 2021	Benchmarking	CodeCode Available	1
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	CodeCode Available	1
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet	Feb 23, 2023	BenchmarkingKnowledge Distillation	CodeCode Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarkingobject-detection	CodeCode Available	1
Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model	Apr 10, 2024	BenchmarkingImage-to-Image Translation	CodeCode Available	1
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis	Aug 12, 2021	BenchmarkingMedical Image Analysis	CodeCode Available	1
Improving and Benchmarking Offline Reinforcement Learning Algorithms	Jun 1, 2023	AttributeBenchmarking	CodeCode Available	1
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions	Jun 26, 2025	BenchmarkingDrug Design	CodeCode Available	1
DataRec: A Python Library for Standardized and Reproducible Data Management in Recommender Systems	Oct 30, 2024	BenchmarkingManagement	CodeCode Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarkingknowledge editing	CodeCode Available	1
Geometric Deep Learning for Structure-Based Drug Design: A Survey	Jun 20, 2023	BenchmarkingDeep Learning	CodeCode Available	1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	CodeCode Available	1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	BenchmarkingKnowledge Graph Completion	CodeCode Available	1

Show:10 25 50

← PrevPage 26 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified