Benchmarking

Papers


Showing 801–850 of 5548 papers

Title	Date	Tasks	Status	Hype
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	Benchmarking, Code Generation	Code Available	1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	Code Available	1
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	Code Available	1
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code	Jun 22, 2022	Benchmarking, Text Generation	Code Available	1
Benchmarking AI scientists in omics data-driven biological research	May 13, 2025	Benchmarking, Multiple-choice	Code Available	1
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	Code Available	1
A Dataset for Answering Time-Sensitive Questions	Aug 13, 2021	Benchmarking	Code Available	1
Benchmarking Algorithms for Federated Domain Generalization	Jul 11, 2023	Benchmarking, Diversity	Code Available	1
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler	Feb 2, 2023	Benchmarking, Evolutionary Algorithms	Code Available	1
Generating a Doppelganger Graph: Resembling but Distinct	Jan 23, 2021	Benchmarking, Graph Representation Learning	Code Available	1
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs	Mar 10, 2020	Benchmarking, Entity Alignment	Code Available	1
Generative Evaluation of Complex Reasoning in Large Language Models	Apr 3, 2025	Benchmarking, Memorization	Code Available	1
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images	Dec 8, 2023	Benchmarking, Object	Code Available	1
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase	Jun 21, 2023	3D-Aware Image Synthesis, Benchmarking	Code Available	1
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms	Sep 21, 2022	3D human pose and shape estimation, Benchmarking	Code Available	1
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data	Jun 20, 2024	Benchmarking, Kolmogorov-Arnold Networks	Code Available	1
AirSim Drone Racing Lab	Mar 12, 2020	Benchmarking, Optical Flow Estimation	Code Available	1
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	Benchmarking, Decision Making	Code Available	1
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples	Jul 31, 2023	Adversarial Robustness, Benchmarking	Code Available	1
Geoclidean: Few-Shot Generalization in Euclidean Geometry	Nov 30, 2022	Benchmarking	Code Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	Code Available	1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras	Jun 11, 2025	Benchmarking	Code Available	1
Benchmarking Deep Learning Interpretability in Time Series Predictions	Oct 26, 2020	Benchmarking, Deep Learning	Code Available	1
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models	Dec 21, 2023	Benchmarking	Code Available	1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	Code Available	1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarking, object-detection	Code Available	1
Grad DFT: a software library for machine learning enhanced density functional theory	Sep 23, 2023	Benchmarking	Code Available	1
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	Benchmarking, Hallucination	Code Available	1
Graph Neural Network-Based Anomaly Detection for River Network Systems	Apr 19, 2023	Anomaly Detection, Benchmarking	Code Available	1
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarking, energy management	Code Available	1
Replication in Visual Diffusion Models: A Survey and Outlook	Jul 7, 2024	Benchmarking, Survey	Code Available	1
DFGC 2021: A DeepFake Game Competition	Jun 2, 2021	Benchmarking, DeepFake Detection	Code Available	1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models	Nov 29, 2021	Benchmarking, Physical Simulations	Code Available	1
Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning	Dec 18, 2024	Benchmarking, Graph Learning	Code Available	1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?	Aug 14, 2023	Benchmarking, Drug Design	Code Available	1
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic Segmentation, Benchmarking	Code Available	1
Clinical Prompt Learning with Frozen Language Models	May 11, 2022	Benchmarking, GPU	Code Available	1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	Benchmarking, Diagnostic	Code Available	1
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	Benchmarking, GPU	Code Available	1
Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox	Jul 17, 2023	Benchmarking	Code Available	1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	Benchmarking, DeepFake Detection	Code Available	1
A BFS-Tree of Ranking References for Unsupervised Manifold Learning	Sep 24, 2020	Benchmarking, Image Retrieval	Code Available	1
Benchmarking and Survey of Explanation Methods for Black Box Models	Feb 25, 2021	Benchmarking, Survey	Code Available	1
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders	Jun 4, 2024	Benchmarking, Clustering	Code Available	1
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	Benchmarking, Knowledge Base Question Answering	Code Available	1
ClearPose: Large-scale Transparent Object Dataset and Benchmark	Mar 8, 2022	Benchmarking, Depth Completion	Code Available	1
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	Benchmarking, Continual Learning	Code Available	1
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	Benchmarking, Community Detection	Code Available	1
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoML, Benchmarking	Code Available	1


Page 17 of 111

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified