Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 776–800 of 5548 papers

Title	Date	Tasks	Status	Hype
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	CodeCode Available	1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models	Jul 17, 2023	BenchmarkingGraph Clustering	CodeCode Available	1
Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization	Dec 27, 2021	Bayesian OptimizationBenchmarking	CodeCode Available	1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling	May 23, 2024	Benchmarking	CodeCode Available	1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19	Feb 9, 2021	BenchmarkingQ-Learning	CodeCode Available	1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classificationBenchmarking	CodeCode Available	1
Exploring Large Language Models for Classical Philology	May 23, 2023	BenchmarkingDecoder	CodeCode Available	1
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action RecognitionAttribute	CodeCode Available	1
AirSim Drone Racing Lab	Mar 12, 2020	BenchmarkingOptical Flow Estimation	CodeCode Available	1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Mar 27, 2025	AttributeBenchmarking	CodeCode Available	1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints	Apr 18, 2023	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension	Mar 26, 2022	BenchmarkingQuestion Answering	CodeCode Available	1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	BenchmarkingBIG-bench Machine Learning	CodeCode Available	1
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	BenchmarkingDecision Making	CodeCode Available	1
featsel: A framework for benchmarking of feature selection algorithms and cost functions	Jul 19, 2017	BenchmarkingComputational Efficiency	CodeCode Available	1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	BenchmarkingMemorization	CodeCode Available	1
Benchmarking Adversarial Patch Against Aerial Detection	Oct 30, 2022	Benchmarking	CodeCode Available	1
Benchmarking Data Science Agents	Feb 27, 2024	BenchmarkingCode Generation	CodeCode Available	1
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	BenchmarkingMath	CodeCode Available	1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	CodeCode Available	1
Benchmarking Adversarial Robustness on Image Classification	Jun 1, 2020	Adversarial AttackAdversarial Robustness	CodeCode Available	1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods	Aug 2, 2022	BenchmarkingCausal Discovery	CodeCode Available	1
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	BenchmarkingHallucination	CodeCode Available	1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	BenchmarkingCombinatorial Optimization	CodeCode Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1

Show:10 25 50

← PrevPage 32 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified