Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–125 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+	Mar 16, 2023	BenchmarkingGraph Regression	CodeCode Available	3	5
General Geospatial Inference with a Population Dynamics Foundation Model	Nov 11, 2024	BenchmarkingGraph Neural Network	CodeCode Available	3	5
BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction	Feb 26, 2025	BenchmarkingTime Series	CodeCode Available	3	5
Personalized Benchmarking with the Ludwig Benchmarking Toolkit	Nov 8, 2021	BenchmarkingHyperparameter Optimization	CodeCode Available	3	5
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis	Jun 23, 2024	BenchmarkingRepresentation Learning	CodeCode Available	3	5
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites	Apr 15, 2025	Autonomous Web NavigationBenchmarking	CodeCode Available	3	5
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation	Jul 24, 2024	BenchmarkingHuman Animation	CodeCode Available	3	5
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis	Oct 9, 2023	BenchmarkingMultivariate Time Series Forecasting	CodeCode Available	3	5
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems	Sep 2, 2024	BenchmarkingInstruction Following	CodeCode Available	3	5
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning	Jan 26, 2023	BenchmarkingDeep Reinforcement Learning	CodeCode Available	3	5
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs	Mar 14, 2022	BenchmarkingGraph Embedding	CodeCode Available	3	5
Benchmarking Automatic Machine Learning Frameworks	Aug 17, 2018	Automated Feature EngineeringAutoML	CodeCode Available	3	5
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances	Oct 24, 2024	BenchmarkingImage to Video Generation	CodeCode Available	3	5
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms	Nov 16, 2021	BenchmarkingDeep Reinforcement Learning	CodeCode Available	3	5
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation	Jun 19, 2024	BenchmarkingImage Generation	CodeCode Available	3	5
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models	May 22, 2025	BenchmarkingInstruction Following	CodeCode Available	3	5
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks	Jun 13, 2024	Benchmarking	CodeCode Available	3	5
A Survey on Performance Metrics for Object-Detection Algorithms	Jul 21, 2020	BenchmarkingObject	CodeCode Available	3	5
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework	Aug 2, 2024	BenchmarkingDataset Generation	CodeCode Available	3	5
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Oct 9, 2024	BenchmarkingDecision Making	CodeCode Available	3	5
AER: Auto-Encoder with Regression for Time Series Anomaly Detection	Dec 27, 2022	Anomaly DetectionBenchmarking	CodeCode Available	3	5
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models	May 22, 2025	BenchmarkingFairness	CodeCode Available	3	5
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents	Jan 24, 2024	Benchmarking	CodeCode Available	3	5
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents	Oct 3, 2024	Autonomous DrivingBackdoor Attack	CodeCode Available	3	5
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery	May 7, 2024	BenchmarkingDecision Making	CodeCode Available	3	5

Show:10 25 50

← PrevPage 5 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified