SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 61–70 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset	May 17, 2024	16kBenchmarking	CodeCode Available	3
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation	Jul 24, 2024	BenchmarkingHuman Animation	CodeCode Available	3
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents	Oct 3, 2024	Autonomous DrivingBackdoor Attack	CodeCode Available	3
Benchmarking Multimodal AutoML for Tabular Data with Text Fields	Nov 4, 2021	AutoMLBenchmarking	CodeCode Available	3
Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+	Mar 16, 2023	BenchmarkingGraph Regression	CodeCode Available	3
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems	Sep 2, 2024	BenchmarkingInstruction Following	CodeCode Available	3
Benchmarking Automatic Machine Learning Frameworks	Aug 17, 2018	Automated Feature EngineeringAutoML	CodeCode Available	3
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis	Oct 9, 2023	BenchmarkingMultivariate Time Series Forecasting	CodeCode Available	3
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation	Jun 19, 2024	BenchmarkingImage Generation	CodeCode Available	3
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent	Nov 5, 2024	BenchmarkingHallucination	CodeCode Available	3

Show:10 25 50

← PrevPage 7 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified