
Benchmarking

Papers


Showing 91–100 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset	May 17, 2024	16k, Benchmarking	Code Available	3	5
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making	Oct 9, 2024	Benchmarking, Decision Making	Code Available	3	5
Benchmarking LLMs via Uncertainty Quantification	Jan 23, 2024	Benchmarking, Uncertainty Quantification	Code Available	3	5
Benchmarking Multimodal AutoML for Tabular Data with Text Fields	Nov 4, 2021	AutoML, Benchmarking	Code Available	3	5
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents	Oct 31, 2024	Benchmarking	Code Available	3	5
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning	Jan 26, 2023	Benchmarking, Deep Reinforcement Learning	Code Available	3	5
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs	Mar 14, 2022	Benchmarking, Graph Embedding	Code Available	3	5
mlpack 3: a fast, flexible machine learning library	Jun 18, 2018	Benchmarking, BIG-bench Machine Learning	Code Available	3	5
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks	Jun 13, 2024	Benchmarking	Code Available	3	5
A Survey on Performance Metrics for Object-Detection Algorithms	Jul 21, 2020	Benchmarking, Object	Code Available	3	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified