SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 751–760 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	CodeCode Available	1	5
Benchmarking Simulation-Based Inference	Jan 12, 2021	Benchmarking	CodeCode Available	1	5
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models	Aug 2, 2022	BenchmarkingSynthetic Data Generation	CodeCode Available	1	5
BeHonest: Benchmarking Honesty in Large Language Models	Jun 19, 2024	BenchmarkingMisinformation	CodeCode Available	1	5
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	BenchmarkingMultiple-choice	CodeCode Available	1	5
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling	Nov 1, 2021	Benchmarkingobject-detection	CodeCode Available	1	5
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	BenchmarkingFeature Engineering	CodeCode Available	1	5
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	BenchmarkingComplex Query Answering	CodeCode Available	1	5
AirSim Drone Racing Lab	Mar 12, 2020	BenchmarkingOptical Flow Estimation	CodeCode Available	1	5
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	BenchmarkingDecision Making	CodeCode Available	1	5

Show:10 25 50

← PrevPage 76 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified