Benchmarking

Papers

Showing 1621–1630 of 5548 papers

Title	Date	Tasks	Status	Hype
Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra	Sep 22, 2024	Benchmarking	Unverified	0
The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests	Sep 22, 2024	Benchmarking	Unverified	0
A Survey on Multimodal Benchmarks: In the Era of Large AI Models	Sep 21, 2024	Benchmarking, Survey	Code Available	2
Efficient and Effective Model Extraction	Sep 21, 2024	Benchmarking, model	Code Available	0
CONGRA: Benchmarking Automatic Conflict Resolution	Sep 21, 2024	Benchmarking	Code Available	0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology	Sep 21, 2024	Benchmarking, Depth Estimation	Unverified	0
Present and Future Generalization of Synthetic Image Detectors	Sep 21, 2024	Benchmarking, Diversity	Code Available	0
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators	Sep 21, 2024	Benchmarking	Code Available	0
An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions	Sep 21, 2024	Benchmarking, Scheduling	Unverified	0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection	Sep 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified