SOTAVerified

Benchmarking

Papers

Showing 1291–1300 of 5548 papers

Title	Date	Tasks	Status	Hype
Performance Benchmarking of Psychomotor Skills Using Wearable Devices: An Application in Sport	Nov 25, 2024	Benchmarking	Unverified	0
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation	Nov 25, 2024	Active Learning, Bayesian Inference	Unverified	0
Benchmarking Active Learning for NILM	Nov 24, 2024	Active Learning, Benchmarking	Unverified	0
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain	Nov 23, 2024	Benchmarking, Diversity	Code Available	0
Reassessing Layer Pruning in LLMs: New Insights and Methods	Nov 23, 2024	Benchmarking, GPU	Code Available	0
Benchmarking the Robustness of Optical Flow Estimation to Corruptions	Nov 22, 2024	Autonomous Driving, Benchmarking	Code Available	0
AdamZ: An Enhanced Optimisation Method for Neural Network Training	Nov 22, 2024	Benchmarking	Code Available	0
Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains	Nov 22, 2024	Benchmarking, Caption Generation	Unverified	0
StackEval: Benchmarking LLMs in Coding Assistance	Nov 21, 2024	Benchmarking	Code Available	1
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels	Nov 21, 2024	Benchmarking, Machine Translation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified