
Benchmarking

Papers


Showing 271–280 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Representations for Speech, Music, and Acoustic Events	May 2, 2024	Audio Classification, Benchmarking	Code Available	2
HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond	May 1, 2024	Benchmarking, High-Level Synthesis	Code Available	2
SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods	Apr 29, 2024	Benchmarking, Image Generation	Code Available	2
Benchmarking Benchmark Leakage in Large Language Models	Apr 29, 2024	Benchmarking, Mathematical Reasoning	Code Available	2
LongEmbed: Extending Embedding Models for Long Context Retrieval	Apr 18, 2024	4k, 8k	Code Available	2
VBR: A Vision Benchmark in Rome	Apr 17, 2024	Autonomous Vehicles, Benchmarking	Code Available	2
Revealing data leakage in protein interaction benchmarks	Apr 16, 2024	Benchmarking	Code Available	2
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance	Apr 4, 2024	Benchmarking, Image Generation	Code Available	2
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking	Apr 2, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	2
Are large language models superhuman chemists?	Apr 1, 2024	Benchmarking	Code Available	2


Page 28 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	Accuracy (ACC)	0.56	—	Unverified