SOTAVerified

Benchmarking

Papers

Showing 2341–2350 of 5548 papers

Title	Date	Tasks	Status	Hype
PREGO: online mistake detection in PRocedural EGOcentric videos	Apr 2, 2024	Action Recognition, Benchmarking	Code Available	1
Advancing LLM Reasoning Generalists with Preference Trees	Apr 2, 2024	Benchmarking, Code Generation	Code Available	3
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking	Apr 2, 2024	Benchmarking, Reinforcement Learning (RL)	Code Available	2
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach	Apr 2, 2024	Benchmarking, Common Sense Reasoning	Unverified	0
Diffusion-Driven Domain Adaptation for Generating 3D Molecules	Apr 1, 2024	Benchmarking, Decoder	Unverified	0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations	Apr 1, 2024	Benchmarking, Math	Unverified	0
Are large language models superhuman chemists?	Apr 1, 2024	Benchmarking	Code Available	2
SpiralMLP: A Lightweight Vision MLP Architecture	Mar 31, 2024	Benchmarking	Unverified	0
Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells	Mar 29, 2024	Benchmarking	Unverified	0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context	Mar 29, 2024	Benchmarking, Sentence	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified