SOTAVerified

Benchmarking

Papers

Showing 2341–2350 of 5548 papers

Title | Status | Hype
PREGO: online mistake detection in PRocedural EGOcentric videos | Code | 1
Advancing LLM Reasoning Generalists with Preference Trees | Code | 3
EV2Gym: A Flexible V2G Simulator for EV Smart Charging Research and Benchmarking | Code | 2
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | | 0
Diffusion-Driven Domain Adaptation for Generating 3D Molecules | | 0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | | 0
Are large language models superhuman chemists? | Code | 2
SpiralMLP: A Lightweight Vision MLP Architecture | | 0
Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells | | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
Page 235 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified