SOTAVerified

Benchmarking

Papers

Showing 1251–1260 of 5548 papers

Title | Status | Hype
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation | Code | 1
HINT3: Raising the bar for Intent Detection in the Wild | Code | 1
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding | Code | 1
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Code | 1
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs | Code | 1
CODEMENV: Benchmarking Large Language Models on Code Migration | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1
AIPerf: Automated machine learning as an AI-HPC benchmark | Code | 1
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency | Code | 1
Page 126 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified