SOTAVerified

Benchmarking

Papers

Showing 2771-2780 of 5548 papers

Title | Status | Hype
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | | 0
The Disagreement Problem in Faithfulness Metrics | | 0
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models | Code | 1
Flames: Benchmarking Value Alignment of LLMs in Chinese | Code | 1
Identification of vortex in unstructured mesh with graph neural networks | | 0
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
MultiIoT: Benchmarking Machine Learning for the Internet of Things | Code | 1
SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification | | 0
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs | Code | 1
An efficiency analysis of Spanish airports | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified