SOTAVerified

Benchmarking

Papers

Showing 2771–2780 of 5548 papers

Title	Date	Tasks	Status	Hype
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data	Nov 13, 2023	Benchmarking, Feature Importance	Unverified	0
The Disagreement Problem in Faithfulness Metrics	Nov 13, 2023	Benchmarking, Explainable Artificial Intelligence	Unverified	0
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models	Nov 13, 2023	Benchmarking, Instruction Following	Code Available	1
Flames: Benchmarking Value Alignment of LLMs in Chinese	Nov 12, 2023	Benchmarking, Fairness	Code Available	1
Identification of vortex in unstructured mesh with graph neural networks	Nov 11, 2023	Benchmarking, Graph Generation	Unverified	0
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1
MultiIoT: Benchmarking Machine Learning for the Internet of Things	Nov 10, 2023	Benchmarking, Representation Learning	Code Available	1
SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification	Nov 9, 2023	Benchmarking, Instance Segmentation	Unverified	0
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs	Nov 9, 2023	Benchmarking, Question Answering	Code Available	1
An efficiency analysis of Spanish airports	Nov 8, 2023	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified