SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3291–3300 of 5548 papers

Title	Date	Tasks	Status	Hype
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Mar 7, 2024	Benchmarking	CodeCode Available	0
NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems	Mar 7, 2024	BenchmarkingDependency Parsing	—Unverified	0
Benchmarking News Recommendation in the Era of Green AI	Mar 7, 2024	BenchmarkingGPU	—Unverified	0
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI	Mar 7, 2024	Benchmarking	CodeCode Available	0
Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task	Mar 6, 2024	Benchmarking	CodeCode Available	0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving	Mar 6, 2024	Automated Theorem ProvingBenchmarking	—Unverified	0
Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks	Mar 6, 2024	Anomaly DetectionBenchmarking	CodeCode Available	0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Mar 6, 2024	BenchmarkingHallucination	CodeCode Available	0
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video	Mar 6, 2024	BenchmarkingCrowd Counting	—Unverified	0
Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation	Mar 5, 2024	BenchmarkingIn-Context Learning	—Unverified	0

Show:10 25 50

← PrevPage 330 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified