SOTAVerified

Benchmarking

Papers

Showing 3121–3130 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | | 0 |
| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | | 0 |
| Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0 |
| FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph | | 0 |
| Fast, approximate kinetics of RNA folding | | 0 |
| FastDraft: How to Train Your Draft | | 0 |
| Fast Empirical Scenarios | | 0 |
| FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation | | 0 |
| Fast Labeling and Transcription with the Speechalyzer Toolkit | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |