Benchmarking

Papers

Showing 4251–4260 of 5548 papers

Title	Date	Tasks	Status	Hype
Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling	Dec 15, 2020	Benchmarking, Deep Learning	Unverified	0
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI	Jan 13, 2025	ARC, Benchmarking	Unverified	0
Understanding Foundation Models: Are We Back in 1924?	Sep 11, 2024	Benchmarking	Unverified	0
Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems	Oct 11, 2022	Benchmarking, Recommendation Systems	Unverified	0
Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets	Oct 6, 2018	Benchmarking, Language Modeling	Unverified	0
Understanding the Limits of Lifelong Knowledge Editing in LLMs	Mar 7, 2025	Benchmarking, knowledge editing	Unverified	0
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective	Jun 19, 2024	Benchmarking, Continual Pretraining	Unverified	0
Understanding the User: An Intent-Based Ranking Dataset	Aug 30, 2024	Benchmarking, Information Retrieval	Unverified	0
Uniform Discretized Integrated Gradients: An effective attribution based method for explaining large language models	Dec 5, 2024	Benchmarking, Feature Importance	Unverified	0
Unifying Few- and Zero-Shot Egocentric Action Recognition	May 27, 2020	Action Recognition, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified