SOTAVerified

Benchmarking

Papers


Showing 2921–2930 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Ultra-Low-Power μNPUs	Mar 28, 2025	Benchmarking	Unverified	0	0
How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization	Jan 26, 2021	Benchmarking, Supervised Video Summarization	Unverified	0	0
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making	Jun 25, 2024	Benchmarking, Decision Making	Unverified	0	0
How Good Is Neural Combinatorial Optimization? A Systematic Evaluation on the Traveling Salesman Problem	Sep 22, 2022	Benchmarking, Combinatorial Optimization	Unverified	0	0
How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference	May 14, 2025	Benchmarking	Unverified	0	0
How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers	Oct 19, 2020	Benchmarking, Graph Mining	Unverified	0	0
How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study	Dec 25, 2024	Benchmarking, Code Generation	Unverified	0	0
Benchmarking Ultra-High-Definition Image Super-Resolution	Jan 1, 2021	4k, 8k	Unverified	0	0
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	Jan 6, 2025	Benchmarking, Form	Unverified	0	0
Benchmarking Twitter Sentiment Analysis Tools	May 1, 2014	Benchmarking, Decision Making	Unverified	0	0


Page 293 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified