Benchmarking

Papers

Showing 2661–2670 of 5548 papers

Title	Date	Tasks	Status	Hype
Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification	May 24, 2024	Benchmarking, Data Augmentation	Unverified	0
Benchmarking Single-Image Reflection Removal Algorithms	Oct 1, 2017	Benchmarking, Reflection Removal	Unverified	0
Benchmarking projective simulation in navigation problems	Apr 23, 2018	Benchmarking, Q-Learning	Unverified	0
From Audio Encoders to Piano Judges: Benchmarking Performance Understanding for Solo Piano	Jul 5, 2024	Attribute, Benchmarking	Unverified	0
A Survey on LLM-based News Recommender Systems	Feb 13, 2025	Benchmarking, Fairness	Unverified	0
From Blind Solvers to Logical Thinkers: Benchmarking LLMs' Logical Integrity on Faulty Mathematical Problems	Oct 24, 2024	Benchmarking, Common Sense Reasoning	Unverified	0
Benchmarking SMT Performance for Farsi Using the TEP++ Corpus	May 1, 2015	Benchmarking, Machine Translation	Unverified	0
From Code to Play: Benchmarking Program Search for Games Using Large Language Models	Dec 5, 2024	Atari Games, Benchmarking	Unverified	0
From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks	Apr 14, 2022	Adversarial Attack, Adversarial Robustness	Unverified	0
Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning	Nov 20, 2023	Benchmarking, Inverse Rendering	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified