SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 4121–4130 of 5548 papers

Title	Date	Tasks	Status	Hype
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	Jan 6, 2025	BenchmarkingForm	—Unverified	0
The Forchheim Image Database for Camera Identification in the Wild	Nov 4, 2020	BenchmarkingFact Checking	—Unverified	0
The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech	Apr 17, 2021	Benchmarking	—Unverified	0
The Impact of Genomic Variation on Function (IGVF) Consortium	Jul 24, 2023	Benchmarking	—Unverified	0
The iNaturalist Sounds Dataset	May 31, 2025	Benchmarking	—Unverified	0
The Interactive Effects of Operators and Parameters to GA Performance Under Different Problem Sizes	Aug 1, 2015	Benchmarking	—Unverified	0
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine	Sep 12, 2024	Autonomous DrivingBenchmarking	—Unverified	0
The Jungle of Generative Drug Discovery: Traps, Treasures, and Ways Out	Dec 24, 2024	BenchmarkingDeep Learning	—Unverified	0
The Karp Dataset	Jan 24, 2025	BenchmarkingMathematical Reasoning	—Unverified	0
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Oct 2, 2024	BenchmarkingHallucination	—Unverified	0

Show:10 25 50

← PrevPage 413 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified