SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1711–1720 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy	Aug 9, 2024	BenchmarkingMedical Image Analysis	CodeCode Available	0	5
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding	Dec 23, 2024	Benchmarking	CodeCode Available	0	5
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance	Sep 22, 2024	AutoMLBenchmarking	CodeCode Available	0	5
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science	Feb 23, 2025	BenchmarkingCode Generation	CodeCode Available	0	5
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	BenchmarkingVisual Grounding	CodeCode Available	0	5
Learn How to Query from Unlabeled Data Streams in Federated Learning	Dec 11, 2024	BenchmarkingDecision Making	CodeCode Available	0	5
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM	Oct 8, 2014	Benchmarking	CodeCode Available	0	5
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study	Feb 11, 2024	Anomaly DetectionBenchmarking	CodeCode Available	0	5
Inverse Contextual Bandits: Learning How Behavior Evolves over Time	Jul 13, 2021	BenchmarkingDecision Making	CodeCode Available	0	5
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench	Apr 1, 2025	Benchmarking	CodeCode Available	0	5

Show:10 25 50

← PrevPage 172 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified