Benchmarking

Papers

Showing 1681–1690 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs	Apr 21, 2022	Benchmarking, Change Point Detection	Code Available	0	5
Benchmarking and optimizing organism wide single-cell RNA alignment methods	Mar 26, 2025	Benchmarking, Decoder	Code Available	0	5
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines	Jun 20, 2024	Benchmarking, Decision Making	Code Available	0	5
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation	Apr 5, 2024	Attribute, Benchmarking	Code Available	0	5
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	Benchmarking, Diversity	Code Available	0	5
Knowledge Enhanced Conditional Imputation for Healthcare Time-series	Dec 27, 2023	Benchmarking, Imputation	Code Available	0	5
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	Code Available	0	5
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	Benchmarking, Time Series	Code Available	0	5
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions	Oct 18, 2023	Benchmarking, Visual Grounding	Code Available	0	5
A Benchmarking Study of Vision-based Robotic Grasping Algorithms	Mar 14, 2025	Benchmarking, Robotic Grasping	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified