SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2841–2850 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Guidelines for Fine-grained Sentence-level Arabic Readability Annotation	Oct 11, 2024	BenchmarkingSentence	—Unverified	0	0
Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks	May 21, 2025	BenchmarkingGPU	—Unverified	0	0
Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge	Apr 3, 2025	AnatomyBenchmarking	—Unverified	0	0
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance	Mar 1, 2024	BenchmarkingStance Detection	—Unverified	0	0
VoiceWukong: Benchmarking Deepfake Voice Detection	Sep 10, 2024	BenchmarkingFace Swapping	—Unverified	0	0
h4rm3l: A language for Composable Jailbreak Attack Synthesis	Aug 9, 2024	BenchmarkingProgram Synthesis	—Unverified	0	0
Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text	Aug 3, 2022	BenchmarkingData Augmentation	—Unverified	0	0
Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure	Jan 12, 2025	BenchmarkingHyperparameter Optimization	—Unverified	0	0
AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems	May 26, 2025	BenchmarkingRecommendation Systems	—Unverified	0	0
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images	Nov 7, 2024	AnatomyBenchmarking	—Unverified	0	0

Show:10 25 50

← PrevPage 285 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified