SOTAVerified

Benchmarking

Papers

Showing 2261–2270 of 5548 papers

Title	Date	Tasks	Status	Hype
Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors	Feb 12, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Handwritten Text Recognition: A Survey	Feb 12, 2025	Benchmarking, Handwritten Text Recognition	Unverified	0
One-Shot Federated Learning with Classifier-Free Diffusion Models	Feb 12, 2025	Benchmarking, Dataset Generation	Unverified	0
The Devil is in the Prompts: De-Identification Traces Enhance Memorization Risks in Synthetic Chest X-Ray Generation	Feb 11, 2025	Benchmarking, De-identification	Code Available	0
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem	Feb 11, 2025	Benchmarking, Diversity	Code Available	0
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories	Feb 10, 2025	Benchmarking	Unverified	0
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations	Feb 10, 2025	Benchmarking, In-Context Learning	Unverified	0
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation	Feb 10, 2025	Benchmarking	Unverified	0
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Feb 10, 2025	Benchmarking	Code Available	0
Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm)	Feb 9, 2025	Benchmarking, CPU	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified