SOTAVerified

Benchmarking

Papers


Showing 2801–2810 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
GPTs and Language Barrier: A Cross-Lingual Legal QA Examination	Mar 26, 2024	Articles, Benchmarking	Unverified	0	0
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models	Apr 14, 2025	Benchmarking, Descriptive	Unverified	0	0
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems	Mar 9, 2025	Benchmarking	Unverified	0	0
Variational Laplace for Bayesian neural networks	Nov 20, 2020	Benchmarking, Variational Inference	Unverified	0	0
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities	May 13, 2025	automatic-speech-translation, Benchmarking	Unverified	0	0
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking	Mar 17, 2024	Benchmarking, Dialogue State Tracking	Unverified	0	0
Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings	May 19, 2025	Benchmarking, Combinatorial Optimization	Unverified	0	0
Beyond Benchmarks: On The False Promise of AI Regulation	Jan 26, 2025	Benchmarking	Unverified	0	0
Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms	Jun 10, 2025	Benchmarking, Graph Attention	Unverified	0	0
Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification	Sep 4, 2018	Benchmarking, General Classification	Unverified	0	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified