SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1661–1670 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines	Apr 2, 2021	Benchmarking	CodeCode Available	0	5
Benchmarking and optimizing organism wide single-cell RNA alignment methods	Mar 26, 2025	BenchmarkingDecoder	CodeCode Available	0	5
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations	Aug 26, 2019	BenchmarkingClustering	CodeCode Available	0	5
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models	May 23, 2025	BenchmarkingDiversity	CodeCode Available	0	5
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	BenchmarkingGPU	CodeCode Available	0	5
A Dataset for Web-Scale Knowledge Base Population	Jun 3, 2018	BenchmarkingKnowledge Base Population	CodeCode Available	0	5
Benchmarking and Improving Text-to-SQL Generation under Ambiguity	Oct 20, 2023	BenchmarkingDiversity	CodeCode Available	0	5
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation	Feb 21, 2023	BenchmarkingState Estimation	CodeCode Available	0	5
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarkingknowledge editing	CodeCode Available	0	5
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches	May 22, 2023	BenchmarkingClassification	CodeCode Available	0	5

Show:10 25 50

← PrevPage 167 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified