SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 91–100 of 5548 papers

Title	Date	Tasks	Status	Hype
ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications	Jun 14, 2025	Benchmarking	CodeCode Available	3
Learning Best Paths in Quantum Networks	Jun 14, 2025	Benchmarking	—Unverified	0
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables	Jun 13, 2025	BenchmarkingDescriptive	—Unverified	0
SemanticST: Spatially Informed Semantic Graph Learning for Clustering, Integration, and Scalable Analysis of Spatial Transcriptomics	Jun 13, 2025	BenchmarkingContrastive Learning	—Unverified	0
Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation	Jun 13, 2025	Anomaly DetectionBenchmarking	—Unverified	0
crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023	Jun 13, 2025	BenchmarkingDomain Adaptation	—Unverified	0
EconGym: A Scalable AI Testbed with Diverse Economic Tasks	Jun 13, 2025	Benchmarking	—Unverified	0
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI	Jun 13, 2025	BenchmarkingIn-Context Learning	CodeCode Available	0
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks	Jun 13, 2025	BenchmarkingLarge Language Model	CodeCode Available	2
HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation	Jun 12, 2025	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 10 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified