SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1581–1590 of 5548 papers

Title	Date	Tasks	Status	Hype
HyBiomass: Global Hyperspectral Imagery Benchmark Dataset for Evaluating Geospatial Foundation Models in Forest Aboveground Biomass Estimation	Jun 12, 2025	Benchmarking	—Unverified	0
Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning	Jun 12, 2025	Benchmarking	—Unverified	0
ScholarSearch: Benchmarking Scholar Searching Ability of LLMs	Jun 11, 2025	BenchmarkingInformation Retrieval	—Unverified	0
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Jun 11, 2025	Action RecognitionAction Segmentation	CodeCode Available	0
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models	Jun 11, 2025	BenchmarkingFederated Learning	—Unverified	0
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents	Jun 11, 2025	Benchmarking	—Unverified	0
ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, \& a ML Ensemble on Longitudinal Identity Resolution	Jun 11, 2025	Benchmarking	—Unverified	0
Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models	Jun 11, 2025	BenchmarkingCode Generation	—Unverified	0
GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments	Jun 11, 2025	Active LearningBenchmarking	—Unverified	0
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild	Jun 11, 2025	Age EstimationBenchmarking	CodeCode Available	0

Show:10 25 50

← PrevPage 159 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified