Benchmarking

Papers

Showing 101–110 of 5548 papers

Title	Date	Tasks	Status	Hype
Sum Rate Maximization for Pinching Antennas Assisted RSMA System With Multiple Waveguides	Jun 12, 2025	Benchmarking	Unverified	0
OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics	Jun 12, 2025	Benchmarking	Unverified	0
Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning	Jun 12, 2025	Benchmarking	Unverified	0
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis	Jun 12, 2025	Benchmarking, Dialogue Generation	Code Available	2
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents	Jun 11, 2025	Benchmarking	Unverified	0
ICE-ID: A Novel Historical Census Data Benchmark Comparing NARS against LLMs, & a ML Ensemble on Longitudinal Identity Resolution	Jun 11, 2025	Benchmarking	Unverified	0
ScholarSearch: Benchmarking Scholar Searching Ability of LLMs	Jun 11, 2025	Benchmarking, Information Retrieval	Unverified	0
Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models	Jun 11, 2025	Benchmarking, Code Generation	Unverified	0
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling	Jun 11, 2025	Benchmarking, Computational Efficiency	Code Available	1
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras	Jun 11, 2025	Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified