SOTAVerified

Benchmarking

Papers

Showing 1481–1490 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
featsel: A framework for benchmarking of feature selection algorithms and cost functions	Jul 19, 2017	Benchmarking, Computational Efficiency	Code Available	1	5
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation	May 14, 2024	Benchmarking, Multiple-choice	Code Available	1	5
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things	Sep 29, 2023	Benchmarking, Federated Learning	Code Available	1	5
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks	Nov 22, 2021	Benchmarking, Federated Learning	Code Available	1	5
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	Code Available	1	5
Beyond Normal: On the Evaluation of Mutual Information Estimators	Jun 19, 2023	Benchmarking, Domain Generalization	Code Available	1	5
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs	Jun 22, 2023	Arithmetic Reasoning, Benchmarking	Code Available	1	5
KoLA: Carefully Benchmarking World Knowledge of Large Language Models	Jun 15, 2023	Benchmarking, Hallucination	Code Available	1	5
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite	Sep 28, 2023	Benchmarking	Code Available	1	5
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement	Apr 22, 2025	Benchmarking, Language Modeling	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified