Benchmarking

Papers

Showing 2591–2600 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	Articles, Benchmarking	Code Available	0	5
Aesthetic Image Captioning From Weakly-Labelled Photographs	Aug 29, 2019	Aesthetic Image Captioning, Benchmarking	Code Available	0	5
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty	Nov 5, 2020	Adversarial Attack, Benchmarking	Code Available	0	5
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation	Jun 13, 2024	Benchmarking, Hallucination	Code Available	0	5
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach	Oct 9, 2017	Benchmarking, Clustering	Code Available	0	5
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters	Sep 8, 2022	Benchmarking, continuous-control	Code Available	0	5
Fluorescence Reference Target Quantitative Analysis Library	Apr 22, 2025	Benchmarking	Code Available	0	5
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0	Aug 23, 2023	Benchmarking, regression	Code Available	0	5
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Mar 6, 2024	Benchmarking, Hallucination	Code Available	0	5
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Jan 14, 2025	Benchmarking, Graph Representation Learning	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified