SOTAVerified

Benchmarking

Papers

Showing 2281–2290 of 5548 papers

Title	Date	Tasks	Status	Hype
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf	Feb 5, 2025	Benchmarking, Scheduling	Unverified	0
Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials	Feb 5, 2025	Benchmarking	Unverified	0
Optimal PMU Placement for Kalman Filtering of DAE Power System Models	Feb 5, 2025	Benchmarking, State Estimation	Unverified	0
Benchmarking Time Series Forecasting Models: From Statistical Techniques to Foundation Models in Real-World Applications	Feb 5, 2025	Benchmarking, Feature Engineering	Unverified	0
xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods	Feb 5, 2025	Benchmarking	Unverified	0
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics	Feb 5, 2025	Benchmarking, Link Prediction	Code Available	0
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets	Feb 4, 2025	All, Benchmarking	Code Available	0
Evalita-LLM: Benchmarking Large Language Models on Italian	Feb 4, 2025	Benchmarking, Multiple-choice	Unverified	0
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models	Feb 4, 2025	Benchmarking, Decision Making	Unverified	0
A comparison of translation performance between DeepL and Supertext	Feb 4, 2025	Benchmarking, Machine Translation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified