SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2211–2220 of 5548 papers

Title	Date	Tasks	Status	Hype
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks	Feb 20, 2025	BenchmarkingCombinatorial Optimization	—Unverified	0
Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models	Feb 20, 2025	Benchmarking	—Unverified	0
Statistical Scenario Modelling and Lookalike Distributions for Multi-Variate AI Risk	Feb 20, 2025	Benchmarking	—Unverified	0
Sentence Smith: Formally Controllable Text Transformation and its Application to Evaluation of Text Embedding Models	Feb 20, 2025	BenchmarkingSentence	—Unverified	0
Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse	Feb 20, 2025	BenchmarkingGraph Attention	—Unverified	0
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems	Feb 20, 2025	BenchmarkingDecision Making	—Unverified	0
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework	Feb 20, 2025	BenchmarkingQuestion Answering	CodeCode Available	0
PredictaBoard: Benchmarking LLM Score Predictability	Feb 20, 2025	BenchmarkingCommon Sense Reasoning	CodeCode Available	0
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare	Feb 19, 2025	BenchmarkingDiversity	—Unverified	0
Position: There are no Champions in Long-Term Time Series Forecasting	Feb 19, 2025	BenchmarkingPosition	—Unverified	0

Show:10 25 50

← PrevPage 222 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified