SOTAVerified

Benchmarking

Papers

Showing 5011–5020 of 5548 papers

Title	Date	Tasks	Status	Hype
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates	Oct 28, 2024	Benchmarking	Code Available	0
A comparison of translation performance between DeepL and Supertext	Feb 4, 2025	Benchmarking, Machine Translation	Code Available	0
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework	Feb 20, 2025	Benchmarking, Question Answering	Code Available	0
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program	Apr 9, 2025	Benchmarking	Code Available	0
Benchmarking Machine Translation with Cultural Awareness	May 23, 2023	Benchmarking, In-Context Learning	Code Available	0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language	Aug 30, 2023	Benchmarking, Classification	Code Available	0
Unsupervised Tracklet Person Re-Identification	Mar 1, 2019	Benchmarking, Domain Adaptation	Code Available	0
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning	Nov 15, 2019	Benchmarking, Diversity	Code Available	0
TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization	Jul 30, 2023	Benchmarking, Multi-target regression	Code Available	0
Nmbr9 as a Constraint Programming Challenge	Jan 13, 2020	Benchmarking, Board Games	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified