SOTAVerified

Benchmarking

Papers

Showing 5011–5020 of 5548 papers

Title | Status | Hype
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Code | 0
A comparison of translation performance between DeepL and Supertext | Code | 0
Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework | Code | 0
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program | Code | 0
Benchmarking Machine Translation with Cultural Awareness | Code | 0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0
Unsupervised Tracklet Person Re-Identification | Code | 0
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning | Code | 0
TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization | Code | 0
Nmbr9 as a Constraint Programming Challenge | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified