SOTAVerified

Benchmarking

Papers

Showing 1411-1420 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training | | 0 |
| ODRL: A Benchmark for Off-Dynamics Reinforcement Learning | Code | 2 |
| NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Code | 0 |
| LLM-initialized Differentiable Causal Discovery | | 0 |
| CODES: Benchmarking Coupled ODE Surrogates | Code | 0 |
| CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Code | 0 |
| AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0 |
| Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | | 0 |
| BongLLaMA: LLaMA for Bangla Language | | 0 |
| SPICEPilot: Navigating SPICE Code Generation and Simulation with AI Guidance | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |