SOTAVerified

Benchmarking

Papers

Showing 2861-2870 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain | | 0 |
| Advances in Preference-based Reinforcement Learning: A Review | | 0 |
| SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins | Code | 0 |
| RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands | | 0 |
| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | | 0 |
| UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library | | 0 |
| ISLES'24: Improving final infarct prediction in ischemic stroke using multimodal imaging and clinical data | | 0 |
| Benchmarking Large Language Models for Math Reasoning Tasks | Code | 0 |
| Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving | | 0 |
| Benchmarking quantum machine learning kernel training for classification tasks | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |