SOTAVerified

Benchmarking

Papers

Showing 4501–4510 of 5548 papers

Title | Status | Hype
Beyond MD17: the reactive xxMD dataset | Code | 0
The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R | Code | 0
Learning to Transfer for Traffic Forecasting via Multi-task Learning | Code | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Code | 0
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Code | 0
RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification | Code | 0
Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Code | 0
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified