SOTAVerified

Benchmarking

Papers

Showing 2361–2370 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| GPTs and Language Barrier: A Cross-Lingual Legal QA Examination | | 0 |
| ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Code | 1 |
| Benchmarking Video Frame Interpolation | | 0 |
| DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts | | 0 |
| NSINA: A News Corpus for Sinhala | Code | 0 |
| CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1 |
| Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark | Code | 1 |
| TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Code | 0 |
| On the Fragility of Active Learners for Text Classification | Code | 0 |
| Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation | | 0 |
Page 237 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |