SOTAVerified

Benchmarking

Papers

Showing 3191–3200 of 5548 papers

Title | Status | Hype
--- | --- | ---
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning | | 0
PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs | Code | 2
Re-Benchmarking Pool-Based Active Learning for Binary Classification | Code | 0
MLonMCU: TinyML Benchmarking with Fast Retargeting | Code | 1
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1
KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Code | 1
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support | Code | 0
BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection | Code | 0
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion | | 0
Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models | Code | 1
Page 320 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified