SOTAVerified

Benchmarking

Papers

Showing 2451–2460 of 5548 papers

Title | Status | Hype
The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition |  | 0
Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks | Code | 2
Efficient Lifelong Model Evaluation in an Era of Rapid Progress | Code | 1
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking | Code | 0
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models | Code | 0
The Seeker's Dilemma: Realistic Formulation and Benchmarking for Hardware Trojan Detection |  | 0
Beacon, a lightweight deep reinforcement learning benchmark library for flow control | Code | 1
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies |  | 0
Benchmarking Data Science Agents | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified