SOTAVerified

Benchmarking

Papers

Showing 2571–2580 of 5548 papers

Title | Status | Hype
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception | Code | 1
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval | Code | 1
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding | | 0
Benchmarking the Fairness of Image Upsampling Methods | Code | 0
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Code | 3
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method | Code | 0
Benchmarking LLMs via Uncertainty Quantification | Code | 3
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition | Code | 0
Deep Neural Network Benchmarks for Selective Classification | Code | 0
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified