SOTAVerified

Benchmarking

Papers

Showing 3571-3580 of 5548 papers

Title | Status | Hype
On the Performance of Multimodal Language Models | | 0
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks | | 0
On the project risk baseline: integrating aleatory uncertainty into project scheduling | | 0
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild | | 0
On the reduction of Linear Parameter-Varying State-Space models | | 0
On the relationship between Benchmarking, Standards and Certification in Robotics and AI | | 0
On the Reliability and Validity of Detecting Approval of Political Actors in Tweets | | 0
On the Robustness of Human-Object Interaction Detection against Distribution Shift | | 0
On the role of benchmarking data sets and simulations in method comparison studies | | 0
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified