SOTAVerified

Benchmarking

Papers

Showing 3471–3480 of 5548 papers

| Title | Status | Hype |
| Benchmarking LLM Guardrails in Handling Multilingual Toxicity | | 0 |
| Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 | | 0 |
| Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins | | 0 |
| Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios | | 0 |
| Making Sense of Data in the Wild: Data Analysis Automation at Scale | | 0 |
| OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User | | 0 |
| A Deep Q-Learning Method for Downlink Power Allocation in Multi-Cell Networks | | 0 |
| Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages | | 0 |
| Benchmarking LiDAR Sensors for Development and Evaluation of Automotive Perception | | 0 |
| Towards Benchmarking and Evaluating Deepfake Detection | | 0 |
Page 348 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |