SOTAVerified

Benchmarking

Papers

Showing 3271-3280 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Code | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | | 0 |
| Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking | | 0 |
| FlowMind: Automatic Workflow Generation with LLMs | | 0 |
| Depression Detection on Social Media with Large Language Models | | 0 |
| Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks | | 0 |
| Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Code | 0 |
| SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | Code | 0 |
| Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors | Code | 0 |
| Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows | Code | 0 |
Page 328 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |