SOTAVerified

Benchmarking

Papers

Showing 3351–3360 of 5548 papers

Title | Status | Hype
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | - | 0
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | - | 0
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Code | 0
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | - | 0
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation | - | 0
Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection | - | 0
OpenAGI: When LLM Meets Domain Experts | Code | 4
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems | Code | 1
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Code | 0
On Evaluation of Bangla Word Analogies | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified