SOTAVerified

Benchmarking

Papers

Showing 3531–3540 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| AnyTOD: A Programmable Task-Oriented Dialog System | | 0 |
| Benchmarking Spatial Relationships in Text-to-Image Generation | Code | 1 |
| Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers | | 0 |
| GiCCS: A German in-Context Conversational Similarity Benchmark | | 0 |
| Biomedical image analysis competitions: The state of current participation practice | | 0 |
| Automatic vehicle trajectory data reconstruction at scale | | 0 |
| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1 |
| Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | | 0 |
| PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |