SOTAVerified

Benchmarking

Papers

Showing 1441–1450 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| FuzzWiz -- Fuzzing Framework for Efficient Hardware Coverage | | 0 |
| Benchmarking Large Language Models for Image Classification of Marine Mammals | Code | 0 |
| VoiceBench: Benchmarking LLM-Based Voice Assistants | Code | 3 |
| Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies | | 0 |
| Benchmarking Multi-Scene Fire and Smoke Detection | Code | 1 |
| ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Code | 0 |
| Safe Load Balancing in Software-Defined-Networking | | 0 |
| Polyp-E: Benchmarking the Robustness of Deep Segmentation Models via Polyp Editing | | 0 |
| Building Conformal Prediction Intervals with Approximate Message Passing | Code | 0 |
| Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |