SOTAVerified

Benchmarking

Papers

Showing 2761–2770 of 5548 papers

Title | Status | Hype
CONGRA: Benchmarking Automatic Conflict Resolution | Code | 0
An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | - | 0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | - | 0
Present and Future Generalization of Synthetic Image Detectors | Code | 0
Efficient and Effective Model Extraction | Code | 0
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | - | 0
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time | - | 0
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions | Code | 0
Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks | - | 0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified