SOTAVerified

Benchmarking

Papers

Showing 1011-1020 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | | 0 |
| SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | Code | 2 |
| HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns | Code | 1 |
| Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection | | 0 |
| Molecular-driven Foundation Model for Oncologic Pathology | Code | 4 |
| Making Sense of Data in the Wild: Data Analysis Automation at Scale | | 0 |
| A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems | | 0 |
| PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | | 0 |
| IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | | 0 |
| Benchmarking Quantum Reinforcement Learning | Code | 0 |
Page 102 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |