SOTAVerified

Benchmarking

Papers

Showing 4981–4990 of 5548 papers

Title | Status | Hype
Evaluating AI Recruitment Sourcing Tools by Human Preference | Code | 0
EvalAI: Towards Better Evaluation Systems for AI Agents | Code | 0
Essential guidelines for computational method benchmarking | Code | 0
Benchmarking of LSTM Networks | Code | 0
NerveNet: Learning Structured Policy with Graph Neural Networks | Code | 0
How Fragile is Relation Extraction under Entity Replacements? | Code | 0
Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress? | Code | 0
Sequence-Aware Recommender Systems | Code | 0
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation | Code | 0
Enterprise Benchmarks for Large Language Model Evaluation | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified