SOTAVerified

Benchmarking

Papers

Showing 401-410 of 5548 papers

Title | Status | Hype
Protein Structure Tokenization: Benchmarking and New Recipe | Code | 1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1
AQuA: A Benchmarking Tool for Label Quality Assessment | Code | 1
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Code | 1
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins | Code | 1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Code | 1
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Code | 1
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning | Code | 1
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified