SOTAVerified

Benchmarking

Papers

Showing 3851–3860 of 5548 papers

Title | Status | Hype
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction | Code | 0
Benchmarking Machine Translation with Cultural Awareness | Code | 0
Multilingual Large Language Models Are Not (Yet) Code-Switchers | | 0
Robust Model-Based Optimization for Challenging Fitness Landscapes | Code | 0
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | | 0
How Fragile is Relation Extraction under Entity Replacements? | Code | 0
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | Code | 0
Value-at-Risk-Based Portfolio Insurance: Performance Evaluation and Benchmarking Against CPPI in a Markov-Modulated Regime-Switching Market | | 0
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | | 0
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified