SOTAVerified

Benchmarking

Papers

Showing 881-890 of 5548 papers

Title | Status | Hype
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents | Code | 1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
Benchmarking Meaning Representations in Neural Semantic Parsing | Code | 1
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study | Code | 1
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Code | 1
Benchmarking Meta-embeddings: What Works and What Does Not | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified