SOTAVerified

Benchmarking

Papers

Showing 401-410 of 5548 papers

Title | Status | Hype
Protein Structure Tokenization: Benchmarking and New Recipe | Code | 1
Benchmarking Cognitive Biases in Large Language Models as Evaluators | Code | 1
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Code | 1
Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation | Code | 1
Benchmarking Counterfactual Image Generation | Code | 1
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Code | 1
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study | Code | 1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1
Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified