SOTAVerified

Benchmarking

Papers

Showing 27812790 of 5548 papers

TitleStatusHype
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language ModelsCode0
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness CalibrationCode0
Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact0
Benchmarking VLMs' Reasoning About Persuasive Atypical Images0
Benchmarking Large Language Model Uncertainty for Prompt OptimizationCode0
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data0
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study0
Text-To-Speech Synthesis In The Wild0
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering0
The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal0
Show:102550
← PrevPage 279 of 555Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1GPT-4 TurboACC0.56Unverified