SOTAVerified

Benchmarking

Papers

Showing 3231–3240 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multicalibration for Confidence Scoring in LLMs | | 0 |
| PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics | Code | 0 |
| SDFR: Synthetic Data for Face Recognition Competition | | 0 |
| Enhancing Video Summarization with Context Awareness | Code | 0 |
| GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System | | 0 |
| Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Code | 0 |
| Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios | | 0 |
| Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Code | 0 |
| Benchmarking ChatGPT on Algorithmic Reasoning | Code | 0 |
| Schroedinger's Threshold: When the AUC doesn't predict Accuracy | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |