SOTAVerified

Benchmarking

Papers

Showing 2651–2660 of 5548 papers

Title | Status | Hype
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | Code | 0
Trust but Verify: Programmatic VLM Evaluation in the Wild | | 0
Benchmarking Defeasible Reasoning with Large Language Models -- Initial Experiments and Future Directions | | 0
Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation | | 0
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Code | 0
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs | | 0
AERO: Softmax-Only LLMs for Efficient Private Inference | | 0
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Code | 0
Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos | | 0
FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | | 0
Page 266 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified