SOTAVerified

Benchmarking

Papers

Showing 3141–3150 of 5548 papers

Title | Status | Hype
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | | 0
DispaRisk: Auditing Fairness Through Usable Information | Code | 0
EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | | 0
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | | 0
BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | Code | 0
A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text | | 0
SpeechVerse: A Large-scale Generalizable Audio Language Model | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified