SOTAVerified

Benchmarking

Papers

Showing 131–140 of 5548 papers

Title | Status | Hype
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery | Code | 3
Advancing LLM Reasoning Generalists with Preference Trees | Code | 3
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Code | 3
AER: Auto-Encoder with Regression for Time Series Anomaly Detection | Code | 3
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Code | 3
HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis | Code | 3
XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | Code | 3
Page 14 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified