SOTAVerified

Benchmarking

Papers

Showing 101-110 of 5548 papers

Title | Status | Hype
OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Code | 3
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Code | 3
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs | Code | 3
Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis | Code | 3
General Geospatial Inference with a Population Dynamics Foundation Model | Code | 3
A Survey on Performance Metrics for Object-Detection Algorithms | Code | 3
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Code | 3
AER: Auto-Encoder with Regression for Time Series Anomaly Detection | Code | 3
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified