SOTAVerified

Benchmarking

Papers

Showing 91-100 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3 |
| Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3 |
| Benchmarking LLMs via Uncertainty Quantification | Code | 3 |
| Benchmarking Multimodal AutoML for Tabular Data with Text Fields | Code | 3 |
| AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | Code | 3 |
| Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Code | 3 |
| A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs | Code | 3 |
| mlpack 3: a fast, flexible machine learning library | Code | 3 |
| DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Code | 3 |
| A Survey on Performance Metrics for Object-Detection Algorithms | Code | 3 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |