SOTAVerified

Benchmarking

Papers

Showing 4591-4600 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0 |
| Multitask learning and benchmarking with clinical time series data (Analysis, open access, published 17 June 2019) | Code | 0 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0 |
| IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification | Code | 0 |
| BioFors: A Large Biomedical Image Forensics Dataset | Code | 0 |
| Benchmarking Attribution Methods with Relative Feature Importance | Code | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Code | 0 |
| Hyperspectral Image Dataset for Benchmarking on Salient Object Detection | Code | 0 |
| Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning | Code | 0 |
| Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |