SOTAVerified

Benchmarking

Papers

Showing 1431-1440 of 5548 papers

Title | Status | Hype
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Code | 1
A Critical Assessment of State-of-the-Art in Entity Alignment | Code | 1
BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Code | 1
EntQA: Entity Linking as Question Answering | Code | 1
ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks | Code | 1
Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability | Code | 1
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Code | 1
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1
BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale | Code | 1
AQuA: A Benchmarking Tool for Label Quality Assessment | Code | 1
Page 144 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified