SOTAVerified

Benchmarking

Papers

Showing 1251–1275 of 5548 papers

Title | Status | Hype
Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems | Code | 1
SERAB: A multi-lingual benchmark for speech emotion recognition | Code | 1
EntQA: Entity Linking as Question Answering | Code | 1
Revisiting Self-Training for Few-Shot Learning of Language Model | Code | 1
Machine Learning with Knowledge Constraints for Process Optimization of Open-Air Perovskite Solar Cell Manufacturing | Code | 1
Phonetic Word Embeddings | Code | 1
MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation | Code | 1
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations | Code | 1
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding | Code | 1
PASS: An ImageNet replacement for self-supervised pretraining without humans | Code | 1
Disentangled Feature Representation for Few-shot Image Classification | Code | 1
Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1
SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking | Code | 1
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs | Code | 1
AI Accelerator Survey and Trends | Code | 1
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset | Code | 1
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication | Code | 1
Benchmarking the Spectrum of Agent Capabilities | Code | 1
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques | Code | 1
Scikit-dimension: a Python package for intrinsic dimension estimation | Code | 1
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica | Code | 1
Biomedical Data-to-Text Generation via Fine-Tuning Transformers | Code | 1
ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues | Code | 1
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified