SOTAVerified

Benchmarking

Papers

Showing 4176–4200 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies | | 0 |
| Virus-MNIST: Machine Learning Baseline Calculations for Image Classification | | 0 |
| Procedural Generalization by Planning with Self-Supervised World Models | | 0 |
| Don’t be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System | Code | 1 |
| Constructing a Psychometric Testbed for Fair Natural Language Processing | Code | 0 |
| Benchmarking Meta-embeddings: What Works and What Does Not | Code | 1 |
| Automatic Resolution of Domain Name Disputes | Code | 0 |
| Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains | Code | 0 |
| OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets | Code | 1 |
| AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling | Code | 1 |
| Livestock Monitoring with Transformer | | 0 |
| Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation | Code | 0 |
| Towards a Taxonomy of Graph Learning Datasets | | 0 |
| FTNet: Feature Transverse Network for Thermal Image Semantic Segmentation | Code | 1 |
| Quantum Boosting using Domain-Partitioning Hypotheses | Code | 0 |
| Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks | Code | 0 |
| Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0 |
| Scientific Machine Learning Benchmarks | | 0 |
| Benchmarking of Lightweight Deep Learning Architectures for Skin Cancer Classification using ISIC 2017 Dataset | | 0 |
| Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations | Code | 1 |
| MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems | | 0 |
| OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis | Code | 1 |
| Text-Based Person Search with Limited Data | Code | 1 |
| Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0 |
| An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C) | | 0 |
Page 168 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |