SOTAVerified

Benchmarking

Papers

Showing 4226-4250 of 5548 papers

Title | Status | Hype
Revisiting Self-Training for Few-Shot Learning of Language Model | Code | 1
Benchmarking Safety Monitors for Image Classifiers with Machine Learning | Code | 0
A New Approach for Image Authentication Framework for Media Forensics Purpose | | 0
Machine Learning with Knowledge Constraints for Process Optimization of Open-Air Perovskite Solar Cell Manufacturing | Code | 1
Phonetic Word Embeddings | Code | 1
A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking | | 0
NAS-Bench-Zero: A Large Scale Dataset for Understanding Zero-Shot Neural Architecture Search | | 0
Benchmarking person re-identification approaches and training datasets for practical real-world implementations | | 0
Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment | | 0
Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1
Less is more: Selecting the right benchmarking set of data for time series classification | | 0
Imitation Learning from Pixel Observations for Continuous Control | | 0
Learning to Schedule Learning rate with Graph Neural Networks | | 0
Best Practices in Pool-based Active Learning for Image Classification | | 0
Stabilized Self-training with Negative Sampling on Few-labeled Graph Data | | 0
Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | | 0
Modelling neuronal behaviour with time series regression: Recurrent Neural Networks on synthetic C. elegans data | | 0
Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization | | 0
Benchmarking Sample Selection Strategies for Batch Reinforcement Learning | | 0
FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation | | 0
A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data | | 0
Decentralized Learning for Overparameterized Problems: A Multi-Agent Kernel Approximation Approach | | 0
Benchmarking Machine Learning Robustness in Covid-19 Spike Sequence Classification | | 0
MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation | Code | 1
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified