SOTAVerified

Benchmarking

Papers

Showing 1451–1500 of 5548 papers

Title | Status | Hype
Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI | Code | 1
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training | Code | 1
AirSim Drone Racing Lab | Code | 1
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs | Code | 1
Benchmarking TinyML Systems: Challenges and Direction | Code | 1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Code | 1
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | Code | 1
Image Matching across Wide Baselines: From Paper to Practice | Code | 1
End-to-end Emotion-Cause Pair Extraction via Learning to Link | Code | 1
Single-cell entropy to quantify the cellular transcription from single-cell RNA-seq data | Code | 1
NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
An Exploration of Embodied Visual Exploration | Code | 1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Code | 1
Automatic Detection of Generated Text is Easiest when Humans are Fooled | Code | 1
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison | Code | 1
Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch | Code | 1
Benchmarking Batch Deep Reinforcement Learning Algorithms | Code | 1
Benchmarking machine learning models on multi-centre eICU critical care dataset | Code | 1
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Code | 1
miniSAM: A Flexible Factor Graph Non-linear Least Squares Optimization Framework | Code | 1
Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Code | 1
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Code | 1
PyRobot: An Open-source Robotics Framework for Research and Benchmarking | Code | 1
MMDetection: Open MMLab Detection Toolbox and Benchmark | Code | 1
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets | Code | 1
MNIST-C: A Robustness Benchmark for Computer Vision | Code | 1
Meta-Surrogate Benchmarking for Hyperparameter Optimization | Code | 1
Benchmarking Regression Methods: A comparison with CGAN | Code | 1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite | Code | 1
Benchmarking Natural Language Understanding Services for building Conversational Agents | Code | 1
NAS-Bench-101: Towards Reproducible Neural Architecture Search | Code | 1
The StarCraft Multi-Agent Challenge | Code | 1
The Liver Tumor Segmentation Benchmark (LiTS) | Code | 1
LEAF: A Benchmark for Federated Settings | Code | 1
GuacaMol: Benchmarking Models for De Novo Molecular Design | Code | 1
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Code | 1
On Evaluation of Embodied Navigation Agents | Code | 1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1
Texygen: A Benchmarking Platform for Text Generation Models | Code | 1
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining | Code | 1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Code | 1
featsel: A framework for benchmarking of feature selection algorithms and cost functions | Code | 1
Multitask learning and benchmarking with clinical time series data | Code | 1
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
Building a Scalable and Interpretable Bayesian Deep Learning Framework for Quality Control of Free Form Surfaces | Code | 1
Visual Place Recognition for Large-Scale UAV Applications | | 0
Training Transformers with Enforced Lipschitz Constants | | 0
MUPAX: Multidimensional Problem Agnostic eXplainable AI | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified