SOTAVerified

Benchmarking

Papers

Showing 4201–4250 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| NAS-HPO-Bench-II: A Benchmark Dataset on Joint Optimization of Convolutional Neural Network Architecture and Training Hyperparameters | Code | 1 |
| GAN-based disentanglement learning for chest X-ray rib suppression | | 0 |
| MTG: A Benchmarking Suite for Multilingual Text Generation | | 0 |
| Benchmarking Biomedical Nested NER and Relation Extraction Models | | 0 |
| Multitask Prompted Training Enables Zero-Shot Task Generalization | Code | 2 |
| HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media | Code | 1 |
| OG-SPACE: Optimized Stochastic Simulation of Spatial Models of Cancer Evolution | Code | 0 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1 |
| What can 5.17 billion regression fits tell us about artificial models of the human visual system? | | 0 |
| Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets | | 0 |
| Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform | Code | 1 |
| NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks | Code | 1 |
| S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations | Code | 1 |
| EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset | Code | 1 |
| Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking | Code | 0 |
| The CaLiGraph Ontology as a Challenge for OWL Reasoners | Code | 0 |
| SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records | Code | 0 |
| Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems | Code | 1 |
| Chaos as an interpretable benchmark for forecasting and data-driven modelling | Code | 1 |
| Evolving Evolutionary Algorithms with Patterns | Code | 0 |
| Hybrid Random Features | Code | 0 |
| Process Extraction from Text: Benchmarking the State of the Art and Paving the Way for Future Challenges | Code | 0 |
| Explicitly Multi-Modal Benchmarks for Multi-Objective Optimization | | 0 |
| SERAB: A multi-lingual benchmark for speech emotion recognition | Code | 1 |
| EntQA: Entity Linking as Question Answering | Code | 1 |
| Revisiting Self-Training for Few-Shot Learning of Language Model | Code | 1 |
| Benchmarking Safety Monitors for Image Classifiers with Machine Learning | Code | 0 |
| A New Approach for Image Authentication Framework for Media Forensics Purpose | | 0 |
| Machine Learning with Knowledge Constraints for Process Optimization of Open-Air Perovskite Solar Cell Manufacturing | Code | 1 |
| Phonetic Word Embeddings | Code | 1 |
| A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking | | 0 |
| NAS-Bench-Zero: A Large Scale Dataset for Understanding Zero-Shot Neural Architecture Search | | 0 |
| Benchmarking person re-identification approaches and training datasets for practical real-world implementations | | 0 |
| Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment | | 0 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1 |
| Less is more: Selecting the right benchmarking set of data for time series classification | | 0 |
| Imitation Learning from Pixel Observations for Continuous Control | | 0 |
| Learning to Schedule Learning rate with Graph Neural Networks | | 0 |
| Best Practices in Pool-based Active Learning for Image Classification | | 0 |
| Stabilized Self-training with Negative Sampling on Few-labeled Graph Data | | 0 |
| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | | 0 |
| Modelling neuronal behaviour with time series regression: Recurrent Neural Networks on synthetic C. elegans data | | 0 |
| Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization | | 0 |
| Benchmarking Sample Selection Strategies for Batch Reinforcement Learning | | 0 |
| FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation | | 0 |
| A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data | | 0 |
| Decentralized Learning for Overparameterized Problems: A Multi-Agent Kernel Approximation Approach | | 0 |
| Benchmarking Machine Learning Robustness in Covid-19 Spike Sequence Classification | | 0 |
| MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation | Code | 1 |
| "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |