SOTAVerified

Benchmarking

Papers

Showing 3951–4000 of 5548 papers

Title | Status | Hype
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks | Code | 1
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery | | 0
A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality | | 0
Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers | Code | 0
Coarse-to-Fine Q-attention with Learned Path Ranking | Code | 1
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events -- Part I: Overview and Results | Code | 0
Intelligence at the Extreme Edge: A Survey on Reformable TinyML | | 0
Multi-Class Road User Detection With 3+1D Radar in the View-of-Delft Dataset | Code | 2
Unitail: Detecting, Reading, and Matching in Retail Scene | | 0
Assessing the risk of re-identification arising from an attack on anonymised data | | 0
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? | | 0
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo | Code | 0
Earnings-22: A Practical Benchmark for Accents in the Wild | Code | 1
Parameter-efficient Model Adaptation for Vision Transformers | Code | 1
Treatment Learning Causal Transformer for Noisy Image Classification | | 0
A Unified Study of Machine Learning Explanation Evaluation Metrics | | 0
Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices | | 0
Benchmarking Algorithms for Automatic License Plate Recognition | | 0
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Code | 1
Visual Abductive Reasoning | Code | 1
LAMBDA: Covering the Solution Set of Black-Box Inequality by Search Space Quantization | | 0
Benchmarking Visual Localization for Autonomous Navigation | Code | 1
minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models | Code | 1
An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms | Code | 0
Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition | | 0
A Perspective on Neural Capacity Estimation: Viability and Reliability | | 0
Sionna: An Open-Source Library for Next-Generation Physical Layer Research | Code | 1
Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices | | 0
Policy Gradients using Variational Quantum Circuits | | 0
A Statistical Framework to Investigate the Optimality of Signal-Reconstruction Methods | | 0
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis | Code | 0
SHEL5K: An Extended Dataset and Benchmarking for Safety Helmet Detection | Code | 1
On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers | Code | 0
Fiber Bundle Morphisms as a Framework for Modeling Many-to-Many Maps | | 0
ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data | Code | 2
From 2D to 3D: Re-thinking Benchmarking of Monocular Depth Prediction | | 0
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs | Code | 3
ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles | Code | 0
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection | | 0
ROOD-MRI: Benchmarking the robustness of deep learning segmentation models to out-of-distribution and corrupted data in MRI | Code | 1
IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages | | 0
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach | | 0
Metastatic Cancer Outcome Prediction with Injective Multiple Instance Pooling | | 0
Mapping global dynamics of benchmark creation and saturation in artificial intelligence | | 0
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets | Code | 4
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap | Code | 1
ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches | Code | 1
Score-Based Generative Models for Molecule Generation | | 0
SurvSet: An open-source time-to-event dataset repository | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified