SOTAVerified

Benchmarking

Papers

Showing 4651–4700 of 5548 papers

Title | Status | Hype
Mamba-Based Ensemble learning for White Blood Cell Classification | Code | 0
Better Late Than Never: Formulating and Benchmarking Recommendation Editing | Code | 0
Better force fields start with better data -- A data set of cation dipeptide interactions | Code | 0
MANTRA: The Manifold Triangulations Assemblage | Code | 0
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision | Code | 0
debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias | Code | 0
VizSeq: A Visual Analysis Toolkit for Text Generation Tasks | Code | 0
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Code | 0
Hi Guys or Hi Folks? Benchmarking Gender-Neutral Machine Translation with the GeNTE Corpus | Code | 0
Margin-bounded Confidence Scores for Out-of-Distribution Detection | Code | 0
Benchmarks for Graph Embedding Evaluation | Code | 0
High-Quality, ROS Compatible Video Encoding and Decoding for High-Definition Datasets | Code | 0
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset | Code | 0
MARTA: a model for the automatic phonemic grouping of the parkinsonian speech | Code | 0
High-Dynamic-Range Imaging for Cloud Segmentation | Code | 0
Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts | Code | 0
The Freiburg Groceries Dataset | Code | 0
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Code | 0
Z_2 × Z_2 Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks | Code | 0
Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets | Code | 0
Hi-EF: Benchmarking Emotion Forecasting in Human-interaction | Code | 0
Heterogeneous Datasets for Federated Survival Analysis Simulation | Code | 0
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Code | 0
Robust 2D/3D Vehicle Parsing in Arbitrary Camera Views for CVIS | Code | 0
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Code | 0
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction | Code | 0
HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0
Benchmarking YOLOv5 and YOLOv7 models with DeepSORT for droplet tracking applications | Code | 0
Benchmarking White Blood Cell Classification Under Domain Shift | Code | 0
MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark | Code | 0
Robust Benchmarking for Machine Learning of Clinical Entity Extraction | Code | 0
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks | Code | 0
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Code | 0
A Wild Bootstrap for Degenerate Kernel Tests | Code | 0
Harnessing Orthogonality to Train Low-Rank Neural Networks | Code | 0
Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts | Code | 0
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias | Code | 0
Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series | Code | 0
Harmonization Benchmarking Tool for Neuroimaging Datasets | Code | 0
Adaptive Shrinkage Estimation For Personalized Deep Kernel Regression In Modeling Brain Trajectories | Code | 0
Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN | Code | 0
The iToBoS dataset: skin region images extracted from 3D total body photographs for lesion detection | Code | 0
Benchmarking Ultra-High-Definition Image Reflection Removal | Code | 0
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Code | 0
VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models | Code | 0
Measuring what Really Matters: Optimizing Neural Networks for TinyML | Code | 0
Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers | Code | 0
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning | Code | 0
RoLargeSum: A Large Dialect-Aware Romanian News Dataset for Summary, Headline, and Keyword Generation | Code | 0
Hardware Aware Neural Network Architectures using FbNet | Code | 0
Page 94 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified