SOTAVerified

Benchmarking

Papers

Showing 5176–5200 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| CleanPatrick: A Benchmark for Image Data Cleaning | Code | 0 |
| Detecting critical treatment effect bias in small subgroups | Code | 0 |
| AI-generated Image Quality Assessment in Visual Communication | Code | 0 |
| SOSD: A Benchmark for Learned Indexes | Code | 0 |
| OpenML Benchmarking Suites | Code | 0 |
| DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design | Code | 0 |
| Design and implementation of intelligent packet filtering in IoT microcontroller-based devices | Code | 0 |
| OpenOOD: Benchmarking Generalized Out-of-Distribution Detection | Code | 0 |
| Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks | Code | 0 |
| Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Code | 0 |
| Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark | Code | 0 |
| Towards Efficient and Scalable Training of Differentially Private Deep Learning | Code | 0 |
| Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters | Code | 0 |
| Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Code | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Code | 0 |
| Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware | Code | 0 |
| Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Code | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0 |
| Deep Reinforcement Learning for General Video Game AI | Code | 0 |
| DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Code | 0 |
| Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications | Code | 0 |
| DeepOBS: A Deep Learning Optimizer Benchmark Suite | Code | 0 |
| VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Code | 0 |
| OptIForest: Optimal Isolation Forest for Anomaly Detection | Code | 0 |
| Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |