SOTAVerified

Benchmarking

Papers

Showing 1951-2000 of 5548 papers

Title | Status | Hype
Bayesian Neural Networks with Soft Evidence | Code | 0
A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions | Code | 0
Partial Rankings of Optimizers | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations? | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Individual Fairness Guarantees for Neural Networks | Code | 0
Beyond Document Page Classification: Design, Datasets, and Challenges | Code | 0
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning | Code | 0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Code | 0
Benchmarking Feature-based Algorithm Selection Systems for Black-box Numerical Optimization | Code | 0
Performance Evaluation of Real-Time Object Detection for Electric Scooters | Code | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | Code | 0
Neural Style Transfer Improves 3D Cardiovascular MR Image Segmentation on Inconsistent Data | Code | 0
Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework | Code | 0
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation | Code | 0
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Better Late Than Never: Formulating and Benchmarking Recommendation Editing | Code | 0
Better force fields start with better data -- A data set of cation dipeptide interactions | Code | 0
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts | Code | 0
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision | Code | 0
Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning | Code | 0
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0
Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study | Code | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
A Meta-Analysis of the Anomaly Detection Problem | Code | 0
Benchmarks for Graph Embedding Evaluation | Code | 0
BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset | Code | 0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Code | 0
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | Code | 0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Code | 0
Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets | Code | 0
AlphaZip: Neural Network-Enhanced Lossless Text Compression | Code | 0
Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Code | 0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0
Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified