
Benchmarking

Papers

Showing 1851–1900 of 5548 papers

Title | Status | Hype
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Code | 0
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Code | 0
Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Code | 0
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Code | 0
LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0
IPC: A Benchmark Data Set for Learning with Graph-Structured Data | Code | 0
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Code | 0
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0
BONES: a Benchmark fOr Neural Estimation of Shapley values | Code | 0
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation | Code | 0
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
Benchmark data and method for real-time people counting in cluttered scenes using depth sensors | Code | 0
Continuous Optimization Benchmarks by Simulation | Code | 0
Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0
BLESS: Benchmarking Large Language Models on Sentence Simplification | Code | 0
A Benchmarking Dataset with 2440 Organic Molecules for Volume Distribution at Steady State | Code | 0
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Individual Fairness Guarantees for Neural Networks | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis | Code | 0
BioSentVec: creating sentence embeddings for biomedical texts | Code | 0
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II | Code | 0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks | Code | 0
BioFors: A Large Biomedical Image Forensics Dataset | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Benchmarking Attribution Methods with Relative Feature Importance | Code | 0
Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Code | 0
Beemo: Benchmark of Expert-edited Machine-generated Outputs | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
AdamZ: An Enhanced Optimisation Method for Neural Network Training | Code | 0
Bias Analysis and Mitigation in the Evaluation of Authorship Verification | Code | 0
Page 38 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified