SOTAVerified

Benchmarking

Papers

Showing 1801-1850 of 5548 papers

Title | Status | Hype
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark | Code | 0
Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Code | 0
Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Code | 0
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Code | 0
Bugs in the Data: How ImageNet Misrepresents Biodiversity | Code | 0
CleanPatrick: A Benchmark for Image Data Cleaning | Code | 0
BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images | Code | 0
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Code | 0
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | Code | 0
BSBench: will your LLM find the largest prime number? | Code | 0
Adaptive Shrinkage Estimation For Personalized Deep Kernel Regression In Modeling Brain Trajectories | Code | 0
MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Code | 0
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Code | 0
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | Code | 0
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets | Code | 0
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Code | 0
Adaptive Power System Emergency Control using Deep Reinforcement Learning | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Code | 0
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | Code | 0
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Code | 0
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | Code | 0
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Code | 0
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Code | 0
LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
BONES: a Benchmark fOr Neural Estimation of Shapley values | Code | 0
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation | Code | 0
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Code | 0
Using Color To Identify Insider Threats | Code | 0
Conditional diffusions for amortized neural posterior estimation | Code | 0
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Code | 0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Code | 0
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation | Code | 0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
Benchmark data and method for real-time people counting in cluttered scenes using depth sensors | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges | Code | 0
BLESS: Benchmarking Large Language Models on Sentence Simplification | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified