SOTAVerified

Benchmarking

Papers

Showing 1826–1850 of 5548 papers

Title | Status | Hype
LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
BONES: a Benchmark fOr Neural Estimation of Shapley values | Code | 0
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation | Code | 0
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box | Code | 0
Using Color To Identify Insider Threats | Code | 0
Conditional diffusions for amortized neural posterior estimation | Code | 0
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts | Code | 0
Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads | Code | 0
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation | Code | 0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Code | 0
Improve Machine Learning carbon footprint using Nvidia GPU and Mixed Precision training for classification models -- Part I | Code | 0
Benchmark data and method for real-time people counting in cluttered scenes using depth sensors | Code | 0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges | Code | 0
BLESS: Benchmarking Large Language Models on Sentence Simplification | Code | 0
Page 74 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified