SOTAVerified

Benchmarking

Papers

Showing 3851–3900 of 5548 papers

Title | Status | Hype
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking | - | 0
Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale | Code | 1
NEWTS: A Corpus for News Topic-Focused Summarization | - | 0
Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems | - | 0
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite | Code | 0
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | Code | 0
Benchmarking Unsupervised Anomaly Detection and Localization | - | 0
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | Code | 1
A Framework for Generating Informative Benchmark Instances | Code | 0
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Code | 1
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | Code | 1
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed | Code | 1
Fast Vision Transformers with HiLo Attention | Code | 2
Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder | - | 0
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles | Code | 1
Large Language Models are Few-Shot Clinical Information Extractors | - | 0
Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis | Code | 1
Advanced Manufacturing Configuration by Sample-efficient Batch Bayesian Optimization | - | 0
RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis | - | 0
Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets | Code | 0
Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking | - | 0
PyRelationAL: a python library for active learning research and development | Code | 1
Graph-theoretical approach to robust 3D normal extraction of LiDAR data | Code | 0
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization | - | 0
Deep Learning-Based Synchronization for Uplink NB-IoT | Code | 1
Self-Supervised Speech Representation Learning: A Review | - | 0
Deep Learning vs. Gradient Boosting: Benchmarking state-of-the-art machine learning algorithms for credit scoring | - | 0
Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine Learning Algorithms | Code | 1
BARS: Towards Open Benchmarking for Recommender Systems | Code | 2
SNaC: Coherence Error Detection for Narrative Summarization | Code | 0
Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies | - | 0
Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data | - | 0
Uncertainty estimation for Cross-dataset performance in Trajectory prediction | - | 0
The VoicePrivacy 2020 Challenge Evaluation Plan | Code | 1
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | - | 0
Federated Learning Under Intermittent Client Availability and Time-Varying Communication Constraints | Code | 1
Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages | - | 0
Subspace Learning Machine (SLM): Methodology and Performance | - | 0
Individual Fairness Guarantees for Neural Networks | Code | 0
Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1
Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation | Code | 0
LayoutXLM vs. GNN: An Empirical Evaluation of Relation Extraction for Documents | - | 0
Assigning Species Information to Corresponding Genes by a Sequence Labeling Framework | Code | 0
BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation | Code | 1
GenISP: Neural ISP for Low-Light Machine Cognition | Code | 1
VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution | - | 0
Benchmarking Econometric and Machine Learning Methodologies in Nowcasting | Code | 1
Design Target Achievement Index: A Differentiable Metric to Enhance Deep Generative Models in Multi-Objective Inverse Design | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified