SOTAVerified

Benchmarking

Papers

Showing 1101–1150 of 5548 papers

Title | Status | Hype
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Code | 1
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots | Code | 1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Code | 1
nnOOD: A Framework for Benchmarking Self-supervised Anomaly Localisation Methods | Code | 1
Structural Bias for Aspect Sentiment Triplet Extraction | Code | 1
Benchmarking Compositionality with Formal Languages | Code | 1
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models | Code | 1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1
Accelerated and interpretable oblique random survival forests | Code | 1
Tracking Every Thing in the Wild | Code | 1
ArtFID: Quantitative Evaluation of Neural Style Transfer | Code | 1
Physiology-based simulation of the retinal vasculature enables annotation-free segmentation of OCT angiographs | Code | 1
ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Code | 1
Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Code | 1
Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Code | 1
Benchmarking Omni-Vision Representation through the Lens of Visual Realms | Code | 1
TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs | Code | 1
Graph Generative Model for Benchmarking Graph Neural Networks | Code | 1
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Less Is More: A Comparison of Active Learning Strategies for 3D Medical Image Segmentation | Code | 1
DFGC 2022: The Second DeepFake Game Competition | Code | 1
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Code | 1
Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1
Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames | Code | 1
The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data | Code | 1
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Code | 1
OpenXAI: Towards a Transparent Evaluation of Model Explanations | Code | 1
Benchmarking Constraint Inference in Inverse Reinforcement Learning | Code | 1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Code | 1
NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search | Code | 1
SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments | Code | 1
Long Range Graph Benchmark | Code | 1
Taxonomy of Benchmarks in Graph Representation Learning | Code | 1
Evaluating histopathology transfer learning with ChampKit | Code | 1
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Code | 1
Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1
SwinCheX: Multi-label classification on chest X-ray images with transformers | Code | 1
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark | Code | 1
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering | Code | 1
Revisiting the "Video" in Video-Language Understanding | Code | 1
Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale | Code | 1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Code | 1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | Code | 1
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Code | 1
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed | Code | 1
MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | Code | 1
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles | Code | 1
Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis | Code | 1
PyRelationAL: a python library for active learning research and development | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified