SOTAVerified

Benchmarking

Papers

Showing 1126-1150 of 5548 papers

Title | Status | Hype
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Code | 1
OpenXAI: Towards a Transparent Evaluation of Model Explanations | Code | 1
Benchmarking Constraint Inference in Inverse Reinforcement Learning | Code | 1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Code | 1
NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search | Code | 1
SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments | Code | 1
Long Range Graph Benchmark | Code | 1
Taxonomy of Benchmarks in Graph Representation Learning | Code | 1
Evaluating histopathology transfer learning with ChampKit | Code | 1
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Code | 1
Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1
SwinCheX: Multi-label classification on chest X-ray images with transformers | Code | 1
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark | Code | 1
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering | Code | 1
Revisiting the "Video" in Video-Language Understanding | Code | 1
Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale | Code | 1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Code | 1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection | Code | 1
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions | Code | 1
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed | Code | 1
MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task | Code | 1
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles | Code | 1
Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis | Code | 1
PyRelationAL: a python library for active learning research and development | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified