SOTAVerified

Benchmarking

Papers

Showing 3351–3400 of 5548 papers

Title | Status | Hype
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite | | 0
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | | 0
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis | Code | 0
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | | 0
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation | | 0
Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection | | 0
OpenAGI: When LLM Meets Domain Experts | Code | 4
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems | Code | 1
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Code | 0
On Evaluation of Bangla Word Analogies | | 0
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | | 0
RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning | Code | 2
ForamViT-GAN: Exploring New Paradigms in Deep Learning for Micropaleontological Image Analysis | | 0
Benchmarking the Robustness of Quantized Models | | 0
SimbaML: Connecting Mechanistic Models and Machine Learning with Augmented Data | Code | 0
Probing Conceptual Understanding of Large Visual-Language Models | Code | 0
Interpretable statistical representations of neural population dynamics and geometry | Code | 1
Benchmarking Robustness to Text-Guided Corruptions | Code | 0
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images | | 0
MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding | Code | 1
LogoNet: a fine-grained network for instance-level logo sketch retrieval | Code | 0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Code | 0
The Saudi Privacy Policy Dataset | Code | 0
OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI | | 0
SLPerf: a Unified Framework for Benchmarking Split Learning | Code | 1
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection | Code | 1
ScandEval: A Benchmark for Scandinavian Natural Language Processing | Code | 1
Vision-Language Models for Vision Tasks: A Survey | Code | 4
A Latent Fingerprint in the Wild Database | | 0
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Code | 1
What Makes for Effective Few-shot Point Cloud Classification? | Code | 1
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
Benchmarking FedAvg and FedCurv for Image Classification Tasks | | 0
Why is the winner the best? | | 0
Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks | | 0
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Code | 1
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Code | 5
From Private to Public: Benchmarking GANs in the Context of Private Time Series Classification | | 0
Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design | | 0
Hyperparameter optimization, quantum-assisted model performance prediction, and benchmarking of AI-based High Energy Physics workloads using HPC | | 0
GeoNet: Benchmarking Unsupervised Adaptation across Geographies | | 0
Exploring Continual Learning of Diffusion Models | | 0
MGTBench: Benchmarking Machine-Generated Text Detection | Code | 1
Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning | Code | 0
Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG | | 0
Vulnerability of Face Morphing Attacks: A Case Study on Lookalike and Identical Twins | | 0
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance | | 0
MEGA: Multilingual Evaluation of Generative AI | Code | 1
Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified