SOTAVerified

Benchmarking

Papers

Showing 2651–2675 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| TAO-Amodal: A Benchmark for Tracking Any Object Amodally | Code | 1 |
| Bio-Image Informatics Index BIII: A unique database of image analysis tools and workflows for and by the bioimaging community | | 0 |
| QDA^2: A principled approach to automatically annotating charge stability diagrams | | 0 |
| MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts | | 0 |
| Code Ownership in Open-Source AI Software Security | Code | 0 |
| FER-C: Benchmarking Out-of-Distribution Soft Calibration for Facial Expression Recognition | | 0 |
| How to Train Neural Field Representations: A Comprehensive Study and Benchmark | Code | 1 |
| Enabling Accelerators for Graph Computing | | 0 |
| A Novel Hybrid Ordinal Learning Model with Health Care Application | | 0 |
| ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | | 0 |
| Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Code | 1 |
| SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | | 0 |
| Efficiently Quantifying Individual Agent Importance in Cooperative MARL | | 0 |
| EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset | | 0 |
| Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | | 0 |
| Benchmarking Deep Learning Classifiers for SAR Automatic Target Recognition | | 0 |
| Meta-survey on outlier and anomaly detection | Code | 0 |
| How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1 |
| Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | | 0 |
| EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning | Code | 1 |
| Implementing hosting capacity analysis in distribution networks: Practical considerations, advancements and future directions | | 0 |
| Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | | 0 |
| EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models | Code | 2 |
| Benchmarking Distribution Shift in Tabular Data with TableShift | Code | 1 |
| AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Code | 3 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |