SOTAVerified

Benchmarking

Papers

Showing 1026-1050 of 5548 papers

Title | Status | Hype
ScandEval: A Benchmark for Scandinavian Natural Language Processing | Code | 1
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1
What Makes for Effective Few-shot Point Cloud Classification? | Code | 1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models | Code | 1
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | Code | 1
MGTBench: Benchmarking Machine-Generated Text Detection | Code | 1
MEGA: Multilingual Evaluation of Generative AI | Code | 1
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering Regularized Self-Training | Code | 1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images | Code | 1
COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Code | 1
TransNetR: Transformer-based Residual Network for Polyp Segmentation with Multi-Center Out-of-Distribution Testing | Code | 1
What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers | Code | 1
Revisiting the Gumbel-Softmax in MADDPG | Code | 1
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
SurgT challenge: Benchmark of Soft-Tissue Trackers for Robotic Surgery | Code | 1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks | Code | 1
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler | Code | 1
Rethinking low-cost microscopy workflow: Image enhancement using deep based Extended Depth of Field methods | Code | 1
Benchmarking Large Language Models for News Summarization | Code | 1
Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1
TemporAI: Facilitating Machine Learning Innovation in Time Domain Tasks for Medicine | Code | 1
BiBench: Benchmarking and Analyzing Network Binarization | Code | 1
Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified