SOTAVerified

Benchmarking

Papers

Showing 4826–4850 of 5548 papers

Title | Status | Hype
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models | Code | 0
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models | Code | 0
FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning | Code | 0
ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines | Code | 0
Wildfire spread forecasting with Deep Learning | Code | 0
Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs | Code | 0
FIVR: Fine-grained Incident Video Retrieval | Code | 0
SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records | Code | 0
Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification | Code | 0
Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction | Code | 0
Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping | Code | 0
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Code | 0
Schroedinger's Threshold: When the AUC doesn't predict Accuracy | Code | 0
Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference | Code | 0
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation | Code | 0
Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases | Code | 0
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health | Code | 0
There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction | Code | 0
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading | Code | 0
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation | Code | 0
Benchmarking Safety Monitors for Image Classifiers with Machine Learning | Code | 0
First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network | Code | 0
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models | Code | 0
MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes | Code | 0
A Linear Constrained Optimization Benchmark For Probabilistic Search Algorithms: The Rotated Klee-Minty Problem | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified