SOTAVerified

Benchmarking

Papers

Showing 4801–4850 of 5548 papers

Title | Status | Hype
FR-MRInet: A Deep Convolutional Encoder-Decoder for Brain Tumor Segmentation with Relu-RGB and Sliding-window | Code | 0
AdamZ: An Enhanced Optimisation Method for Neural Network Training | Code | 0
MLPerf Training Benchmark | Code | 0
Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration | Code | 0
Benchmarking Spurious Bias in Few-Shot Image Classifiers | Code | 0
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | Code | 0
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0
MMCoQA: Conversational Question Answering over Text, Tables, and Images | Code | 0
Forecasting time series with constraints | Code | 0
Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study | Code | 0
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges | Code | 0
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Code | 0
Benchmarking Single Image Dehazing and Beyond | Code | 0
VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse | Code | 0
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support | Code | 0
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach | Code | 0
fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models | Code | 0
Scaling and Benchmarking Self-Supervised Visual Representation Learning | Code | 0
Scaling Compute Is Not All You Need for Adversarial Robustness | Code | 0
Scaling Up Resonate-and-Fire Networks for Fast Deep Learning | Code | 0
Universal Music Representations? Evaluating Foundation Models on World Music Corpora | Code | 0
MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms | Code | 0
Fluorescence Reference Target Quantitative Analysis Library | Code | 0
FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning | Code | 0
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking | Code | 0
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models | Code | 0
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models | Code | 0
FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning | Code | 0
ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines | Code | 0
Wildfire spread forecasting with Deep Learning | Code | 0
Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs | Code | 0
FIVR: Fine-grained Incident Video Retrieval | Code | 0
SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records | Code | 0
Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification | Code | 0
Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction | Code | 0
Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping | Code | 0
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Code | 0
Schroedinger's Threshold: When the AUC doesn't predict Accuracy | Code | 0
Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference | Code | 0
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation | Code | 0
Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases | Code | 0
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health | Code | 0
There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction | Code | 0
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading | Code | 0
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation | Code | 0
Benchmarking Safety Monitors for Image Classifiers with Machine Learning | Code | 0
First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network | Code | 0
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models | Code | 0
MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes | Code | 0
A Linear Constrained Optimization Benchmark For Probabilistic Search Algorithms: The Rotated Klee-Minty Problem | Code | 0
Page 97 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified