SOTAVerified

Benchmarking

Papers

Showing 3501–3525 of 5548 papers

Title | Status | Hype
Large Language Models as Automated Aligners for benchmarking Vision-Language Models | | 0
An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification | | 0
Dialogue Quality and Emotion Annotations for Customer Support Conversations | Code | 0
Learning Dynamic Selection and Pricing of Out-of-Home Deliveries | Code | 0
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) | | 0
Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI | Code | 0
A projected nonlinear state-space model for forecasting time series signals | Code | 0
Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning | | 0
Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | | 0
Deep State-Space Model for Predicting Cryptocurrency Price | | 0
Segment Together: A Versatile Paradigm for Semi-Supervised Medical Image Segmentation | | 0
Demonstrating Almost Linear Time Complexity of Bus Admittance Matrix-Based Distribution Network Power Flow: An Empirical Approach | | 0
Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning | | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | | 0
Benchmarking Machine Learning Models for Quantum Error Correction | | 0
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | | 0
Social Bias Probing: Fairness Benchmarking for Language Models | | 0
Domain Aligned CLIP for Few-shot Classification | | 0
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Code | 0
Model Agnostic Explainable Selective Regression via Uncertainty Estimation | | 0
Benchmarking Individual Tree Mapping with Sub-meter Imagery | | 0
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Code | 0
The Disagreement Problem in Faithfulness Metrics | | 0
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified