SOTAVerified

Benchmarking

Papers

Showing 1801–1825 of 5548 papers

Title | Status | Hype
CleanPatrick: A Benchmark for Image Data Cleaning | Code | 0
Comparative Analysis: Violence Recognition from Videos using Transfer Learning | Code | 0
BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | Code | 0
BSBench: will your LLM find the largest prime number? | Code | 0
Adaptive Shrinkage Estimation For Personalized Deep Kernel Regression In Modeling Brain Trajectories | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets | Code | 0
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Code | 0
Adaptive Power System Emergency Control using Deep Reinforcement Learning | Code | 0
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Code | 0
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Code | 0
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning | Code | 0
LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0
Improvements & Evaluations on the MLCommons CloudMask Benchmark | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
Individual Fairness Guarantees for Neural Networks | Code | 0
Page 73 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified