SOTAVerified

Benchmarking

Papers

Showing 901–950 of 5548 papers

Title | Status | Hype
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Code | 1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices | Code | 1
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling | Code | 1
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Code | 1
AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Code | 1
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Code | 1
Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras | Code | 1
Benchmarking Counterfactual Image Generation | Code | 1
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Code | 1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis | Code | 1
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
Benchmarking Object Detectors with COCO: A New Path Forward | Code | 1
Long Range Arena: A Benchmark for Efficient Transformers | Code | 1
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Code | 1
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests | Code | 1
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1
Evaluating histopathology transfer learning with ChampKit | Code | 1
Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | Code | 1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1
MC-Blur: A Comprehensive Benchmark for Image Deblurring | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
Benchmarking deep inverse models over time, and the neural-adjoint method | Code | 1
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification | Code | 1
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1
AnomalyHop: An SSL-based Image Anomaly Localization Method | Code | 1
Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Code | 1
Evaluation of large language models for discovery of gene set function | Code | 1
Benchmarking Natural Language Understanding Services for building Conversational Agents | Code | 1
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes | Code | 1
Benchmarking Deep Learning Interpretability in Time Series Predictions | Code | 1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit | Code | 1
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Code | 1
An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition | Code | 1
Benchmarking Deep Models for Salient Object Detection | Code | 1
Benchmarking Multi-Scene Fire and Smoke Detection | Code | 1
Evaluating Attribution for Graph Neural Networks | Code | 1
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments | Code | 1
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer | Code | 1
Benchmarking Neural Network Generalization for Grammar Induction | Code | 1
Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1
Curious Hierarchical Actor-Critic Reinforcement Learning | Code | 1
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records | Code | 1
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models | Code | 1
Benchmarking Detection Transfer Learning with Vision Transformers | Code | 1
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Code | 1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Code | 1
Page 19 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified