SOTAVerified

Benchmarking

Papers

Showing 5351–5400 of 5548 papers

Title | Status | Hype
Affine Non-negative Collaborative Representation Based Pattern Classification | Code | 0
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials | Code | 0
A Benchmarking Dataset with 2440 Organic Molecules for Volume Distribution at Steady State | Code | 0
Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study | Code | 0
Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression | Code | 0
Constructing a Psychometric Testbed for Fair Natural Language Processing | Code | 0
Benchmarking down-scaled (not so large) pre-trained language models | Code | 0
VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities | Code | 0
Constrained Reinforcement Learning for Safe Heat Pump Control | Code | 0
Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review | Code | 0
XFEVER: Exploring Fact Verification across Languages | Code | 0
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Code | 0
ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms | Code | 0
Benchmarking Distributional Alignment of Large Language Models | Code | 0
ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges | Code | 0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models | Code | 0
VideoMarkBench: Benchmarking Robustness of Video Watermarking | Code | 0
Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity | Code | 0
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions | Code | 0
Precise Benchmarking of Explainable AI Attribution Methods | Code | 0
Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study | Code | 0
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Code | 0
PredictaBoard: Benchmarking LLM Score Predictability | Code | 0
Benchmarking Differentially Private Residual Networks for Medical Imagery | Code | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Code | 0
GNN-Suite: a Graph Neural Network Benchmarking Framework for Biomedical Informatics | Code | 0
Benchmarking Deep Spiking Neural Networks on Neuromorphic Hardware | Code | 0
SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis | Code | 0
Aesthetic Image Captioning From Weakly-Labelled Photographs | Code | 0
Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Code | 0
An implementation of the "Guess who?" game using CLIP | Code | 0
ADVIO: An authentic dataset for visual-inertial odometry | Code | 0
CONGRA: Benchmarking Automatic Conflict Resolution | Code | 0
When the Music Stops: Tip-of-the-Tongue Retrieval for Music | Code | 0
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | Code | 0
Present and Future Generalization of Synthetic Image Detectors | Code | 0
SweetRS: Dataset for a recommender systems of sweets | Code | 0
PRGFlow: Benchmarking SWAP-Aware Unified Deep Visual Inertial Odometry | Code | 0
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Code | 0
Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations | Code | 0
Conditional out-of-sample generation for unpaired data using trVAE | Code | 0
Deep Jansen-Rit Parameter Inference for Model-Driven Analysis of Brain Activity | Code | 0
Adversarial Metric Attack and Defense for Person Re-identification | Code | 0
Conditional diffusions for amortized neural posterior estimation | Code | 0
Where are we now? A large benchmark study of recent symbolic regression methods | Code | 0
Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified