SOTAVerified

Benchmarking

Papers

Showing 5351-5375 of 5548 papers

Title | Status | Hype
Affine Non-negative Collaborative Representation Based Pattern Classification | Code | 0
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials | Code | 0
A Benchmarking Dataset with 2440 Organic Molecules for Volume Distribution at Steady State | Code | 0
Constructing Confidence Intervals for 'the' Generalization Error -- a Comprehensive Benchmark Study | Code | 0
Subjective Visual Quality Assessment for High-Fidelity Learning-Based Image Compression | Code | 0
Constructing a Psychometric Testbed for Fair Natural Language Processing | Code | 0
Benchmarking down-scaled (not so large) pre-trained language models | Code | 0
VHAKG: A Multi-modal Knowledge Graph Based on Synchronized Multi-view Videos of Daily Activities | Code | 0
Constrained Reinforcement Learning for Safe Heat Pump Control | Code | 0
Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review | Code | 0
XFEVER: Exploring Fact Verification across Languages | Code | 0
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Code | 0
ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms | Code | 0
Benchmarking Distributional Alignment of Large Language Models | Code | 0
ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges | Code | 0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models | Code | 0
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models | Code | 0
VideoMarkBench: Benchmarking Robustness of Video Watermarking | Code | 0
Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity | Code | 0
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions | Code | 0
Precise Benchmarking of Explainable AI Attribution Methods | Code | 0
Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study | Code | 0
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Code | 0
PredictaBoard: Benchmarking LLM Score Predictability | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified