SOTAVerified

Benchmarking

Papers

Showing 1601–1625 of 5548 papers

Title | Status | Hype
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Code | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Code | 0
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
An implementation of the "Guess who?" game using CLIP | Code | 0
Adjusting Pretrained Backbones for Performativity | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis | Code | 0
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Code | 0
An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
MANTRA: The Manifold Triangulations Assemblage | Code | 0
KArSL: Arabic Sign Language Database | Code | 0
An Experimental Study of the Transferability of Spectral Graph Networks | Code | 0
Benchmarking Classic and Learned Navigation in Complex 3D Environments | Code | 0
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data | Code | 0
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Code | 0
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0
Page 65 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified