SOTAVerified

Benchmarking

Papers

Showing 1601–1650 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0 |
| Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Code | 0 |
| HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Code | 0 |
| Learned Sorted Table Search and Static Indexes in Small Model Space | Code | 0 |
| Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Code | 0 |
| Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0 |
| An implementation of the "Guess who?" game using CLIP | Code | 0 |
| Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? | Code | 0 |
| Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Code | 0 |
| Adjusting Pretrained Backbones for Performativity | Code | 0 |
| Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis | Code | 0 |
| An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0 |
| Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study | Code | 0 |
| An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality | Code | 0 |
| Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Code | 0 |
| Learnability and Complexity of Quantum Samples | Code | 0 |
| MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book | Code | 0 |
| Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0 |
| An Experimental Study of the Transferability of Spectral Graph Networks | Code | 0 |
| Benchmarking Classic and Learned Navigation in Complex 3D Environments | Code | 0 |
| An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data | Code | 0 |
| LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0 |
| Language-based Image Colorization: A Benchmark and Beyond | Code | 0 |
| Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models | Code | 0 |
| Benchmarking ChatGPT on Algorithmic Reasoning | Code | 0 |
| Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Code | 0 |
| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0 |
| Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0 |
| Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk | Code | 0 |
| Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0 |
| LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0 |
| KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0 |
| A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection | Code | 0 |
| KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0 |
| A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers | Code | 0 |
| KArSL: Arabic Sign Language Database | Code | 0 |
| A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data | Code | 0 |
| Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0 |
| Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0 |
| JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0 |
| A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | Code | 0 |
| Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0 |
| LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0 |
| Benchmarking AutoML algorithms on a collection of synthetic classification problems | Code | 0 |
| A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment | Code | 0 |
| Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0 |
| A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability | Code | 0 |
| Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p | Code | 0 |
| DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0 |
| Benchmarking a transformer-FREE model for ad-hoc retrieval | Code | 0 |
Page 33 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |