SOTAVerified

Benchmarking

Papers

Showing 1551-1600 of 5548 papers

Title | Status | Hype
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems | Code | 0
Answer Consolidation: Formulation and Benchmarking | Code | 0
Benchmarking down-scaled (not so large) pre-trained language models | Code | 0
Large Scale Clustering with Variational EM for Gaussian Mixture Models | Code | 0
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Code | 0
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo | Code | 0
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? | Code | 0
Benchmarking Domain Generalization Algorithms in Computational Pathology | Code | 0
Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0
Benchmarking Distributional Alignment of Large Language Models | Code | 0
A novel evaluation methodology for supervised Feature Ranking algorithms | Code | 0
Benchmarking Differentially Private Residual Networks for Medical Imagery | Code | 0
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | Code | 0
Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0
Benchmarking Deep Spiking Neural Networks on Neuromorphic Hardware | Code | 0
Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | Code | 0
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges | Code | 0
An Optical Control Environment for Benchmarking Reinforcement Learning Algorithms | Code | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation | Code | 0
An open unified deep graph learning framework for discovering drug leads | Code | 0
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | Code | 0
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Code | 0
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Code | 0
Deep Jansen-Rit Parameter Inference for Model-Driven Analysis of Brain Activity | Code | 0
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Code | 0
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives | Code | 0
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions | Code | 0
Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning | Code | 0
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms | Code | 0
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Code | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Code | 0
KArSL: Arabic Sign Language Database | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified