SOTAVerified

Benchmarking

Papers

Showing 1576-1600 of 5548 papers

Title | Status | Hype
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
Advancing and Benchmarking Personalized Tool Invocation for LLMs | Code | 0
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation | Code | 0
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators | Code | 0
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Code | 0
Deep Jansen-Rit Parameter Inference for Model-Driven Analysis of Brain Activity | Code | 0
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Code | 0
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives | Code | 0
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions | Code | 0
Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning | Code | 0
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms | Code | 0
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry | Code | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Code | 0
KArSL: Arabic Sign Language Database | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified