SOTAVerified

Benchmarking

Papers

Showing 4576–4600 of 5548 papers

Title	Status	Hype
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models	Code	0
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite	Code	0
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis	Code	0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization	Code	0
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment	Code	0
Local manifold learning and its link to domain-based physics knowledge	Code	0
LOCO-EPI: Leave-one-chromosome-out (LOCO) as a benchmarking paradigm for deep learning based prediction of enhancer-promoter interactions	Code	0
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C)	Code	0
Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors	Code	0
BioSentVec: creating sentence embeddings for biomedical texts	Code	0
LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges	Code	0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems	Code	0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible	Code	0
LogoNet: a fine-grained network for instance-level logo sketch retrieval	Code	0
Identifying Money Laundering Subgraphs on the Blockchain	Code	0
Identifying and Benchmarking Natural Out-of-Context Prediction Problems	Code	0
Analysis | OPEN | Published: 17 June 2019 Multitask learning and benchmarking with clinical time series data	Code	0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Code	0
IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification	Code	0
BioFors: A Large Biomedical Image Forensics Dataset	Code	0
Benchmarking Attribution Methods with Relative Feature Importance	Code	0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Code	0
Hyperspectral Image Dataset for Benchmarking on Salient Object Detection	Code	0
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning	Code	0
Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition	Code	0
Page 184 of 222

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	-	Unverified