SOTAVerified

Benchmarking

Papers

Showing 1726-1750 of 5548 papers

Title | Status | Hype
Benchmarking AutoML algorithms on a collection of synthetic classification problems | Code | 0
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | Code | 0
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study | Code | 0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
JATE 2.0: Java Automatic Term Extraction with Apache Solr | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
IoT Data Trust Evaluation via Machine Learning | Code | 0
IPC: A Benchmark Data Set for Learning with Graph-Structured Data | Code | 0
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench | Code | 0
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Code | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Code | 0
Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Code | 0
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0
Can geometric combinatorics improve RNA branching predictions? | Code | 0
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Code | 0
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation | Code | 0
Can a single neuron learn predictive uncertainty? | Code | 0
Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim Evidence Reasoning | Code | 0
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models | Code | 0
Analyzing the Feature Extractor Networks for Face Image Synthesis | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified