SOTAVerified

Benchmarking

Papers

Showing 1651–1700 of 5548 papers

Title | Status | Hype
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
Laparoscopic Image Desmoking Using the U-Net with New Loss Function and Integrated Differentiable Wiener Filter | Code | 0
Benchmarking a transformer-FREE model for ad-hoc retrieval | Code | 0
Benchmarking Approximate Inference Methods for Neural Structured Prediction | Code | 0
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods | Code | 0
Language-based Image Colorization: A Benchmark and Beyond | Code | 0
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification | Code | 0
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Code | 0
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System | Code | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Code | 0
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0
Benchmarking and Rethinking Knowledge Editing for Large Language Models | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines | Code | 0
Benchmarking and optimizing organism wide single-cell RNA alignment methods | Code | 0
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs | Code | 0
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations | Code | 0
A Dataset for Web-Scale Knowledge Base Population | Code | 0
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0
Benchmarking and Improving Text-to-SQL Generation under Ambiguity | Code | 0
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation | Code | 0
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0
KArSL: Arabic Sign Language Database | Code | 0
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | Code | 0
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches | Code | 0
JATE 2.0: Java Automatic Term Extraction with Apache Solr | Code | 0
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence | Code | 0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Code | 0
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Code | 0
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0
IPC: A Benchmark Data Set for Learning with Graph-Structured Data | Code | 0
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Code | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0
A Benchmarking Study of Vision-based Robotic Grasping Algorithms | Code | 0
IoT Data Trust Evaluation via Machine Learning | Code | 0
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments | Code | 0
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | Code | 0
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Code | 0
Page 34 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified