SOTAVerified

Benchmarking

Papers

Showing 2851-2900 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| The Design and Implementation of a Scalable DL Benchmarking Platform | | 0 |
| Handwritten Text Recognition: A Survey | | 0 |
| HaN-Seg: The head and neck organ-at-risk CT and MR segmentation dataset | | 0 |
| xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods | | 0 |
| Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead | | 0 |
| Hardware-aware mobile building block evaluation for computer vision | | 0 |
| The Disagreement Problem in Faithfulness Metrics | | 0 |
| The DLV System for Knowledge Representation and Reasoning | | 0 |
| Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | | 0 |
| The Dota 2 Bot Competition | | 0 |
| Benchmarking XAI Explanations with Human-Aligned Evaluations | | 0 |
| A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior | | 0 |
| Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset | | 0 |
| HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard | | 0 |
| Hawk: An Industrial-strength Multi-label Document Classifier | | 0 |
| Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset | | 0 |
| Benchmarking VLMs' Reasoning About Persuasive Atypical Images | | 0 |
| Haze Visibility Enhancement: A Survey and Quantitative Benchmarking | | 0 |
| Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | | 0 |
| Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room | | 0 |
| HelixDesign-Binder: A Scalable Production-Grade Platform for Binder Design Built on HelixFold3 | | 0 |
| Helsinki Deblur Challenge 2021: description of photographic data | | 0 |
| HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding | | 0 |
| Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning | | 0 |
| Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression | | 0 |
| Benchmarking Vision Language Models on German Factual Data | | 0 |
| The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation | | 0 |
| Jointly Modeling and Clustering Tensors in High Dimensions | | 0 |
| Heterogeneous graph neural networks for species distribution modeling | | 0 |
| Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems | | 0 |
| Hiding in Plain Sight: Reframing Hardware Trojan Benchmarking as a Hide&Seek Modification | | 0 |
| Agentic Mixture-of-Workflows for Multi-Modal Chemical Search | | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | | 0 |
| Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce | | 0 |
| AA3DNet: Attention Augmented Real Time 3D Object Detection | | 0 |
| High Accuracy Tumor Diagnoses and Benchmarking of Hematoxylin and Eosin Stained Prostate Core Biopsy Images Generated by Explainable Deep Neural Networks | | 0 |
| Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals | | 0 |
| High Fidelity RF Clutter Modeling and Simulation | | 0 |
| High-Level Synthesis Performance Prediction using GNNs: Benchmarking, Modeling, and Advancing | | 0 |
| Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | | 0 |
| The EuroCity Persons Dataset: A Novel Benchmark for Object Detection | | 0 |
| The Evolutionary Computation Methods No One Should Use | | 0 |
| HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | | 0 |
| Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments | | 0 |
| Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | | 0 |
| Benchmarking Video Frame Interpolation | | 0 |
| SnCQA: A hardware-efficient equivariant quantum convolutional circuit architecture | | 0 |
| HLB: Benchmarking LLMs' Humanlikeness in Language Use | | 0 |
| Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data | | 0 |
| The Expressive Power of Word Embeddings | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |