SOTAVerified

Benchmarking

Papers

Showing 1801–1850 of 5548 papers

Title | Status | Hype
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools | | 0
Comparative Benchmarking of Causal Discovery Techniques | | 0
Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis | | 0
Comparative Design Space Exploration of Dense and Semi-Dense SLAM | | 0
Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery | | 0
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices | | 0
LAraBench: Benchmarking Arabic AI with Large Language Models | | 0
Comparing Computing Platforms for Deep Learning on a Humanoid Robot | | 0
ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | | 0
Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells | | 0
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition | | 0
Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices | | 0
Comparison of feature extraction and dimensionality reduction methods for single channel extracellular spike sorting | | 0
Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale | | 0
DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift | | 0
CompBench: Benchmarking Complex Instruction-guided Image Editing | | 0
Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning | | 0
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models | | 0
An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction | | 0
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | | 0
Complexity of Representations in Deep Learning | | 0
Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition | | 0
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0
ChatGPT Alternative Solutions: Large Language Models Survey | | 0
Comprehensive Energy Footprint Benchmarking Algorithm for Electrified Powertrains | | 0
Comprehensive Energy Footprint Benchmarking of Strong Parallel Electrified Powertrain | | 0
Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data | | 0
Computational and Exploratory Landscape Analysis of the GKLS Generator | | 0
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | | 0
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts | | 0
Computer-aided diagnosis and prediction in brain disorders | | 0
Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art | | 0
DRIV100: In-The-Wild Multi-Domain Dataset and Evaluation for Real-World Domain Adaptation of Semantic Segmentation | | 0
ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair | | 0
A War Beyond Deepfake: Benchmarking Facial Counterfeits and Countermeasures | | 0
Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity | | 0
Conditional Neural Processes for Molecules | | 0
Benchmarking Decoupled Neural Interfaces with Synthetic Gradients | | 0
CoNES: Convex Natural Evolutionary Strategies | | 0
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization | | 0
Configurable 3D Scene Synthesis and 2D Image Rendering with Per-Pixel Ground Truth using Stochastic Grammars | | 0
Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation | | 0
Dual Task Framework for Improving Persona-grounded Dialogue Dataset | | 0
Connecting convex energy-based inference and optimal transport for domain adaptation | | 0
Dynamic benchmarking framework for LLM-based conversational data capture | | 0
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | | 0
Benchmarking deep generative models for diverse antibody sequence design | | 0
Characterizing Transactional Databases for Frequent Itemset Mining | | 0
Page 37 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified