SOTAVerified

Benchmarking

Papers

Showing 5201–5250 of 5548 papers

Title | Status | Hype
Efficiently Quantifying Individual Agent Importance in Cooperative MARL | | 0
SysML'19 demo: customizable and reusable Collective Knowledge pipelines to automate and reproduce machine learning experiments | | 0
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency | | 0
Class-agnostic Object Detection | | 0
Efficient Processing of Deep Neural Networks: A Tutorial and Survey | | 0
Systematic Comparison of Path Planning Algorithms using PathBench | | 0
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification | | 0
EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection | | 0
Efficient Training of Deep Classifiers for Wireless Source Identification using Test SNR Estimates | | 0
A Line-of-Sight Channel Model for the 100-450 Gigahertz Frequency Band | | 0
Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | | 0
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | | 0
Egocentric Human-Object Interaction Detection: A New Benchmark and Method | | 0
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering | | 0
CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | | 0
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | | 0
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry | | 0
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations | | 0
CIMLA: Interpretable AI for inference of differential causal networks | | 0
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | | 0
ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg" | | 0
ELSA: Evaluating Localization of Social Activities in Urban Streets using Open-Vocabulary Detection | | 0
Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation | | 0
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | | 0
Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework | | 0
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | | 0
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools | | 0
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool | | 0
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices | | 0
ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors | | 0
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description | | 0
EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models | | 0
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models | | 0
Emotion Analysis of Tweets Banning Education in Afghanistan | | 0
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task | | 0
Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI | | 0
Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler | | 0
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices | | 0
Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification | | 0
ChatGPT Alternative Solutions: Large Language Models Survey | | 0
SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference | | 0
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts | | 0
Enabling Accelerators for Graph Computing | | 0
Automated Machine Learning: A Case Study on Non-Intrusive Appliance Load Monitoring | | 0
Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design | | 0
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts | | 0
EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting | | 0
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization | | 0
Characterizing Transactional Databases for Frequent Itemset Mining | | 0
1-D Convlutional Neural Networks for the Analysis of Pupil Size Variations in Scotopic Conditions | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified