SOTAVerified

Benchmarking

Papers

Showing 2951–3000 of 5548 papers

Title | Status | Hype
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence |  | 0
Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets |  | 0
Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning |  | 0
Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures |  | 0
Dynamic Risk Assessment Methodology with an LDM-based System for Parking Scenarios |  | 0
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding |  | 0
E2E Parking Dataset: An Open Benchmark for End-to-End Autonomous Parking |  | 0
EarthquakeNPP: Benchmark Datasets for Earthquake Forecasting with Neural Point Processes |  | 0
EASTER: Efficient and Scalable Text Recognizer |  | 0
ECG-Adv-GAN: Detecting ECG Adversarial Examples with Conditional Generative Adversarial Networks |  | 0
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph |  | 0
EconGym: A Scalable AI Testbed with Diverse Economic Tasks |  | 0
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments |  | 0
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey |  | 0
Edge-First Language Model Inference: Models, Metrics, and Tradeoffs |  | 0
EdgeMark: An Automation and Benchmarking System for Embedded Artificial Intelligence Tools |  | 0
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods |  | 0
EEGS: A Transparent Model of Emotions |  | 0
EffCNet: An Efficient CondenseNet for Image Classification on NXP BlueBox |  | 0
Effective Evaluation of Deep Active Learning on Image Classification Tasks |  | 0
Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection |  | 0
Efficacy of Synthetic Data as a Benchmark |  | 0
Efficiency in European Air Traffic Management -- A Fundamental Analysis of Data, Models, and Methods |  | 0
Efficient computation of backprojection arrays for 3D light field deconvolution |  | 0
Efficient and Accurate In-Database Machine Learning with SQL Code Generation in Python |  | 0
Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates |  | 0
Efficient Benchmarking of Language Models |  | 0
Efficient Benchmarking of NLP APIs using Multi-armed Bandits |  | 0
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack |  | 0
Efficient Channel Estimation for Millimeter Wave and Terahertz Systems Enabled by Integrated Super-resolution Sensing and Communication |  | 0
Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models |  | 0
Efficient Expression Neutrality Estimation with Application to Face Recognition Utility Prediction |  | 0
Efficiently Exploring Ordering Problems through Conflict-directed Search |  | 0
Efficiently Quantifying Individual Agent Importance in Cooperative MARL |  | 0
Efficient Processing of Deep Neural Networks: A Tutorial and Survey |  | 0
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification |  | 0
EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection |  | 0
Efficient Training of Deep Classifiers for Wireless Source Identification using Test SNR Estimates |  | 0
Egocentric Human-Object Interaction Detection: A New Benchmark and Method |  | 0
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision |  | 0
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations |  | 0
ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg" |  | 0
ELSA: Evaluating Localization of Social Activities in Urban Streets using Open-Vocabulary Detection |  | 0
Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation |  | 0
Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework |  | 0
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents |  | 0
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool |  | 0
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description |  | 0
EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models |  | 0
Emotion Analysis of Tweets Banning Education in Afghanistan |  | 0
Page 60 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified