SOTAVerified

Benchmarking

Papers

Showing 4101-4150 of 5548 papers

Title | Status | Hype
Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training | | 0
Performance Benchmarking of Psychomotor Skills Using Wearable Devices: An Application in Sport | | 0
Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems | | 0
Performance Evaluation Methodology for Long-Term Visual Object Tracking | | 0
Benchmark Dataset for Pore-Scale CO2-Water Interaction | | 0
TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | | 0
Performance Evaluation of Transcriptomics Data Normalization for Survival Risk Prediction | | 0
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale | | 0
Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | | 0
Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As | | 0
Performance prediction of data streams on high-performance architecture | | 0
Periocular Recognition in the Wild with Orthogonal Combination of Local Binary Coded Pattern in Dual-stream Convolutional Neural Network | | 0
Which models are innately best at uncertainty estimation? | | 0
PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language | | 0
WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain | | 0
Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics | | 0
PerSEval: Assessing Personalization in Text Summarizers | | 0
A Conformance Checking-based Approach for Drift Detection in Business Processes | | 0
Personalised Feedback Framework for Online Education Programmes Using Generative AI | | 0
Benchmark Data Repositories for Better Benchmarking | | 0
Personalized Multimodal Large Language Models: A Survey | | 0
Personalized On-Device E-health Analytics with Decentralized Block Coordinate Descent | | 0
Person Re-Identification by Unsupervised Video Matching | | 0
Person Re-Identification in Identity Regression Space | | 0
Person Re-identification in the Wild | | 0
Person Search by Multi-Scale Matching | | 0
Person Search by Multi-Scale Matching | | 0
Perspective on recent developments and challenges in regulatory and systems genomics | | 0
Perspectives on the State and Future of Deep Learning -- 2023 | | 0
Perturbation-based exploration methods in deep reinforcement learning | | 0
Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset | | 0
BENCHIP: Benchmarking Intelligence Processors | | 0
PGLearn -- An Open-Source Learning Toolkit for Optimal Power Flow | | 0
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions | | 0
BenchCouncil's View on Benchmarking AI and Other Emerging Workloads | | 0
PhD Thesis on Code Modulated Interferometric Imaging System using Phased Arrays | | 0
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | | 0
PhilHumans: Benchmarking Machine Learning for Personal Health | | 0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | | 0
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | | 0
PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models | | 0
Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning | | 0
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs | | 0
PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach | | 0
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | | 0
Behavior Structformer: Learning Players Representations with Structured Tokenization | | 0
Yesil o1 Pro: Evidence-Based AI Model for Health and Benchmarking in Clinical Decision Support | | 0
PieTrack: An MOT solution based on synthetic data training and self-supervised domain adaptation | | 0
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents | | 0
Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data | | 0
Page 83 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified