SOTAVerified

Benchmarking

Papers

Showing 3601–3650 of 5548 papers

Title | Status | Hype
--- | --- | ---
Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design | | 0
Opposition based Ensemble Micro Differential Evolution | | 0
Optimal Eco-driving Control of Autonomous and Electric Trucks in Adaptation to Highway Topography: Energy Minimization and Battery Life Extension | | 0
Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning | | 0
Optimal PMU Placement for Kalman Filtering of DAE Power System Models | | 0
Optimal Scheduling of Anticipated COVID-19 Vaccination: A Case Study of New York State | | 0
Optimization of Genomic Classifiers for Clinical Deployment: Evaluation of Bayesian Optimization to Select Predictive Models of Acute Infection and In-Hospital Mortality | | 0
Optimization Techniques for a Physical Model of Human Vocalisation | | 0
Optimizing open-domain question answering with graph-based retrieval augmented generation | | 0
Optimizing Recommendations using Fine-Tuned LLMs | | 0
OPTION: OPTImization Algorithm Benchmarking ONtology | | 0
OPTION: OPTImization Algorithm Benchmarking ONtology | | 0
OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | | 0
Organ-aware Multi-scale Medical Image Segmentation Using Text Prompt Engineering | | 0
Orthogonal Deep Features Decomposition for Age-Invariant Face Recognition | | 0
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents | | 0
oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | | 0
Out of Distribution Performance of State of Art Vision Model | | 0
Overconfident Oracles: Limitations of In Silico Sequence Design Benchmarking | | 0
Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling | | 0
Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving | | 0
OVQA: A Clinically Generated Visual Question Answering Dataset | | 0
Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking | | 0
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms | | 0
Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool | | 0
Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis | | 0
Parsing Any Domain English text to CoNLL dependencies | | 0
Participatory Personalization in Classification | | 0
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems | | 0
PASTA: A Dataset for Modeling Participant States in Narratives | | 0
PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database | | 0
PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms | | 0
PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology | | 0
Patherea: Cell Detection and Classification for the 2020s | | 0
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications | | 0
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite | | 0
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints | | 0
Perception Test 2023: A Summary of the First Challenge And Outcome | | 0
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | | 0
Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training | | 0
Performance Benchmarking of Psychomotor Skills Using Wearable Devices: An Application in Sport | | 0
Performance Comparison of Surrogate-Assisted Evolutionary Algorithms on Computational Fluid Dynamics Problems | | 0
Performance Evaluation Methodology for Long-Term Visual Object Tracking | | 0
Performance Evaluation of Transcriptomics Data Normalization for Survival Risk Prediction | | 0
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale | | 0
Performance of large language models in numerical vs. semantic medical knowledge: Benchmarking on evidence-based Q&As | | 0
Performance prediction of data streams on high-performance architecture | | 0
Periocular Recognition in the Wild with Orthogonal Combination of Local Binary Coded Pattern in Dual-stream Convolutional Neural Network | | 0
PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language | | 0
WeQA: A Benchmark for Retrieval Augmented Generation in Wind Energy Domain | | 0
Page 73 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | --- | --- | --- | --- | ---
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified