SOTAVerified

Benchmarking

Papers

Showing 3151–3200 of 5548 papers

Title | Status | Hype
Is margin all you need? An extensive empirical study of active learning on tabular data |  | 0
Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass |  | 0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations |  | 0
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval |  | 0
Benchmarking real-time algorithms for in-phase auditory stimulation of low amplitude slow waves with wearable EEG devices during sleep |  | 0
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? |  | 0
Is Self-Supervision Enough? Benchmarking Foundation Models Against End-to-End Training for Mitotic Figure Classification |  | 0
Is Single-View Mesh Reconstruction Ready for Robotics? |  | 0
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images |  | 0
Is Synthetic Dataset Reliable for Benchmarking Generalizable Person Re-Identification? |  | 0
Is Transfer Learning Necessary for Protein Landscape Prediction? |  | 0
Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages? |  | 0
Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models |  | 0
The Trap of Presumed Equivalence: Artificial General Intelligence Should Not Be Assessed on the Scale of Human Intelligence |  | 0
A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation |  | 0
Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review |  | 0
Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes |  | 0
Iterated Invariant Extended Kalman Filter (IterIEKF) |  | 0
Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines |  | 0
It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives |  | 0
"It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning |  | 0
iWarded: A System for Benchmarking Datalog+/- Reasoning (technical report) |  | 0
IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays |  | 0
Jailbreak Distillation: Renewable Safety Benchmarking |  | 0
The Unconstrained Ear Recognition Challenge |  | 0
JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads |  | 0
Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing |  | 0
The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix |  | 0
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection |  | 0
JENGA: Object selection and pose estimation for robotic grasping from a stack |  | 0
AFAT: Adaptive Failure-Aware Tracker for Robust Visual Object Tracking |  | 0
Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning |  | 0
JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models |  | 0
Job recommendations: benchmarking of collaborative filtering methods for classifieds |  | 0
Benchmarking quantized LLaMa-based models on the Brazilian Secondary School Exam |  | 0
Benchmarking Quality-Diversity Algorithms on Neuroevolution for Reinforcement Learning |  | 0
Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms |  | 0
Joint Batching and Scheduling for High-Throughput Multiuser Edge AI with Asynchronous Task Arrivals |  | 0
JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation |  | 0
Joint Learning of Brain Lesion and Anatomy Segmentation from Heterogeneous Datasets |  | 0
Joint Linear Precoding and DFT Beamforming Design for Massive MIMO Satellite Communication |  | 0
Jointly learning heterogeneous features for rgb-d activity recognition |  | 0
Jointly Learning Knowledge Embedding and Neighborhood Consensus with Relational Knowledge Distillation for Entity Alignment |  | 0
Joint Multi-Domain Learning for Automatic Short Answer Grading |  | 0
Joint multi-person detection and tracking from overlapping cameras |  | 0
AERO: Softmax-Only LLMs for Efficient Private Inference |  | 0
Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks |  | 0
Joint Phase Shift Optimization and Precoder Selection for RIS-Assisted 5G NR MIMO Systems |  | 0
Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking |  | 0
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified