SOTAVerified

Benchmarking

Papers

Showing 3551–3600 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| On Evaluation of Bangla Word Analogies |  | 0 |
| On Evaluation of Document Classification using RVL-CDIP |  | 0 |
| On General Language Understanding |  | 0 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions |  | 0 |
| Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots |  | 0 |
| On loss functions and evaluation metrics for music source separation |  | 0 |
| Only Time Can Tell: Discovering Temporal Data for Temporal Modeling |  | 0 |
| On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction |  | 0 |
| An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems |  | 0 |
| On Neural Inertial Classification Networks for Pedestrian Activity Recognition |  | 0 |
| On quantifying and improving realism of images generated with diffusion |  | 0 |
| On Symbiosis of Attribute Prediction and Semantic Segmentation |  | 0 |
| On the Assessment of Benchmark Suites for Algorithm Comparison |  | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation |  | 0 |
| Decisions and Performance Under Bounded Rationality: A Computational Benchmarking Approach |  | 0 |
| On the Evaluation of Speech Foundation Models for Spoken Language Understanding |  | 0 |
| On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel |  | 0 |
| On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks |  | 0 |
| Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes |  | 0 |
| On the Interaction of Belief Bias and Explanations |  | 0 |
| On the Performance of Multimodal Language Models |  | 0 |
| On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks |  | 0 |
| On the project risk baseline: integrating aleatory uncertainty into project scheduling |  | 0 |
| On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild |  | 0 |
| On the reduction of Linear Parameter-Varying State-Space models |  | 0 |
| On the relationship between Benchmarking, Standards and Certification in Robotics and AI |  | 0 |
| On the Reliability and Validity of Detecting Approval of Political Actors in Tweets |  | 0 |
| On the Robustness of Human-Object Interaction Detection against Distribution Shift |  | 0 |
| On the role of benchmarking data sets and simulations in method comparison studies |  | 0 |
| Optimizer Benchmarking Needs to Account for Hyperparameter Tuning |  | 0 |
| On the Use of Quality Diversity Algorithms for The Traveling Thief Problem |  | 0 |
| On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds |  | 0 |
| On the Value of ML Models |  | 0 |
| OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images |  | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations |  | 0 |
| OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking |  | 0 |
| Open-CD: A Comprehensive Toolbox for Change Detection |  | 0 |
| OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI |  | 0 |
| Open Datasets for Satellite Radio Resource Control |  | 0 |
| OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation |  | 0 |
| OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion |  | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety |  | 0 |
| OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation |  | 0 |
| Open foundation models for Azerbaijani language |  | 0 |
| Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs |  | 0 |
| Open Llama2 Model for the Lithuanian Language |  | 0 |
| OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning |  | 0 |
| Open-set object detection: towards unified problem formulation and benchmarking |  | 0 |
| OpenSiteRec: An Open Dataset for Site Recommendation |  | 0 |
| Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks |  | 0 |
Page 72 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified |