SOTAVerified

Benchmarking

Papers

Showing 2701–2750 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition | | 0 |
| Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling | | 0 |
| Dataset and Benchmarking of Real-Time Embedded Object Detection for RoboCup SSL | | 0 |
| Benchmarking fixed-length Fingerprint Representations across different Embedding Sizes and Sensor Types | | 0 |
| Benchmarking five global optimization approaches for nano-optical shape optimization and parameter reconstruction | | 0 |
| A Platform for Event Extraction in Hindi | | 0 |
| Adversarial Reinforcement Learning Framework for Benchmarking Collision Avoidance Mechanisms in Autonomous Vehicles | | 0 |
| Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework | | 0 |
| Multi-scale data reconstruction of turbulent rotating flows with Gappy POD, Extended POD and Generative Adversarial Networks | | 0 |
| Data needs and challenges for quantum dot devices automation | | 0 |
| Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data | | 0 |
| Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound | | 0 |
| Data-driven surrogate modelling and benchmarking for process equipment | | 0 |
| Data-driven Power Flow Linearization: Simulation | | 0 |
| Benchmarking Federated Machine Unlearning methods for Tabular Data | | 0 |
| A Pipeline for Post-Crisis Twitter Data Acquisition | | 0 |
| Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | | 0 |
| Benchmarking FedAvg and FedCurv for Image Classification Tasks | | 0 |
| Data-driven Approach for Static Hedging of Exchange Traded Options | | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | | 0 |
| A Perspective on Neural Capacity Estimation: Viability and Reliability | | 0 |
| Data Augmentation for Traffic Classification | | 0 |
| Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | | 0 |
| Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative | | 0 |
| Data and its (dis)contents: A survey of dataset development and use in machine learning research | | 0 |
| Data Analysis in the Era of Generative AI | | 0 |
| Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | | 0 |
| A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages | | 0 |
| Accelerating the discovery of steady-states of planetary interior dynamics with machine learning | | 0 |
| DASB -- Discrete Audio and Speech Benchmark | | 0 |
| DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0 |
| Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0 |
| AnyTOD: A Programmable Task-Oriented Dialog System | | 0 |
| DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0 |
| DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0 |
| Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | | 0 |
| DACOS-A Manually Annotated Dataset of Code Smells | | 0 |
| Benchmarking Explanatory Models for Inertia Forecasting using Public Data of the Nordic Area | | 0 |
| Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES) | | 0 |
| Adversarially Training for Audio Classifiers | | 0 |
| CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0 |
| Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks | | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0 |
| Benchmarking Evolutionary Algorithms For Single Objective Real-valued Constrained Optimization - A Critical Review | | 0 |
| Anytime Behavior of Inexact TSP Solvers and Perspectives for Automated Algorithm Selection | | 0 |
| Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition | | 0 |
| Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030 | | 0 |
| Labelling Vertebrae with 2D Reformations of Multidetector CT Images: An Adversarial Approach for Incorporating Prior Knowledge of Spine Anatomy | | 0 |
| Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries | | 0 |
| GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |