SOTAVerified

Benchmarking

Papers

Showing 4451–4500 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms | | 0 |
| Retrieval-Augmented Generation for Service Discovery: Chunking Strategies and Benchmarking | | 0 |
| Unsupervised Hierarchical Grouping of Knowledge Graph Entities | | 0 |
| AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science | | 0 |
| Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis | | 0 |
| Assessing the risk of re-identification arising from an attack on anonymised data | | 0 |
| Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion | | 0 |
| Review and experimental benchmarking of machine learning algorithms for efficient optimization of cold atom experiments | | 0 |
| Reviewing and Benchmarking Parameter Control Methods in Differential Evolution | | 0 |
| Categorization and analysis of 14 computational methods for estimating cell potency from single-cell RNA-seq data | | 0 |
| Unsupervised Learning of 3D Object Categories from Videos in the Wild | | 0 |
| Unsupervised machine learning approach for building composite indicators with fuzzy metrics | | 0 |
| Multi-Agent Reinforcement Learning with Long-Term Performance Objectives for Service Workforce Optimization | | 0 |
| Assessing the efficacy of large language models in generating accurate teacher responses | | 0 |
| Unsupervised Person Re-identification by Deep Learning Tracklet Association | | 0 |
| Revisiting Implicit Models: Sparsity Trade-offs Capability in Weight-tied Model for Vision Tasks | | 0 |
| Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets | | 0 |
| Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking | | 0 |
| Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery | | 0 |
| Assessing Encoder-Decoder Architectures for Robust Coronary Artery Segmentation | | 0 |
| Revisiting Safe Exploration in Safe Reinforcement learning | | 0 |
| ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems | | 0 |
| A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment | | 0 |
| A Spiking Neural Network for Image Segmentation | | 0 |
| A Spatial Mapping Algorithm with Applications in Deep Learning-Based Structure Classification | | 0 |
| On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets | | 0 |
| Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning | | 0 |
| A Solid-State Nanopore Signal Generator for Training Machine Learning Models | | 0 |
| RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth | | 0 |
| A Simulation-Augmented Benchmarking Framework for Automatic RSO Streak Detection in Single-Frame Space Images | | 0 |
| Riemannian Geometry for the classification of brain states with intracortical brain-computer interfaces | | 0 |
| Riemannian Self-Attention Mechanism for SPD Networks | | 0 |
| A Simple Evolutionary Algorithm for Multi-modal Multi-objective Optimization | | 0 |
| RISEdb: a Novel Indoor Localization Dataset | | 0 |
| Risk Aware Benchmarking of Large Language Models | | 0 |
| Risk-Neutral Generative Networks | | 0 |
| ASI: Accuracy-Stability Index for Evaluating Deep Learning Models | | 0 |
| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | | 0 |
| RL-Based Method for Benchmarking the Adversarial Resilience and Robustness of Deep Reinforcement Learning Policies | | 0 |
| A Seven-Layer Model for Standardising AI Fairness Assessment | | 0 |
| A Sequence-to-Sequence Model for Semantic Role Labeling | | 0 |
| A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking | | 0 |
| A Scalable Approach to Benchmarking the In-Conversation Differential Diagnostic Accuracy of a Health AI | | 0 |
| Artificial Intelligence for Microbiology and Microbiome Research | | 0 |
| RNAmountAlign: efficient software for local, global, semiglobal pairwise and multiple RNA sequence/structure alignment | | 0 |
| A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset | | 0 |
| ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage | | 0 |
| ROBBIE: Robust Bias Evaluation of Large Generative Language Models | | 0 |
| OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | | 0 |
| A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |