SOTAVerified

Benchmarking

Papers

Showing 3226-3250 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Explainable AI using expressive Boolean formulas | | 0 |
| Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models | | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | | 0 |
| LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3 |
| Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling | Code | 1 |
| N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | | 0 |
| Benchmarking Middle-Trained Language Models for Neural Search | | 0 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1 |
| LibAUC: A Deep Learning Library for X-Risk Optimization | Code | 2 |
| RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | Code | 1 |
| EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection | | 0 |
| MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | | 0 |
| TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Code | 1 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | | 0 |
| ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation | | 0 |
| Multilingual Conceptual Coverage in Text-to-Image Models | Code | 1 |
| BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1 |
| Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning | Code | 1 |
| Break a Lag: Triple Exponential Moving Average for Enhanced Optimization | | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0 |
| End-to-end Knowledge Retrieval with Multi-modal Queries | Code | 1 |
| Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | Code | 0 |
| Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1 |
Page 130 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |