SOTAVerified

Benchmarking

Papers

Showing 3801–3825 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | | 0 |
| Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Code | 0 |
| FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Code | 0 |
| DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Code | 0 |
| FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | | 0 |
| DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Code | 0 |
| RD-Suite: A Benchmark for Ranking Distillation | | 0 |
| Self-Adjusting Weighted Expected Improvement for Bayesian Optimization | Code | 0 |
| Benchmarking Foundation Models with Language-Model-as-an-Examiner | | 0 |
| ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection | | 0 |
| Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0 |
| Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | | 0 |
| Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | | 0 |
| Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models | | 0 |
| Explainable AI using expressive Boolean formulas | | 0 |
| Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | | 0 |
| Benchmarking Middle-Trained Language Models for Neural Search | | 0 |
| N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | | 0 |
| MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | | 0 |
| EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection | | 0 |
| Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | | 0 |
| ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation | | 0 |
| Break a Lag: Triple Exponential Moving Average for Enhanced Optimization | | 0 |
| Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | | 0 |
Page 153 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |