SOTAVerified

Benchmarking

Papers

Showing 3201–3250 of 5548 papers

Title | Status | Hype
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Code | 0
RRSIS: Referring Remote Sensing Image Segmentation | - | 0
A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews | - | 0
detrex: Benchmarking Detection Transformers | - | 0
Benchmarking Neural Network Training Algorithms | Code | 4
Contribution à l'Optimisation d'un Comportement Collectif pour un Groupe de Robots Autonomes | - | 0
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2
NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Code | 1
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | - | 0
A Large-Scale Analysis on Self-Supervised Video Representation Learning | - | 0
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Code | 0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Code | 0
Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML | Code | 1
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Code | 0
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | - | 0
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Code | 0
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Code | 1
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | - | 0
RD-Suite: A Benchmark for Ranking Distillation | - | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Benchmarking Foundation Models with Language-Model-as-an-Examiner | - | 0
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization | Code | 0
ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection | - | 0
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | - | 0
Explainable AI using expressive Boolean formulas | - | 0
Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models | - | 0
Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging | - | 0
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3
Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling | Code | 1
N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition | - | 0
Benchmarking Middle-Trained Language Models for Neural Search | - | 0
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
LibAUC: A Deep Learning Library for X-Risk Optimization | Code | 2
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems | Code | 1
EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection | - | 0
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | - | 0
TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain | Code | 1
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models | - | 0
ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation | - | 0
Multilingual Conceptual Coverage in Text-to-Image Models | Code | 1
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models | Code | 1
Spatially Resolved Gene Expression Prediction from H&E Histology Images via Bi-modal Contrastive Learning | Code | 1
Break a Lag: Triple Exponential Moving Average for Enhanced Optimization | - | 0
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study | - | 0
The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | - | 0
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0
End-to-end Knowledge Retrieval with Multi-modal Queries | Code | 1
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | Code | 0
Improving and Benchmarking Offline Reinforcement Learning Algorithms | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified