SOTAVerified

Benchmarking

Papers

Showing 1526–1550 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation | Code | 0 |
| mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale | Code | 0 |
| Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | | 0 |
| FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | | 0 |
| FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization | | 0 |
| Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series | Code | 0 |
| Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision | | 0 |
| scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection | | 0 |
| A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | | 0 |
| HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Code | 0 |
| AI-Driven MRI-based Brain Tumour Segmentation Benchmarking | | 0 |
| BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | | 0 |
| inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Code | 0 |
| MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | | 0 |
| Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology | | 0 |
| MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control | | 0 |
| QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges | | 0 |
| Staining normalization in histopathology: Method benchmarking using multicenter dataset | | 0 |
| Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions | | 0 |
| Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey | | 0 |
| Benchmarking Music Generation Models and Metrics via Human Preference Studies | | 0 |
| Survey of HPC in US Research Institutions | | 0 |
| Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Code | 0 |
| Statistical Multicriteria Evaluation of LLM-Generated Text | Code | 0 |
| On the Robustness of Human-Object Interaction Detection against Distribution Shift | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |