SOTAVerified

Benchmarking

Papers

Showing 2901–2925 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks | | 0 |
| Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | | 0 |
| Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning | | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | | 0 |
| On the Evaluation Consistency of Attribution-based Explanations | Code | 0 |
| Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | | 0 |
| Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Code | 0 |
| Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems | | 0 |
| GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | | 0 |
| SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images | | 0 |
| Quality Assured: Rethinking Annotation Strategies in Imaging AI | | 0 |
| Building a Domain-specific Guardrail Model in Production | | 0 |
| Flexible Generation of Preference Data for Recommendation Analysis | Code | 0 |
| Can time series forecasting be automated? A benchmark and analysis | | 0 |
| Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | Code | 0 |
| Hi-EF: Benchmarking Emotion Forecasting in Human-interaction | Code | 0 |
| BONES: a Benchmark fOr Neural Estimation of Shapley values | Code | 0 |
| StylusAI: Stylistic Adaptation for Robust German Handwritten Text Generation | | 0 |
| Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Code | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | | 0 |
| Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research | | 0 |
| Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | | 0 |
| InLUT3D: Challenging real indoor dataset for point cloud analysis | | 0 |
| Open-CD: A Comprehensive Toolbox for Change Detection | | 0 |
| Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | | 0 |
Page 117 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |