
Benchmarking

Papers

Showing 2901–2950 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks | | 0 |
| Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | | 0 |
| Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning | | 0 |
| Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks | | 0 |
| On the Evaluation Consistency of Attribution-based Explanations | Code | 0 |
| Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection | | 0 |
| Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging | Code | 0 |
| Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems | | 0 |
| GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | | 0 |
| SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images | | 0 |
| Quality Assured: Rethinking Annotation Strategies in Imaging AI | | 0 |
| Building a Domain-specific Guardrail Model in Production | | 0 |
| Flexible Generation of Preference Data for Recommendation Analysis | Code | 0 |
| Can time series forecasting be automated? A benchmark and analysis | | 0 |
| Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | Code | 0 |
| Hi-EF: Benchmarking Emotion Forecasting in Human-interaction | Code | 0 |
| BONES: a Benchmark fOr Neural Estimation of Shapley values | Code | 0 |
| StylusAI: Stylistic Adaptation for Robust German Handwritten Text Generation | | 0 |
| Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Code | 0 |
| Benchmarks as Microscopes: A Call for Model Metrology | | 0 |
| Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research | | 0 |
| Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | | 0 |
| InLUT3D: Challenging real indoor dataset for point cloud analysis | | 0 |
| Open-CD: A Comprehensive Toolbox for Change Detection | | 0 |
| Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | | 0 |
| OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | | 0 |
| Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft | | 0 |
| SHS: Scorpion Hunting Strategy Swarm Algorithm | | 0 |
| Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection | | 0 |
| Benchmarking deep learning models for bearing fault diagnosis using the CWRU dataset: A multi-label approach | | 0 |
| Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease | Code | 0 |
| Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | | 0 |
| RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | | 0 |
| Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance | | 0 |
| Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Code | 0 |
| FETCH: A Memory-Efficient Replay Approach for Continual Learning in Image Classification | | 0 |
| Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | | 0 |
| LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | | 0 |
| Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Code | 0 |
| HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | | 0 |
| Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data | | 0 |
| Temporal receptive field in dynamic graph learning: A comprehensive analysis | Code | 0 |
| A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification | | 0 |
| Feature interpretability in BCIs: exploring the role of network lateralization | Code | 0 |
| Benchmarking the Attribution Quality of Vision Models | Code | 0 |
| REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching | Code | 0 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | | 0 |
| On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction | | 0 |
| ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation | | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |