SOTAVerified

Benchmarking

Papers

Showing 1826–1850 of 5548 papers

Title | Status | Hype
Benchmarks as Microscopes: A Call for Model Metrology | | 0
Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | | 0
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies | Code | 1
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Code | 1
Open-CD: A Comprehensive Toolbox for Change Detection | | 0
StylusAI: Stylistic Adaptation for Robust German Handwritten Text Generation | | 0
Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Code | 0
Non-Reference Quality Assessment for Medical Imaging: Application to Synthetic Brain MRIs | | 0
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | Code | 1
Benchmarking deep learning models for bearing fault diagnosis using the CWRU dataset: A multi-label approach | | 0
OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking | | 0
Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection | | 0
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations | Code | 1
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness? | Code | 7
Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft | | 0
SHS: Scorpion Hunting Strategy Swarm Algorithm | | 0
Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance | | 0
RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | | 0
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | | 0
Restore Anything Model via Efficient Degradation Adaptation | Code | 1
Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease | Code | 0
Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data | | 0
Temporal receptive field in dynamic graph learning: A comprehensive analysis | Code | 0
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Code | 0
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified