SOTAVerified

Benchmarking

Papers

Showing 5451-5500 of 5548 papers

Title | Status | Hype
A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation | Code | 0
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting | Code | 0
VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric | Code | 0
CNM: An Interpretable Complex-valued Network for Matching | Code | 0
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures | Code | 0
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Code | 0
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Code | 0
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds | Code | 0
Benchmarking AutoML algorithms on a collection of synthetic classification problems | Code | 0
Benchmarking a transformer-FREE model for ad-hoc retrieval | Code | 0
Benchmarking Approximate Inference Methods for Neural Structured Prediction | Code | 0
LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0
Benchmarking Contemporary Deep Learning Hardware and Frameworks: A Survey of Qualitative Metrics | Code | 0
TAP-DLND 1.0: A Corpus for Document Level Novelty Detection | Code | 0
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification | Code | 0
Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains | Code | 0
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models | Code | 0
Quality Indicators for Preference-based Evolutionary Multi-objective Optimization Using a Reference Point: A Review and Analysis | Code | 0
CLMB: deep contrastive learning for robust metagenomic binning | Code | 0
Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | Code | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | Code | 0
Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration | Code | 0
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System | Code | 0
Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Code | 0
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Code | 0
A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection | Code | 0
ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Code | 0
Benchmarking and Rethinking Knowledge Editing for Large Language Models | Code | 0
CLEAVE: Scalable and Edge-native Benchmarking of Networked Control Systems | Code | 0
Quantitative Metrics for Benchmarking Human-Aware Robot Navigation | Code | 0
Benchmarking and optimizing organism wide single-cell RNA alignment methods | Code | 0
XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification | Code | 0
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models | Code | 0
Benchmarking and Improving Text-to-SQL Generation under Ambiguity | Code | 0
Quantum Boosting using Domain-Partitioning Hypotheses | Code | 0
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs | Code | 0
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Code | 0
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations | Code | 0
TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Code | 0
A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers | Code | 0
Adversarial Environment Generation for Learning to Navigate the Web | Code | 0
A*3D Dataset: Towards Autonomous Driving in Challenging Environments | Code | 0
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Code | 0
Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Code | 0
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample | Code | 0
Quaternion Capsule Networks | Code | 0
QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results | Code | 0
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs | Code | 0
Question-Answering Dense Video Events | Code | 0
Page 110 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified