SOTAVerified

Benchmarking

Papers

Showing 5451-5475 of 5548 papers

Title | Status | Hype
A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation | Code | 0
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting | Code | 0
VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric | Code | 0
CNM: An Interpretable Complex-valued Network for Matching | Code | 0
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures | Code | 0
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Code | 0
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Code | 0
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds | Code | 0
Benchmarking AutoML algorithms on a collection of synthetic classification problems | Code | 0
Benchmarking a transformer-FREE model for ad-hoc retrieval | Code | 0
Benchmarking Approximate Inference Methods for Neural Structured Prediction | Code | 0
LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0
Benchmarking Contemporary Deep Learning Hardware and Frameworks:A Survey of Qualitative Metrics | Code | 0
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection | Code | 0
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification | Code | 0
Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains | Code | 0
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models | Code | 0
Quality Indicators for Preference-based Evolutionary Multi-objective Optimization Using a Reference Point: A Review and Analysis | Code | 0
CLMB: deep contrastive learning for robust metagenomic binning | Code | 0
Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts | Code | 0
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts | Code | 0
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems | Code | 0
Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration | Code | 0
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System | Code | 0
Benchmarking and Understanding Compositional Relational Reasoning of LLMs | Code | 0
Page 219 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified