SOTAVerified

Benchmarking

Papers

Showing 5051–5100 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models |  | 0 |
| DiPCo -- Dinner Party Corpus |  | 0 |
| DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning |  | 0 |
| Disability prediction in multiple sclerosis using performance outcome measures and demographic data |  | 0 |
| Disambiguation in Conversational Question Answering in the Era of LLM: A Survey |  | 0 |
| DISC: a Dataset for Integrated Sensing and Communication in mmWave Systems |  | 0 |
| DISCOMAN: Dataset of Indoor SCenes for Odometry, Mapping And Navigation |  | 0 |
| Discosuite - A parser test suite for German discontinuous structures |  | 0 |
| Discovering Visual Concept Structure with Sparse and Incomplete Tags |  | 0 |
| CompBench: Benchmarking Complex Instruction-guided Image Editing |  | 0 |
| Discriminating modelling approaches for Point in Time Economic Scenario Generation |  | 0 |
| Discriminative Link Prediction using Local Links, Node Features and Community Structure |  | 0 |
| Comparison of tree-based ensemble algorithms for merging satellite and earth-observed precipitation data at the daily time scale |  | 0 |
| Disentangling coincident cell events using deep transfer learning and compressive sensing |  | 0 |
| DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts |  | 0 |
| ALT: A Python Package for Lightweight Feature Representation in Time Series Classification |  | 0 |
| Survey of HPC in US Research Institutions |  | 0 |
| DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction |  | 0 |
| Alpha Excel Benchmark |  | 0 |
| User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance |  | 0 |
| Distortion-adaptive Salient Object Detection in 360° Omnidirectional Images |  | 0 |
| Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization |  | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency |  | 0 |
| Distributed Software-Defined Network Architecture for Smart Grid Resilience to Denial-of-Service Attacks |  | 0 |
| Distributed Training Large-Scale Deep Architectures |  | 0 |
| ALP: Action-Aware Embodied Learning for Perception |  | 0 |
| SUTD-PRCM Dataset and Neural Architecture Search Approach for Complex Metasurface Design |  | 0 |
| Distribution-Based Invariant Deep Networks for Learning Meta-Features |  | 0 |
| Sensitivity analysis and experimental evaluation of PID-like continuous sliding mode control |  | 0 |
| SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation |  | 0 |
| Diverse Community Data for Benchmarking Data Privacy Algorithms |  | 0 |
| Comparison of feature extraction and dimensionality reduction methods for single channel extracellular spike sorting |  | 0 |
| SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation |  | 0 |
| Comparison and Benchmarking of AI Models and Frameworks on Mobile Devices |  | 0 |
| DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended) |  | 0 |
| DLUE: Benchmarking Document Language Understanding |  | 0 |
| Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells |  | 0 |
| DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs |  | 0 |
| Comparing Foundation Models using Data Kernels |  | 0 |
| A Sober Look at the Robustness of CLIPs to Spurious Features |  | 0 |
| Comparing Computing Platforms for Deep Learning on a Humanoid Robot |  | 0 |
| Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields |  | 0 |
| Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery |  | 0 |
| Does imputation matter? Benchmark for predictive models |  | 0 |
| A Look at the Evaluation Setup of the M5 Forecasting Competition |  | 0 |
| Comparative Design Space Exploration of Dense and Semi-Dense SLAM |  | 0 |
| Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information |  | 0 |
| SWIFT: Super-fast and Robust Privacy-Preserving Machine Learning |  | 0 |
| Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation |  | 0 |
| ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments |  | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified |