SOTAVerified

Benchmarking

Papers

Showing 2901-2950 of 5548 papers

Title | Status | Hype
DIG: A Turnkey Library for Diving into Graph Deep Learning Research | | 0
DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation | | 0
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models | | 0
DiPCo -- Dinner Party Corpus | | 0
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning | | 0
Disability prediction in multiple sclerosis using performance outcome measures and demographic data | | 0
Disambiguation in Conversational Question Answering in the Era of LLM: A Survey | | 0
DISC: a Dataset for Integrated Sensing and Communication in mmWave Systems | | 0
DISCOMAN: Dataset of Indoor SCenes for Odometry, Mapping And Navigation | | 0
Discosuite - A parser test suite for German discontinuous structures | | 0
Discovering Visual Concept Structure with Sparse and Incomplete Tags | | 0
Discriminating modelling approaches for Point in Time Economic Scenario Generation | | 0
Discriminative Link Prediction using Local Links, Node Features and Community Structure | | 0
Disentangling coincident cell events using deep transfer learning and compressive sensing | | 0
DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts | | 0
DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction | | 0
Distortion-adaptive Salient Object Detection in 360° Omnidirectional Images | | 0
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization | | 0
Distributed Software-Defined Network Architecture for Smart Grid Resilience to Denial-of-Service Attacks | | 0
Distributed Training Large-Scale Deep Architectures | | 0
Distribution-Based Invariant Deep Networks for Learning Meta-Features | | 0
Sensitivity analysis and experimental evaluation of PID-like continuous sliding mode control | | 0
Diverse Community Data for Benchmarking Data Privacy Algorithms | | 0
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended) | | 0
DLUE: Benchmarking Document Language Understanding | | 0
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs | | 0
A Sober Look at the Robustness of CLIPs to Spurious Features | | 0
Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields | | 0
Does imputation matter? Benchmark for predictive models | | 0
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts | | 0
Domain Aligned CLIP for Few-shot Classification | | 0
Domain Generalization in Computational Pathology: Survey and Guidelines | | 0
Don't stack layers in graph neural networks, wire them randomly | | 0
Downsampling and geometric feature methods for EEG classification tasks with CNNs | | 0
On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates | | 0
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | | 0
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images | | 0
Drift in a Popular Metal Oxide Sensor Dataset Reveals Limitations for Gas Classification Benchmarks | | 0
DRIV100: In-The-Wild Multi-Domain Dataset and Evaluation for Real-World Domain Adaptation of Semantic Segmentation | | 0
DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift | | 0
Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning | | 0
Dual Task Framework for Improving Persona-grounded Dialogue Dataset | | 0
DyFEn: Agent-Based Fee Setting in Payment Channel Networks | | 0
Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking | | 0
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking | | 0
Dynabench: Rethinking Benchmarking in NLP | | 0
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking | | 0
Dynamic benchmarking framework for LLM-based conversational data capture | | 0
Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views | | 0
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination | | 0
Page 59 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified