SOTAVerified

Benchmarking

Papers

Showing 5101-5150 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| SydneyScapes: Image Segmentation for Australian Environments | | 0 |
| Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts | | 0 |
| Domain Aligned CLIP for Few-shot Classification | | 0 |
| ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments | | 0 |
| Domain Generalization in Computational Pathology: Survey and Guidelines | | 0 |
| Comparative Benchmarking of Causal Discovery Techniques | | 0 |
| Comparative Analysis of Packages and Algorithms for the Analysis of Spatially Resolved Transcriptomics Data | | 0 |
| Comparative analysis of neural network architectures for short-term FOREX forecasting | | 0 |
| Don't stack layers in graph neural networks, wire them randomly | | 0 |
| Commute Graph Neural Networks | | 0 |
| Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning | | 0 |
| Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories | | 0 |
| Downsampling and geometric feature methods for EEG classification tasks with CNNs | | 0 |
| Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | | 0 |
| Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks | | 0 |
| On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates | | 0 |
| DPO: A Differential and Pointwise Control Approach to Reinforcement Learning | | 0 |
| Syn3DWound: A Synthetic Dataset for 3D Wound Bed Analysis | | 0 |
| DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images | | 0 |
| Coherent Feed Forward Quantum Neural Network | | 0 |
| Cognitive Model Priors for Predicting Human Decisions | | 0 |
| CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models | | 0 |
| Drift in a Popular Metal Oxide Sensor Dataset Reveals Limitations for Gas Classification Benchmarks | | 0 |
| DRIV100: In-The-Wild Multi-Domain Dataset and Evaluation for Real-World Domain Adaptation of Semantic Segmentation | | 0 |
| CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks | | 0 |
| SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data | | 0 |
| CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | | 0 |
| DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift | | 0 |
| CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations | | 0 |
| Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning | | 0 |
| Dual Task Framework for Improving Persona-grounded Dialogue Dataset | | 0 |
| CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance | | 0 |
| Synplex: A synthetic simulator of highly multiplexed histological images | | 0 |
| Syntactically Aware Neural Architectures for Definition Extraction | | 0 |
| DyFEn: Agent-Based Fee Setting in Payment Channel Networks | | 0 |
| Syntax Encoding with Application in Authorship Attribution | | 0 |
| Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking | | 0 |
| Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking | | 0 |
| Dynabench: Rethinking Benchmarking in NLP | | 0 |
| Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking | | 0 |
| CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | | 0 |
| CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems | | 0 |
| Dynamic benchmarking framework for LLM-based conversational data capture | | 0 |
| Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views | | 0 |
| Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination | | 0 |
| Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence | | 0 |
| Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets | | 0 |
| Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning | | 0 |
| Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft | | 0 |
| Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures | | 0 |
Page 103 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |