SOTAVerified

Benchmarking

Papers

Showing 3526–3550 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | | 0 |
| Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Code | 0 |
| Identification of vortex in unstructured mesh with graph neural networks | | 0 |
| SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification | | 0 |
| Prompt Sketching for Large Language Models | | 0 |
| An efficiency analysis of Spanish airports | | 0 |
| A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Code | 0 |
| DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Code | 0 |
| Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild | | 0 |
| Benchmarking Differential Evolution on a Quantum Simulator | | 0 |
| Exploitation-Guided Exploration for Semantic Embodied Navigation | | 0 |
| Benchmarking a Benchmark: How Reliable is MS-COCO? | | 0 |
| Learning Disentangled Speech Representations | | 0 |
| Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval | Code | 0 |
| Grounded Intuition of GPT-Vision's Abilities with Scientific Images | Code | 0 |
| An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction | | 0 |
| Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature | | 0 |
| Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics | | 0 |
| Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia | Code | 0 |
| Decentralized Federated Learning on the Edge over Wireless Mesh Networks | | 0 |
| Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs | | 0 |
| SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization | | 0 |
| A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain | | 0 |
| Next-generation MRD assays: do we have the tools to evaluate them properly? | | 0 |
| UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |