SOTAVerified

Benchmarking

Papers

Showing 5251-5275 of 5548 papers

Title | Status | Hype
PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding | Code | 0
CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models | Code | 0
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022 | Code | 0
PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics | Code | 0
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models | Code | 0
A Position Paper on the Automatic Generation of Machine Learning Leaderboards | Code | 0
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees | Code | 0
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset | Code | 0
Attribution of Predictive Uncertainties in Classification Models | Code | 0
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Code | 0
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024) | Code | 0
Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level | Code | 0
Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA | Code | 0
Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity | Code | 0
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems | Code | 0
PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models | Code | 0
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants | Code | 0
CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection | Code | 0
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels | Code | 0
Ants can orienteer a thief in their robbery | Code | 0
3DOS: Towards 3D Open Set Learning -- Benchmarking and Understanding Semantic Novelty Detection on Point Clouds | Code | 0
Benchmarking Generative Latent Variable Models for Speech | Code | 0
Benchmarking Generative AI Models for Deep Learning Test Input Generation | Code | 0
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified