SOTAVerified

Benchmarking

Papers

Showing 2776–2800 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | | 0 |
| CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models | | 0 |
| Cross-functional transferability in universal machine learning interatomic potentials | | 0 |
| crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023 | | 0 |
| Cross-Model Image Annotation Platform with Active Learning | | 0 |
| Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability | | 0 |
| Cross-replication Reliability - An Empirical Approach to Interpreting Inter-rater Reliability | | 0 |
| Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation | | 0 |
| Cross-Subject Deep Transfer Models for Evoked Potentials in Brain-Computer Interface | | 0 |
| CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization | | 0 |
| CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems | | 0 |
| CSPO: Cross-Market Synergistic Stock Price Movement Forecasting with Pseudo-volatility Optimization | | 0 |
| CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories | | 0 |
| CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | | 0 |
| CUB: Benchmarking Context Utilisation Techniques for Language Models | | 0 |
| CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking | | 0 |
| CULEMO: Cultural Lenses on Emotion -- Benchmarking LLMs for Cross-Cultural Emotion Understanding | | 0 |
| Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine Translation | | 0 |
| CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models | | 0 |
| Curriculum in Gradient-Based Meta-Reinforcement Learning | | 0 |
| Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence | | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0 |
| CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0 |
| DACOS-A Manually Annotated Dataset of Code Smells | | 0 |
| DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0 |
Page 112 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |