SOTAVerified

Benchmarking

Papers

Showing 2751-2800 of 5548 papers

Title | Status | Hype
Context-guided Triple Matching for Multiple Choice Question Answering | | 0
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | | 0
Exploring the Practicality of Generative Retrieval on Dynamic Corpora | | 0
Continuous Function Structured in Multilayer Perceptron for Global Optimization | | 0
Continuous-Time Gaussian Process Motion-Compensation for Event-vision Pattern Tracking with Distance Fields | | 0
Continuous U-Net: Faster, Greater and Noiseless | | 0
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation | | 0
Contribution à l'Optimisation d'un Comportement Collectif pour un Groupe de Robots Autonomes | | 0
Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA | | 0
ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation | | 0
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments | | 0
Convolutional and Deep Learning based techniques for Time Series Ordinal Classification | | 0
COPA: Comparing the Incomparable to Explore the Pareto Front | | 0
CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding | | 0
CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | | 0
Cornac: A Comparative Framework for Multimodal Recommender Systems | | 0
COSET: A Benchmark for Evaluating Neural Program Embeddings | | 0
CoSy: Evaluating Textual Explanations of Neurons | | 0
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | | 0
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | | 0
Coupling volume-excluding compartment-based models of diffusion at different scales: Voronoi and pseudo-compartment approaches | | 0
Covariance Matrix Adaptation Evolution Strategy Assisted by Principal Component Analysis | | 0
Creating a Data Collection for Evaluating Rich Speech Retrieval | | 0
CRF-based Single-stage Acoustic Modeling with CTC Topology | | 0
CroCoDL: Cross-device Collaborative Dataset for Localization | | 0
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | | 0
CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models | | 0
Cross-functional transferability in universal machine learning interatomic potentials | | 0
crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023 | | 0
Cross-Model Image Annotation Platform with Active Learning | | 0
Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability | | 0
Cross-replication Reliability - An Empirical Approach to Interpreting Inter-rater Reliability | | 0
Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation | | 0
Cross-Subject Deep Transfer Models for Evoked Potentials in Brain-Computer Interface | | 0
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization | | 0
CRS Arena: Crowdsourced Benchmarking of Conversational Recommender Systems | | 0
CSPO: Cross-Market Synergistic Stock Price Movement Forecasting with Pseudo-volatility Optimization | | 0
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories | | 0
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | | 0
CUB: Benchmarking Context Utilisation Techniques for Language Models | | 0
CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking | | 0
CULEMO: Cultural Lenses on Emotion -- Benchmarking LLMs for Cross-Cultural Emotion Understanding | | 0
Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine Translation | | 0
CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models | | 0
Curriculum in Gradient-Based Meta-Reinforcement Learning | | 0
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence | | 0
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0
CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0
DACOS-A Manually Annotated Dataset of Code Smells | | 0
DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified