SOTAVerified

Benchmarking

Papers

Showing 4851-4900 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | | 0 |
| STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | | 0 |
| CUB: Benchmarking Context Utilisation Techniques for Language Models | | 0 |
| CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking | | 0 |
| A Multi-View High-Resolution Foot-Ankle Complex Point Cloud Dataset During Gait for Occlusion-Robust 3D Completion | | 0 |
| COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | | 0 |
| CULEMO: Cultural Lenses on Emotion -- Benchmarking LLMs for Cross-Cultural Emotion Understanding | | 0 |
| Stochastic Spiking Neural Networks with First-to-Spike Coding | | 0 |
| Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine Translation | | 0 |
| CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models | | 0 |
| VIPPrint: A Large Scale Dataset of Printed and Scanned Images for Synthetic Face Images Detection and Source Linking | | 0 |
| Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | | 0 |
| Curriculum in Gradient-Based Meta-Reinforcement Learning | | 0 |
| Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence | | 0 |
| A Multi-Task Deep Learning Approach for Sensor-based Human Activity Recognition and Segmentation | | 0 |
| CoSy: Evaluating Textual Explanations of Neurons | | 0 |
| Stratify: Unifying Multi-Step Forecasting Strategies | | 0 |
| A Multisensory Learning Architecture for Rotation-invariant Object Recognition | | 0 |
| A Multi-rater Comparative Study of Automatic Target Localization Methods for Epilepsy Deep Brain Stimulation Procedures | | 0 |
| Large Language Model-Based Benchmarking Experiment Settings for Evolutionary Multi-Objective Optimization | | 0 |
| CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset | | 0 |
| COSET: A Benchmark for Evaluating Neural Program Embeddings | | 0 |
| CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | | 0 |
| Cornac: A Comparative Framework for Multimodal Recommender Systems | | 0 |
| CORE: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks | | 0 |
| DACOS-A Manually Annotated Dataset of Code Smells | | 0 |
| DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0 |
| DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes | | 0 |
| CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding | | 0 |
| Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization | | 0 |
| DarkBench: Benchmarking Dark Patterns in Large Language Models | | 0 |
| DASB -- Discrete Audio and Speech Benchmark | | 0 |
| Data Analysis in the Era of Generative AI | | 0 |
| Data and its (dis)contents: A survey of dataset development and use in machine learning research | | 0 |
| Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory | | 0 |
| Data Augmentation for Traffic Classification | | 0 |
| Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset | | 0 |
| Data-driven Approach for Static Hedging of Exchange Traded Options | | 0 |
| COPA: Comparing the Incomparable to Explore the Pareto Front | | 0 |
| Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | | 0 |
| Data-driven Power Flow Linearization: Simulation | | 0 |
| Data-driven surrogate modelling and benchmarking for process equipment | | 0 |
| Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound | | 0 |
| A Multimodal, Full-Surround Vehicular Testbed for Naturalistic Studies and Benchmarking: Design, Calibration and Deployment | | 0 |
| Convolutional and Deep Learning based techniques for Time Series Ordinal Classification | | 0 |
| Data needs and challenges for quantum dot devices automation | | 0 |
| ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments | | 0 |
| Multi-scale data reconstruction of turbulent rotating flows with Gappy POD, Extended POD and Generative Adversarial Networks | | 0 |
| Dataset and Benchmarking of Real-Time Embedded Object Detection for RoboCup SSL | | 0 |
| ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |