SOTAVerified

Benchmarking

Papers

Showing 3426-3450 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking machine learning models for predicting aerofoil performance | | 0 |
| Benchmarking Machine Learning Models for Quantum Error Correction | | 0 |
| Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models | | 0 |
| Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | | 0 |
| Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning | | 0 |
| WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking | | 0 |
| LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers | | 0 |
| Benchmarking machine learning models for quantum state classification | | 0 |
| Towards a Benchmark for Scientific Understanding in Humans and Machines | | 0 |
| Benchmarking Machine Learning Methods for Distributed Acoustic Sensing | | 0 |
| Benchmarking Machine Learning: How Fast Can Your Algorithms Go? | | 0 |
| Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | | 0 |
| GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors | | 0 |
| Low-Density 3D Point Cloud Classification | | 0 |
| Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication | | 0 |
| Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0 |
| LSTM-based Whisper Detection | | 0 |
| Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | | 0 |
| LucidDreaming: Controllable Object-Centric 3D Generation | | 0 |
| Benchmarking LLMs on the Semantic Overlap Summarization Task | | 0 |
| LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset | | 0 |
| Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders | | 0 |
| Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving | | 0 |
| Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data | | 0 |
| M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |