SOTAVerified

Benchmarking

Papers

Showing 3026–3050 of 5548 papers

Title | Status | Hype
Ward: Provable RAG Dataset Inference via LLM Watermarks | | 0
The Multi-speaker Multi-style Voice Cloning Challenge 2021 | | 0
PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection | | 0
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | | 0
The Neural Painter: Multi-Turn Image Generation | | 0
Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10 | | 0
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects | | 0
A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based Retinas | | 0
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests | | 0
Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | | 0
Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus | | 0
Improving Augmentation and Evaluation Schemes for Semantic Image Synthesis | | 0
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | | 0
Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model | | 0
The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods | | 0
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation | | 0
Improving Medical Image Classification with Label Noise Using Dual-uncertainty Estimation | | 0
Improving Model Generalization: A Chinese Named Entity Recognition Case Study | | 0
Improving Named Entity Linking Corpora Quality | | 0
Improving plant disease classification by adaptive minimal ensembling | | 0
The Paradox of Success in Evolutionary and Bioinspired Optimization: Revisiting Critical Issues, Key Studies, and Methodological Pathways | | 0
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards | | 0
Improving seasonal forecast using probabilistic deep learning | | 0
The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of Scalable Graph Clustering | | 0
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | | 0
Page 122 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified