SOTAVerified

Benchmarking

Papers

Showing 3021–3030 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Implicit Causality-biases in humans and LLMs as a tool for benchmarking LLM discourse capabilities | | 0 |
| Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms | | 0 |
| Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels | | 0 |
| The Moral Mind(s) of Large Language Models | | 0 |
| Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices | | 0 |
| Ward: Provable RAG Dataset Inference via LLM Watermarks | | 0 |
| The Multi-speaker Multi-style Voice Cloning Challenge 2021 | | 0 |
| PAWS-VMK: A Unified Approach To Semi-Supervised Learning And Out-of-Distribution Detection | | 0 |
| Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | | 0 |
| The Neural Painter: Multi-Turn Image Generation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |