SOTAVerified

Benchmarking

Papers

Showing 2721-2730 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2 |
| UniIR: Training and Benchmarking Universal Multimodal Information Retrievers | | 0 |
| Riemannian Self-Attention Mechanism for SPD Networks | | 0 |
| FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | | 0 |
| Comprehensive Benchmarking of Entropy and Margin Based Scoring Metrics for Data Selection | | 0 |
| Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice | | 0 |
| Experimental Analysis of Large-scale Learnable Vector Storage Compression | Code | 0 |
| Syn3DWound: A Synthetic Dataset for 3D Wound Bed Analysis | | 0 |
| Benchmarking Large Language Model Volatility | | 0 |
| UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |