SOTAVerified

Benchmarking

Papers

Showing 1611–1620 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Small Language Models: Survey, Measurements, and Insights | Code | 2 |
| Building a continuous benchmarking ecosystem in bioinformatics | | 0 |
| Benchmarking Edge AI Platforms for High-Performance ML Inference | | 0 |
| Boosting Healthcare LLMs Through Retrieved Context | Code | 1 |
| Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking | Code | 0 |
| Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images | Code | 0 |
| AlphaZip: Neural Network-Enhanced Lossless Text Compression | Code | 0 |
| RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code | Code | 1 |
| Margin-bounded Confidence Scores for Out-of-Distribution Detection | Code | 0 |
| Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |