SOTAVerified

Benchmarking

Papers

Showing 911–920 of 5548 papers

Title | Status | Hype
Multilingual European Language Models: Benchmarking Approaches and Challenges | | 0
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models | | 0
A deep learning framework for efficient pathology image analysis | Code | 4
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | | 0
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | | 0
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | | 0
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope? | Code | 1
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Code | 0
EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | | 0
Benchmarking MedMNIST dataset on real quantum hardware | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified