SOTAVerified

Benchmarking

Papers

Showing 2711-2720 of 5548 papers

Title | Status | Hype
AlignBench: Benchmarking Chinese Alignment of Large Language Models | Code | 2
TaskBench: Benchmarking Large Language Models for Task Automation | Code | 6
ROBBIE: Robust Bias Evaluation of Large Generative Language Models | - | 0
TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | - | 0
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | - | 0
Biomedical knowledge graph-optimized prompt generation for large language models | Code | 2
SAIBench: A Structural Interpretation of AI for Science Through Benchmarks | - | 0
Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | - | 0
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs | Code | 1
SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2
Page 272 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified