SOTAVerified

Benchmarking

Papers

Showing 1561–1570 of 5548 papers

Title | Status | Hype
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Code | 3
A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning |  | 0
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations |  | 0
MONICA: Benchmarking on Long-tailed Medical Image Classification | Code | 1
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description |  | 0
OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models | Code | 3
StringLLM: Understanding the String Processing Capability of Large Language Models | Code | 1
ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving |  | 0
shapiq: Shapley Interactions for Machine Learning | Code | 4
Deep Unlearn: Benchmarking Machine Unlearning |  | 0
Page 157 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified