SOTAVerified

Benchmarking

Papers

Showing 611-620 of 5548 papers

Title | Status | Hype
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects | Code | 1
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services | Code | 1
StringLLM: Understanding the String Processing Capability of Large Language Models | Code | 1
MONICA: Benchmarking on Long-tailed Medical Image Classification | Code | 1
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework | Code | 1
Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis | Code | 1
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1
MALPOLON: A Framework for Deep Species Distribution Modeling | Code | 1
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing | Code | 1
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified