SOTAVerified

Benchmarking

Papers

Showing 1001-1010 of 5548 papers

Title | Status | Hype
Needle In A Haystack, Fast: Benchmarking Image Perceptual Similarity Metrics At Scale | Code | 1
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1
Neural Methods for Logical Reasoning Over Knowledge Graphs | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models? | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
A Comparison of Image Denoising Methods | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified