SOTAVerified

Benchmarking

Papers

Showing 1231–1240 of 5548 papers

Title | Status | Hype
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
GAMA: a General Automated Machine learning Assistant | Code | 1
GCondenser: Benchmarking Graph Condensation | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
Benchmarking Language Model Creativity: A Case Study on Code Generation | Code | 1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models | Code | 1
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified