SOTAVerified

Benchmarking

Papers

Showing 251-260 of 5548 papers

Title | Status | Hype
A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | Code | 2
EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark | Code | 2
Authorship Obfuscation in Multilingual Machine-Generated Text Detection | Code | 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Code | 2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
Deep Visual Geo-localization Benchmark | Code | 2
PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified