SOTAVerified

Benchmarking

Papers

Showing 211–220 of 5548 papers

Title | Status | Hype
Fast Vision Transformers with HiLo Attention | Code | 2
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance | Code | 2
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation | Code | 2
EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2
A large annotated medical image dataset for the development and evaluation of segmentation algorithms | Code | 2
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Code | 2
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models | Code | 2
InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Code | 2
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | Code | 2
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified