SOTAVerified

Benchmarking

Papers

Showing 261-270 of 5548 papers

Title | Status | Hype
PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | Code | 2
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | Code | 2
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly | Code | 2
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Code | 2
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model | Code | 2
Event-Based Motion Magnification | Code | 2
GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Code | 2
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified