SOTAVerified

Benchmarking

Papers

Showing 141–150 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2 |
| GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning | Code | 2 |
| PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Code | 2 |
| TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | Code | 2 |
| BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Code | 2 |
| SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | Code | 2 |
| SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2 |
| IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Code | 2 |
| MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2 |
| GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |