SOTAVerified

Benchmarking

Papers

Showing 126-150 of 5548 papers

Title | Status | Hype
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions | Code | 3
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning | Code | 3
TorchBench: Benchmarking PyTorch with High API Surface Coverage | Code | 3
Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+ | Code | 3
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning | Code | 3
AER: Auto-Encoder with Regression for Time Series Anomaly Detection | Code | 3
CORL: Research-oriented Deep Offline Reinforcement Learning Library | Code | 3
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks | Code | 3
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs | Code | 3
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms | Code | 3
Personalized Benchmarking with the Ludwig Benchmarking Toolkit | Code | 3
Benchmarking Multimodal AutoML for Tabular Data with Text Fields | Code | 3
A Survey on Performance Metrics for Object-Detection Algorithms | Code | 3
Benchmarking Automatic Machine Learning Frameworks | Code | 3
mlpack 3: a fast, flexible machine learning library | Code | 3
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning | Code | 2
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Code | 2
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods | Code | 2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Code | 2
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | Code | 2
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis | Code | 2
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Code | 2
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories | Code | 2
GSCodec Studio: A Modular Framework for Gaussian Splat Compression | Code | 2
Page 6 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified