SOTAVerified

Benchmarking

Papers

Showing 1051–1075 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version | | 0 |
| FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring | | 0 |
| ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance | Code | 0 |
| PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Code | 0 |
| Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data | | 0 |
| SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation | Code | 5 |
| Off-policy Evaluation for Payments at Adyen | | 0 |
| Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | | 0 |
| Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction | | 0 |
| MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | | 0 |
| Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval | | 0 |
| ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind | Code | 1 |
| Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles | Code | 0 |
| Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Code | 1 |
| Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | | 0 |
| Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition | | 0 |
| Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Code | 4 |
| Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving | | 0 |
| Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features | | 0 |
| Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning | | 0 |
| Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings | | 0 |
| Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0 |
| Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles | | 0 |
| Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks | Code | 0 |
| WebWalker: Benchmarking LLMs in Web Traversal | Code | 11 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |