SOTAVerified

Benchmarking

Papers

Showing 3526-3550 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias | Code | 0 |
| Distributed Software-Defined Network Architecture for Smart Grid Resilience to Denial-of-Service Attacks | | 0 |
| AI applications in forest monitoring need remote sensing benchmark datasets | | 0 |
| Benchmarking person re-identification datasets and approaches for practical real-world implementations | Code | 0 |
| A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks | Code | 1 |
| AnyTOD: A Programmable Task-Oriented Dialog System | | 0 |
| Benchmarking Spatial Relationships in Text-to-Image Generation | Code | 1 |
| Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers | | 0 |
| GiCCS: A German in-Context Conversational Similarity Benchmark | | 0 |
| Biomedical image analysis competitions: The state of current participation practice | | 0 |
| Automatic vehicle trajectory data reconstruction at scale | | 0 |
| Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1 |
| Mind the Retrosynthesis Gap: Bridging the divide between Single-step and Multi-step Retrosynthesis Prediction | | 0 |
| PyPop7: A Pure-Python Library for Population-Based Black-Box Optimization | Code | 2 |
| On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline | Code | 1 |
| Momentum Contrastive Pre-training for Question Answering | | 0 |
| Progressive Multi-view Human Mesh Recovery with Self-Supervision | | 0 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1 |
| On Distribution Grid Optimal Power Flow Development and Integration | | 0 |
| Benchmarking Self-Supervised Learning on Diverse Pathology Datasets | Code | 1 |
| Is Bio-Inspired Learning Better than Backprop? Benchmarking Bio Learning vs. Backprop | | 0 |
| Model-based trajectory stitching for improved behavioural cloning and its applications | | 0 |
| CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1 |
| An open unified deep graph learning framework for discovering drug leads | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |