SOTAVerified

Benchmarking

Papers

Showing 1691-1700 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0 |
| Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0 |
| IPC: A Benchmark Data Set for Learning with Graph-Structured Data | Code | 0 |
| ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Code | 0 |
| IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0 |
| A Benchmarking Study of Vision-based Robotic Grasping Algorithms | Code | 0 |
| IoT Data Trust Evaluation via Machine Learning | Code | 0 |
| Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments | Code | 0 |
| Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | Code | 0 |
| PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |