SOTAVerified

Benchmarking

Papers

Showing 1691–1700 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	Benchmarking, Language Modeling	Code Available	0	5
Large-scale Ridesharing DARP Instances Based on Real Travel Demand	May 30, 2023	Benchmarking	Code Available	0	5
IPC: A Benchmark Data Set for Learning with Graph-Structured Data	May 15, 2019	Benchmarking, Graph Classification	Code Available	0	5
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images	Oct 22, 2024	Benchmarking, Self-Supervised Learning	Code Available	0	5
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning	Jan 8, 2025	Benchmarking	Code Available	0	5
A Benchmarking Study of Vision-based Robotic Grasping Algorithms	Mar 14, 2025	Benchmarking, Robotic Grasping	Code Available	0	5
IoT Data Trust Evaluation via Machine Learning	Aug 15, 2023	Benchmarking, Time Series	Code Available	0	5
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments	Apr 16, 2025	Benchmarking, Causal Inference	Code Available	0	5
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs	May 26, 2025	Benchmarking, Fault localization	Code Available	0	5
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series	Nov 21, 2024	Anomaly Detection, Benchmarking	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified