SOTAVerified

Benchmarking

Papers

Showing 831–840 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	Benchmarking, Community Detection	Code Available	1	5
Replication in Visual Diffusion Models: A Survey and Outlook	Jul 7, 2024	Benchmarking, Survey	Code Available	1	5
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoML, Benchmarking	Code Available	1	5
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection	Mar 12, 2025	Benchmarking, Code Classification	Code Available	1	5
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	Code Available	1	5
IMGTB: A Framework for Machine-Generated Text Detection Benchmarking	Nov 21, 2023	Benchmarking, Text Detection	Code Available	1	5
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	Code Available	1	5
Can 3D Vision-Language Models Truly Understand Natural Language?	Mar 21, 2024	Benchmarking, Diversity	Code Available	1	5
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios	May 22, 2025	Benchmarking	Code Available	1	5

Page 84 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified