Benchmarking

Papers

Showing 4981–4990 of 5548 papers

Title	Date	Tasks	Status	Hype
Evaluating AI Recruitment Sourcing Tools by Human Preference	Apr 3, 2025	Benchmarking	Code Available	0
EvalAI: Towards Better Evaluation Systems for AI Agents	Feb 10, 2019	Benchmarking, BIG-bench Machine Learning	Code Available	0
Essential guidelines for computational method benchmarking	Dec 3, 2018	Benchmarking	Code Available	0
Benchmarking of LSTM Networks	Aug 11, 2015	Benchmarking	Code Available	0
NerveNet: Learning Structured Policy with Graph Neural Networks	Jan 1, 2018	Benchmarking, continuous-control	Code Available	0
How Fragile is Relation Extraction under Entity Replacements?	May 22, 2023	Benchmarking, Causal Inference	Code Available	0
Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?	Feb 25, 2020	Benchmarking, Link Prediction	Code Available	0
Sequence-Aware Recommender Systems	Feb 23, 2018	Benchmarking, Matrix Completion	Code Available	0
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation	Aug 22, 2024	Benchmarking, Classification	Code Available	0
Enterprise Benchmarks for Large Language Model Evaluation	Oct 11, 2024	Benchmarking, Language Model Evaluation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified