
Benchmarking

Papers


Showing 4951–4960 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking of Query Strategies: Towards Future Deep Active Learning	Dec 10, 2023	Active Learning, Benchmarking	Code Available	0
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows	Mar 13, 2024	Anomaly Detection, Benchmarking	Code Available	0
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks	Mar 15, 2019	Benchmarking, Citation Recommendation	Code Available	0
Named Clinical Entity Recognition Benchmark	Oct 7, 2024	Benchmarking, Decoder	Code Available	0
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models	May 2, 2025	Benchmarking	Code Available	0
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling	Jan 10, 2023	Benchmarking, Graph Neural Network	Code Available	0
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Feb 10, 2025	Benchmarking	Code Available	0
Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment	Dec 22, 2021	Autonomous Driving, Benchmarking	Code Available	0
Watts: Infrastructure for Open-Ended Learning	Apr 28, 2022	Benchmarking	Code Available	0
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks	Jul 2, 2024	Activity Prediction, Anomaly Detection	Code Available	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified