Benchmarking

Papers

Showing 1201–1210 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	Benchmarking, Code Generation	Code Available	1	5
ByzFL: Research Framework for Robust Federated Learning	May 30, 2025	Benchmarking, Federated Learning	Code Available	1	5
GraphGallery: A Platform for Fast Benchmarking and Easy Development of Graph Neural Networks Based Intelligent Software	Feb 16, 2021	Benchmarking	Code Available	1	5
Benchmarking Object Detectors with COCO: A New Path Forward	Mar 27, 2024	Benchmarking, Object	Code Available	1	5
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems	May 11, 2021	Benchmarking, Edge-computing	Code Available	1	5
Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems	Jun 28, 2021	3D Reconstruction, Benchmarking	Code Available	1	5
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets	Dec 9, 2022	Benchmarking, Classification	Code Available	1	5
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs	Jun 22, 2023	Arithmetic Reasoning, Benchmarking	Code Available	1	5
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym	Dec 6, 2023	Benchmarking, Decision Making	Code Available	1	5
Graph Neural Network-Based Anomaly Detection for River Network Systems	Apr 19, 2023	Anomaly Detection, Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified