SOTAVerified

Benchmarking

Papers

Showing 3771–3780 of 5548 papers

Title	Date	Tasks	Status	Hype
TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs	Jul 11, 2022	Benchmarking, Representation Learning	Code Available	1
Graph Generative Model for Benchmarking Graph Neural Networks	Jul 10, 2022	Benchmarking, Graph Generation	Code Available	1
A novel evaluation methodology for supervised Feature Ranking algorithms	Jul 9, 2022	Benchmarking, Feature Importance	Code Available	0
Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling	Jul 8, 2022	Benchmarking	Unverified	0
OVQA: A Clinically Generated Visual Question Answering Dataset	Jul 7, 2022	Benchmarking, Medical Visual Question Answering	Unverified	0
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning	Jul 7, 2022	Benchmarking, Multi-agent Reinforcement Learning	Code Available	2
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems	Jul 7, 2022	Benchmarking	Unverified	0
Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding	Jul 4, 2022	Benchmarking, Document Ranking	Code Available	2
Identifying the Context Shift between Test Benchmarks and Production Data	Jul 3, 2022	Benchmarking, BIG-bench Machine Learning	Unverified	0
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk	Jul 2, 2022	Benchmarking, Machine Translation	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified