SOTAVerified

Benchmarking

Papers

Showing 1301–1310 of 5548 papers

Title	Date	Tasks	Status	Hype
PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series	Nov 21, 2024	Anomaly Detection, Benchmarking	Code Available	0
Multi-Agent Environments for Vehicle Routing Problems	Nov 21, 2024	Benchmarking, reinforcement-learning	Code Available	1
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	Articles, Benchmarking	Code Available	0
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking	Nov 20, 2024	Benchmarking, Language Modeling	Unverified	0
Delta-Influence: Unlearning Poisons via Influence Functions	Nov 20, 2024	Attribute, Benchmarking	Code Available	0
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver	Nov 20, 2024	Benchmarking	Unverified	0
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models	Nov 20, 2024	Benchmarking, Image Generation	Code Available	5
BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation	Nov 20, 2024	Benchmarking, Point Cloud Segmentation	Unverified	0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games	Nov 20, 2024	Benchmarking, NetHack	Unverified	0
The Moral Mind(s) of Large Language Models	Nov 19, 2024	Benchmarking, Decision Making	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified