Benchmarking

Papers

Showing 391–400 of 5548 papers

Title	Date	Tasks	Status	Hype
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations	Jun 9, 2022	Benchmarking, continuous-control	Code Available	2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act	Oct 10, 2024	Benchmarking, Fairness	Code Available	2
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents	Jan 18, 2024	Benchmarking	Code Available	2
Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions	Jan 28, 2022	3D Point Cloud Classification, 3D Point Cloud Data Augmentation	Code Available	2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint)	Jan 14, 2023	Benchmarking	Code Available	2
Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond	Oct 9, 2024	Benchmarking	Code Available	2
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning	Jan 15, 2021	Benchmarking, Misinformation	Code Available	1
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels	Jan 30, 2024	Benchmarking, image-classification	Code Available	1
RADAR: Benchmarking Language Models on Imperfect Tabular Data	Jun 9, 2025	Benchmarking, Missing Values	Code Available	1
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK	Aug 8, 2023	Benchmarking, GPU	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified