Benchmarking

Papers

Showing 2861–2870 of 5548 papers

Title	Date	Tasks	Status	Hype
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	Benchmarking, Spatial Reasoning	Unverified	0
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	Code Available	0
Benchmarking the Sim-to-Real Gap in Cloth Manipulation	Oct 14, 2023	Benchmarking, MuJoCo	Unverified	0
Mirage: Model-Agnostic Graph Distillation for Graph Classification	Oct 14, 2023	Benchmarking, Classification	Code Available	0
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters	Oct 13, 2023	Benchmarking, Fairness	Code Available	1
pose-format: Library for Viewing, Augmenting, and Handling .pose Files	Oct 13, 2023	Benchmarking, Management	Code Available	1
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts	Oct 13, 2023	Benchmarking, Sentiment Analysis	Code Available	0
Welfare Diplomacy: Benchmarking Language Model Cooperation	Oct 13, 2023	Benchmarking, Language Modeling	Code Available	1
MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning	Oct 12, 2023	Benchmarking	Code Available	1
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts	Oct 12, 2023	Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified