SOTAVerified

Benchmarking

Papers

Showing 2251–2260 of 5548 papers

Title	Date	Tasks	Status	Hype
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarking, object-detection	Code Available	1
Benchmarking Mobile Device Control Agents across Diverse Configurations	Apr 25, 2024	Benchmarking, Imitation Learning	Unverified	0
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Apr 25, 2024	Benchmarking, Multiple-choice	Code Available	3
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees	Apr 24, 2024	Benchmarking, Molecular Property Prediction	Code Available	0
SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data	Apr 24, 2024	Benchmarking, Fairness	Code Available	1
Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler	Apr 24, 2024	Benchmarking	Unverified	0
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction	Apr 24, 2024	Attribute, Attribute Value Extraction	Code Available	1
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning	Apr 24, 2024	Benchmarking, reinforcement-learning	Unverified	0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available	0
The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking	Apr 22, 2024	Benchmarking, Misinformation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified