SOTAVerified

Benchmarking

Papers

Showing 2821–2830 of 5548 papers

Title	Date	Tasks	Status	Hype
Prediction Accuracy & Reliability: Classification and Object Localization under Distribution Shift	Sep 5, 2024	Autonomous Driving, Benchmarking	Unverified	0
Benchmarking Spurious Bias in Few-Shot Image Classifiers	Sep 4, 2024	Attribute, Benchmarking	Code Available	0
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified	0
NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks	Sep 4, 2024	Anomaly Detection, Benchmarking	Unverified	0
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision	Sep 3, 2024	Benchmarking, Mixed Reality	Unverified	0
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available	0
Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture	Sep 3, 2024	Benchmarking, RAG	Unverified	0
From Grounding to Planning: Benchmarking Bottlenecks in Web Agents	Sep 3, 2024	Benchmarking	Unverified	0
Revisiting Safe Exploration in Safe Reinforcement learning	Sep 2, 2024	Benchmarking, reinforcement-learning	Unverified	0
Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification	Sep 2, 2024	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified