Benchmarking

Papers

Showing 2071–2080 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest	Dec 20, 2023	Benchmarking, In-Context Learning	Unverified	0
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking	Apr 29, 2025	Benchmarking, Intrusion Detection	Unverified	0
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation	Feb 10, 2025	Benchmarking	Unverified	0
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment	Oct 11, 2024	Benchmarking, Reinforcement Learning (RL)	Unverified	0
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs	Feb 16, 2025	Benchmarking	Unverified	0
Benchmarking and Analyzing Generative Data for Visual Recognition	Jul 25, 2023	Benchmarking, Retrieval	Unverified	0
A dataset for benchmarking vision-based localization at intersections	Nov 4, 2018	Benchmarking	Unverified	0
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	Benchmarking, Spatial Reasoning	Unverified	0
Can time series forecasting be automated? A benchmark and analysis	Jul 23, 2024	Benchmarking, Decision Making	Unverified	0
Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features?	May 26, 2020	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified