Benchmarking

Papers

Showing 2901–2910 of 5548 papers

Title	Date	Tasks	Status	Hype
GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks	Jul 30, 2024	Benchmarking, Contrastive Learning	Unverified	0
Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images	Jul 30, 2024	Benchmarking, Multiple Instance Learning	Unverified	0
Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning	Jul 29, 2024	Anomaly Detection, Benchmarking	Unverified	0
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks	Jul 29, 2024	Benchmarking, Language Model Evaluation	Unverified	0
On the Evaluation Consistency of Attribution-based Explanations	Jul 28, 2024	Benchmarking	Code Available	0
Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection	Jul 28, 2024	Benchmarking, Fake News Detection	Unverified	0
Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging	Jul 26, 2024	Benchmarking	Code Available	0
Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems	Jul 26, 2024	Benchmarking	Unverified	0
GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy	Jul 25, 2024	Benchmarking	Unverified	0
SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images	Jul 25, 2024	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified