SOTAVerified

Benchmarking

Papers

Showing 1461–1470 of 5548 papers

Title	Date	Tasks	Status	Hype
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	Benchmarking, Earth Observation	Code Available	1
OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion	Nov 4, 2021	2k, Benchmarking	Code Available	1
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks	Feb 4, 2023	Adversarial Attack, Adversarial Robustness	Code Available	1
Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages	Mar 11, 2024	Benchmarking, Data Augmentation	Code Available	1
Benchmarking Graph Neural Networks on Dynamic Link Prediction	Sep 29, 2021	Benchmarking, Dynamic Link Prediction	Code Available	1
Benchmarking Graph Neural Networks for FMRI analysis	Nov 16, 2022	Benchmarking	Code Available	1
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models	Jan 1, 2024	Benchmarking	Code Available	1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	Benchmarking, Diagnostic	Code Available	1
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling	Jun 10, 2025	Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified