Benchmarking

Papers

Showing 421–430 of 5548 papers

Title	Date	Tasks	Status	Hype
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests	May 15, 2025	Benchmarking, Deep Reinforcement Learning	Code Available	1
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M	May 15, 2025	Benchmarking, Memorization	Code Available	0
Visual Fidelity Index for Generative Semantic Communications with Critical Information Embedding	May 15, 2025	Benchmarking, Semantic Communication	Unverified	0
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly	May 15, 2025	8k, Benchmarking	Code Available	2
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs	May 15, 2025	All, Benchmarking	Unverified	0
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language	May 15, 2025	Benchmarking, Optical Character Recognition	Code Available	0
Real-World fNIRS-Based Brain-Computer Interfaces: Benchmarking Deep Learning and Classical Models in Interactive Gaming	May 15, 2025	Benchmarking, Data Augmentation	Unverified	0
Towards scalable surrogate models based on Neural Fields for large scale aerodynamic simulations	May 14, 2025	Benchmarking	Code Available	1
TARGET: Benchmarking Table Retrieval for Generative Tasks	May 14, 2025	Benchmarking, Representation Learning	Unverified	0
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning	May 14, 2025	Benchmarking, MMLU	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified