SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 911–920 of 5548 papers

Title	Date	Tasks	Status	Hype
Multilingual European Language Models: Benchmarking Approaches and Challenges	Feb 18, 2025	BenchmarkingQuestion Answering	—Unverified	0
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models	Feb 18, 2025	BenchmarkingLarge Language Model	—Unverified	0
A deep learning framework for efficient pathology image analysis	Feb 18, 2025	BenchmarkingDeep Learning	CodeCode Available	4
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics	Feb 18, 2025	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation	Feb 18, 2025	Benchmarking	—Unverified	0
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Feb 18, 2025	BenchmarkingText Generation	—Unverified	0
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?	Feb 18, 2025	BenchmarkingBlocking	CodeCode Available	1
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis	Feb 18, 2025	BenchmarkingMamba	CodeCode Available	0
EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking	Feb 18, 2025	BenchmarkingBinary Classification	—Unverified	0
Benchmarking MedMNIST dataset on real quantum hardware	Feb 18, 2025	Benchmarkingimage-classification	—Unverified	0

Show:10 25 50

← PrevPage 92 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified