SOTAVerified

Benchmarking

Papers


Showing 3411–3420 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework	Jun 9, 2025	Benchmarking, Fairness	Unverified	0	0
Benchmarking Middle-Trained Language Models for Neural Search	Jun 5, 2023	Benchmarking, Language Modeling	Unverified	0	0
Logically at Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval techniques and Transformer Encoder Architecture	Jan 9, 2023	Avg, Benchmarking	Unverified	0	0
Logically at Factify 2022: Multimodal Fact Verification	Dec 16, 2021	Benchmarking, Fact Checking	Unverified	0	0
Toward an ImageNet Library of Functions for Global Optimization Benchmarking	Jun 27, 2022	Benchmarking, global-optimization	Unverified	0	0
Benchmarking Meta-heuristic Optimization	Jul 27, 2020	Benchmarking, Evolutionary Algorithms	Unverified	0	0
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models	Jun 25, 2024	Benchmarking	Unverified	0	0
Toward end-to-end interpretable convolutional neural networks for waveform signals	May 3, 2024	Benchmarking, Emotion Recognition	Unverified	0	0
Benchmarking MedMNIST dataset on real quantum hardware	Feb 18, 2025	Benchmarking, image-classification	Unverified	0	0
Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets	Jun 1, 2015	Benchmarking, Machine Translation	Unverified	0	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified