SOTAVerified

Benchmarking

Papers

Showing 1861–1870 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs	May 12, 2025	Benchmarking, Document Layout Analysis	Unverified	0
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering	May 11, 2025	Benchmarking, General Knowledge	Code Available	0
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration	May 11, 2025	Benchmarking, Descriptive	Unverified	0
Optimizing Recommendations using Fine-Tuned LLMs	May 11, 2025	Benchmarking, Recommendation Systems	Unverified	0
Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy	May 9, 2025	Benchmarking, Sentiment Analysis	Unverified	0
Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA	May 9, 2025	Benchmarking, scientific discovery	Unverified	0
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	May 9, 2025	Benchmarking, Form	Unverified	0
Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization	May 8, 2025	Attribute, Benchmarking	Unverified	0
Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective	May 8, 2025	Active Learning, Benchmarking	Code Available	0
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations	May 8, 2025	Benchmarking, Task-Oriented Dialogue Systems	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified