Benchmarking

Papers

Showing 2021–2030 of 5548 papers

Title	Date	Tasks	Status	Hype
Unleashing OpenTitan's Potential: a Silicon-Ready Embedded Secure Element for Root of Trust and Cryptographic Offloading	Jun 17, 2024	Autonomous Vehicles, Benchmarking	Unverified	0
InternalInspector I^2: Robust Confidence Estimation in LLMs through Internal States	Jun 17, 2024	Benchmarking, Contrastive Learning	Unverified	0
Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking	Jun 17, 2024	Benchmarking, Demand Forecasting	Code Available	1
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models	Jun 17, 2024	Benchmarking, Survey	Unverified	0
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams	Jun 17, 2024	All, Benchmarking	Code Available	0
The Liouville Generator for Producing Integrable Expressions	Jun 17, 2024	Benchmarking	Unverified	0
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models	Jun 17, 2024	Benchmarking, Fact Checking	Code Available	1
Standardizing Structural Causal Models	Jun 17, 2024	Benchmarking, Causal Inference	Code Available	0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content	Jun 17, 2024	Benchmarking, General Knowledge	Code Available	0
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models	Jun 17, 2024	Benchmarking	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified