valid

Papers

Showing 31–40 of 3589 papers

Title	Date	Tasks	Status	Hype
Language Models over Canonical Byte-Pair Encodings	Jun 9, 2025	valid	Unverified	0
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework	Jun 9, 2025	Benchmarking, Fairness	Unverified	0
AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists	Jun 9, 2025	scientific discovery, valid	Unverified	0
Inference on the value of a linear program	Jun 7, 2025	valid	Unverified	0
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems	Jun 7, 2025	Code Generation, valid	Unverified	0
On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization	Jun 6, 2025	regression, valid	Code Available	0
Speech Neurophysiology in Realistic Contexts: Big Hype or Big Leap?	Jun 5, 2025	valid	Unverified	0
Does It Make Sense to Speak of Introspection in Large Language Models?	Jun 5, 2025	valid	Unverified	0
DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation	Jun 4, 2025	Dialogue Evaluation, valid	Unverified	0
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL	Jun 4, 2025	Text to SQL, Text-To-SQL	Unverified	0

No leaderboard results yet.