16k

Papers

Showing 11–20 of 146 papers

Title	Date	Tasks	Status	Hype
Achieving Scalable Robot Autonomy via neurosymbolic planning using lightweight local LLM	May 13, 2025	16k, 8k	Code Available	0
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning	May 12, 2025	16k, Benchmarking	Unverified	0
KL3M Tokenizers: A Family of Domain-Specific and Character-Level Tokenizers for Legal, Financial, and Preprocessing Applications	Mar 21, 2025	16k, 4k	Code Available	0
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims	Mar 11, 2025	16k, Abstract generation	Unverified	0
X-LRM: X-ray Large Reconstruction Model for Extremely Sparse-View Computed Tomography Recovery in One Second	Mar 9, 2025	16k, CT Reconstruction	Code Available	0
Evaluating the Suitability of Different Intraoral Scan Resolutions for Deep Learning-Based Tooth Segmentation	Feb 26, 2025	16k, 2k	Unverified	0
EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts	Feb 20, 2025	16k, Decoder	Unverified	0
CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification	Feb 12, 2025	16k, 4k	Unverified	0
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs	Feb 4, 2025	16k, Descriptive	Code Available	1
M+: Extending MemoryLLM with Scalable Long-Term Memory	Feb 1, 2025	16k, GPU	Code Available	3

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Suprime2	1'"	1	—	Unverified