SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers | 248,326 code links | 4,818 tasks

Papers

Showing 4251–4275 of 661,570 papers

Title | Status | Hype
White-Box Transformers via Sparse Rate Reduction | Code | 3
Humans in 4D: Reconstructing and Tracking Humans with Transformers | Code | 3
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | Code | 3
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation | Code | 3
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | Code | 3
Fine-Tuning Language Models with Just Forward Passes | Code | 3
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models | Code | 3
An end-to-end strategy for recovering a free-form potential from a snapshot of stellar coordinates | Code | 3
Large Language Models as Tool Makers | Code | 3
Landmark Attention: Random-Access Infinite Context Length for Transformers | Code | 3
The False Promise of Imitating Proprietary LLMs | Code | 3
Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning | Code | 3
RoMa: Robust Dense Feature Matching | Code | 3
HuatuoGPT, towards Taming Language Model to Be a Doctor | Code | 3
Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3
CGCE: A Chinese Generative Chat Evaluation Benchmark for General and Financial Domains | Code | 3
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | Code | 3
Evaluation of the MACE Force Field Architecture: from Medicinal Chemistry to Materials Science | Code | 3
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | Code | 3
RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text | Code | 3
Prompting with Pseudo-Code Instructions | Code | 3
Self-QA: Unsupervised Knowledge Guided Language Model Alignment | Code | 3
LLM-Pruner: On the Structural Pruning of Large Language Models | Code | 3
Delay-penalized CTC implemented based on Finite State Transducer | Code | 3
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters | Code | 3
Page 171 of 26,463