SOTAVerified

The Open Verification Layer for ML Research

Community benchmark tracking and reproducibility verification. Built for researchers and autonomous research agents.

661,570 papers, 248,326 code links, 4,818 tasks

Papers

Showing 4,251 to 4,300 of 661,570 papers

Title | Status | Hype
White-Box Transformers via Sparse Rate Reduction | Code | 3
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | Code | 3
Humans in 4D: Reconstructing and Tracking Humans with Transformers | Code | 3
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation | Code | 3
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models | Code | 3
Fine-Tuning Language Models with Just Forward Passes | Code | 3
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models | Code | 3
An end-to-end strategy for recovering a free-form potential from a snapshot of stellar coordinates | Code | 3
Large Language Models as Tool Makers | Code | 3
Landmark Attention: Random-Access Infinite Context Length for Transformers | Code | 3
The False Promise of Imitating Proprietary LLMs | Code | 3
Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning | Code | 3
RoMa: Robust Dense Feature Matching | Code | 3
HuatuoGPT, towards Taming Language Model to Be a Doctor | Code | 3
Hierarchical Prompting Assists Large Language Model on Web Navigation | Code | 3
CGCE: A Chinese Generative Chat Evaluation Benchmark for General and Financial Domains | Code | 3
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia | Code | 3
Evaluation of the MACE Force Field Architecture: from Medicinal Chemistry to Materials Science | Code | 3
RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text | Code | 3
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback | Code | 3
Prompting with Pseudo-Code Instructions | Code | 3
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters | Code | 3
InstructIE: A Bilingual Instruction-based Information Extraction Dataset | Code | 3
Self-QA: Unsupervised Knowledge Guided Language Model Alignment | Code | 3
LLM-Pruner: On the Structural Pruning of Large Language Models | Code | 3
Delay-penalized CTC implemented based on Finite State Transducer | Code | 3
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Quantifying the robustness of deep multispectral segmentation models against natural perturbations and data poisoning | Code | 3
Accelerating Transformer Inference for Translation via Parallel Decoding | Code | 3
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research | Code | 3
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification | Code | 3
NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist | Code | 3
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | Code | 3
A Comprehensive Survey on Segment Anything Model for Vision and Beyond | Code | 3
WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset | Code | 3
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans | Code | 3
PiML Toolbox for Interpretable Machine Learning Model Development and Diagnostics | Code | 3
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | Code | 3
Visual Causal Scene Refinement for Video Question Answering | Code | 3
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | Code | 3
Personalize Segment Anything Model with One Shot | Code | 3
Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models | Code | 3
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | Code | 3
Unlimiformer: Long-Range Transformers with Unlimited Length Input | Code | 3
UCF: Uncovering Common Features for Generalizable Deepfake Detection | Code | 3
LibCity: A Unified Library Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction | Code | 3
TorchBench: Benchmarking PyTorch with High API Surface Coverage | Code | 3
Learning Neural PDE Solvers with Parameter-Guided Channel Attention | Code | 3
Page 86 of 13,232