SOTAVerified

Language Modeling

Papers

Showing 1001–1050 of 14182 papers

Title | Status | Hype
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers | Code | 2
DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings | Code | 2
PaLM: Scaling Language Modeling with Pathways | Code | 2
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Code | 2
PromptDet: Towards Open-vocabulary Detection using Uncurated Images | Code | 2
LinkBERT: Pretraining Language Models with Document Links | Code | 2
STaR: Bootstrapping Reasoning With Reasoning | Code | 2
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) | Code | 2
Memorizing Transformers | Code | 2
PERT: Pre-training BERT with Permuted Language Model | Code | 2
Block-Recurrent Transformers | Code | 2
LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models | Code | 2
Contextual Semantic Embeddings for Ontology Subsumption Prediction | Code | 2
Online Decision Transformer | Code | 2
ProteinBERT: a universal deep-learning model of protein sequence and function | Code | 2
TimeLMs: Diachronic Language Models from Twitter | Code | 2
Cedille: A large autoregressive French language model | Code | 2
Pre-Trained Language Models for Interactive Decision-Making | Code | 2
Formal Mathematics Statement Curriculum Learning | Code | 2
Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval | Code | 2
Synchromesh: Reliable code generation from pre-trained language models | Code | 2
Black-Box Tuning for Language-Model-as-a-Service | Code | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2
ClipCap: CLIP Prefix for Image Captioning | Code | 2
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing | Code | 2
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks | Code | 2
Deduplicating Training Data Makes Language Models Better | Code | 2
FastMoE: A Fast Mixture-of-Expert Training System | Code | 2
GPT Understands, Too | Code | 2
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | Code | 2
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | Code | 2
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Code | 2
Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification | Code | 2
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients | Code | 2
Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity | Code | 2
Simplifying Paragraph-level Question Generation via Transformer Language Models | Code | 2
MPNet: Masked and Permuted Pre-training for Language Understanding | Code | 2
BAE: BERT-based Adversarial Examples for Text Classification | Code | 2
Self-Supervised Log Parsing | Code | 2
CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model | Code | 2
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Code | 2
MASS: Masked Sequence to Sequence Pre-training for Language Generation | Code | 2
Knowledge Representation Learning: A Quantitative Review | Code | 2
Training RNNs as Fast as CNNs | Code | 2
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Code | 2
End-To-End Memory Networks | Code | 2
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing | Code | 1
Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1
Evaluating Morphological Alignment of Tokenizers in 70 Languages | Code | 1
Differential Mamba | Code | 1
Page 21 of 284

No leaderboard results yet.