SOTAVerified

Masked Language Modeling

Papers

Showing 101–150 of 475 papers

Title | Status | Hype
MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER | Code | 1
Sentence Bottleneck Autoencoders from Transformer Language Models | Code | 1
Knowledge Perceived Multi-modal Pretraining in E-commerce | Code | 1
Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification | Code | 1
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs | Code | 1
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment | Code | 1
Luna: Linear Unified Nested Attention | Code | 1
TreeBERT: A Tree-Based Pre-Trained Model for Programming Language | Code | 1
KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction | Code | 1
On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies | Code | 1
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Code | 1
Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation | Code | 1
MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding | Code | 1
CDLM: Cross-Document Language Modeling | Code | 1
AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding | Code | 1
RealFormer: Transformer Likes Residual Attention | Code | 1
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Code | 1
Pre-training Protein Language Models with Label-Agnostic Binding Pairs Enhances Performance in Downstream Tasks | Code | 1
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling | Code | 1
Cold-start Active Learning through Self-supervised Language Modeling | Code | 1
Cross-Thought for Sentence Encoder Pre-training | Code | 1
SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding | Code | 1
XDA: Accurate, Robust Disassembly with Transfer Learning | Code | 1
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing | Code | 1
Intermediate Training of BERT for Product Matching | Code | 1
The Lottery Ticket Hypothesis for Pre-trained BERT Networks | Code | 1
Language-agnostic BERT Sentence Embedding | Code | 1
Pre-training via Paraphrasing | Code | 1
MC-BERT: Efficient Language Pre-Training via a Meta Controller | Code | 1
Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP | Code | 1
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | Code | 1
Segatron: Segment-Aware Transformer for Language Modeling and Understanding | Code | 1
Train No Evil: Selective Masking for Task-Guided Pre-Training | Code | 1
TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue | Code | 1
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | Code | 1
Talking-Heads Attention | Code | 1
REALM: Retrieval-Augmented Language Model Pre-Training | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Code | 1
Mask-Predict: Parallel Decoding of Conditional Masked Language Models | Code | 1
GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining | – | 0
Masked Language Models are Good Heterogeneous Graph Generalizers | Code | 0
Improving Low-Resource Morphological Inflection via Self-Supervised Objectives | – | 0
HAD: Hybrid Architecture Distillation Outperforms Teacher in Genomic Sequence Modeling | – | 0
Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations | – | 0
ADALog: Adaptive Unsupervised Anomaly detection in Logs with Self-attention Masked Language Model | – | 0
CodeSSM: Towards State Space Models for Code Understanding | – | 0
In-Context Learning can distort the relationship between sequence likelihoods and biological fitness | – | 0
Low-Resource Transliteration for Roman-Urdu and Urdu Using Transformer-Based Models | – | 0
Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them | – | 0
Page 3 of 10

Leaderboard

No leaderboard results yet.