Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time, and from disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
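
The two difficulties named in the description, ambiguous surface forms and unseen words, can be illustrated with a minimal sketch. This is not the sequence-to-sequence model of the cited paper; it is a toy lookup lemmatizer in which the lexicon entries, the suffix rules, and the function name `lemmatize` are all assumptions made for illustration. A part-of-speech tag stands in for context.

```python
# Toy lexicon keyed by (surface form, part-of-speech tag).
# Every entry here is an illustrative assumption, not real resource data.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Ordered (suffix, replacement) rules, applied only to forms missing
# from the lexicon. They are deliberately crude and English-specific.
SUFFIX_RULES = {
    "VERB": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
    "NOUN": [("ies", "y"), ("s", "")],
}


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form`, using `pos` as a stand-in for context."""
    word = form.lower()
    # Ambiguity: the same surface form maps to different lemmas by tag.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    # Unseen word: fall back to the first matching suffix rule.
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: len(word) - len(suffix)] + replacement
    # No rule applies: return the lowercased form unchanged.
    return word


print(lemmatize("saw", "VERB"))      # see
print(lemmatize("saw", "NOUN"))      # saw
print(lemmatize("parsers", "NOUN"))  # parser (unseen, suffix fallback)
```

The fallback rules fail on many forms (irregular inflection, consonant doubling), which is the gap that learned models such as the one in the cited paper aim to close.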

Papers


Showing 51–75 of 351 papers

Title	Date	Tasks	Status	Score
Enhancing Sequence-to-Sequence Neural Lemmatization with External Resources	Jan 28, 2021	Data Augmentation, Decoder	Code Available	5
Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study in Polish	Jun 5, 2022	Authorship Attribution, Lemmatization	Code Available	5
Tagging and parsing of multidomain collections	Jun 17, 2020	Dependency Parsing, Language Modeling	Code Available	5
The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs	May 21, 2020	Lemmatization, Word Embeddings	Code Available	5
From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources	Aug 1, 2018	Coreference Resolution, Lemmatization	Code Available	5
BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer	Apr 19, 2023	Dependency Parsing, Extractive Question-Answering	Code Available	5
Training Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text	Apr 2, 2019	Data Augmentation, LEMMA	Code Available	5
Tree-Stack LSTM in Transition Based Dependency Parsing	Oct 1, 2018	Dependency Parsing, Lemmatization	Code Available	5
Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning	Aug 1, 2019	Lemmatization, Morphological Analysis	Code Available	5
Unsupervised Lemmatization as Embeddings-Based Word Clustering	Aug 22, 2019	Clustering, LEMMA	Code Available	5
Cross-lingual Named Entity Corpus for Slavic Languages	Mar 30, 2024	LEMMA, Lemmatization	Code Available	5
Imitation Learning for Neural Morphological String Transduction	Aug 31, 2018	Imitation Learning, Lemmatization	Code Available	5
DBTagger: Multi-Task Learning for Keyword Mapping in NLIDBs Using Bi-Directional Recurrent Neural Networks	Jan 11, 2021	Lemmatization, Multi-Task Learning	Code Available	5
Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings	May 18, 2020	Cultural Vocal Bursts Intensity Prediction, Lemmatization	Code Available	5
LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs	Aug 10, 2018	Lemmatization, Part-Of-Speech Tagging	Code Available	5
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology	Jul 23, 2019	LEMMA, Lemmatization	Code Available	5
Development of a Hindi Lemmatizer	May 24, 2013	Lemmatization, Machine Translation	Code Available	5
Revisiting NMT for Normalization of Early English Letters	Jun 1, 2019	Lemmatization, Machine Translation	Code Available	5
CELI: An Experiment with Cross Language Textual Entailment	Jul 1, 2012	Lemmatization, Named Entity Recognition (NER)	Unverified	0
CBNU System for SIGMORPHON 2019 Shared Task 2: a Pipeline Model	Aug 1, 2019	LEMMA, Lemmatization	Unverified	0
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking	Jul 1, 2012	Lemmatization, Lexical Simplification	Unverified	0
Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages	May 1, 2012	Lemmatization	Unverified	0
Building a multilingual parallel corpus for human users	May 1, 2012	Lemmatization	Unverified	0
An Extensible Multilingual Open Source Lemmatizer	Sep 1, 2017	Information Retrieval, LEMMA	Unverified	0
AI-KU: Using Co-Occurrence Modeling for Semantic Similarity	Aug 1, 2014	Information Retrieval, Language Modelling	Unverified	0


