Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For morphologically rich languages in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling words that were not seen before inference time and disambiguating surface forms that, depending on context, can be inflected variants of several different base forms.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
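
The two difficulties named in the description, unseen words and ambiguous surface forms, can be illustrated with a minimal sketch. This is not the sequence-to-sequence model from the cited paper. It is a toy lexicon lookup that uses a part-of-speech tag as a stand-in for context, with a naive suffix-stripping fallback for forms missing from the lexicon. The names `LEXICON`, `SUFFIX_RULES`, and `lemmatize`, and every entry in them, are assumptions made for illustration.

```python
# Toy lexicon: (surface form, POS tag) -> lemma. All entries are illustrative.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Naive English suffix rules, tried in order, for forms not in the lexicon.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form`, using `pos` to resolve ambiguous forms."""
    word = form.lower()
    # Known form: the POS tag selects among competing lemmas.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    # Unseen form: strip the first matching suffix, keeping at least
    # three characters of the stem.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    # Nothing matched: fall back to the lowercased form itself.
    return word


print(lemmatize("saw", "VERB"))        # ambiguous form, verb reading
print(lemmatize("saw", "NOUN"))        # same form, noun reading
print(lemmatize("treebanks", "NOUN"))  # unseen form, handled by suffix rule
```

The suffix fallback is deliberately crude and will mislemmatize many forms (for example, it ignores consonant doubling), which is one reason the systems in the listing below learn lemmatization from data rather than relying on hand-written rules.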

Papers


Showing 151–200 of 351 papers

Title	Date	Tasks	Status
UBC_UOS-TYPED: Regression for typed-similarity	Jun 1, 2013	Lemmatization, Named Entity Recognition (NER)	Unverified
Ubiquitous Usage of a Broad Coverage French Corpus: Processing the Est Republicain corpus	May 1, 2012	Articles, Dependency Parsing	Unverified
UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task	Oct 1, 2018	Dependency Parsing, Lemmatization	Unverified
UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings	Jun 5, 2020	Lemmatization, POS	Unverified
UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging	Aug 19, 2019	Lemmatization, Morphological Analysis	Unverified
UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing	May 1, 2016	Dependency Parsing, Lemmatization	Unverified
UdS-(retrain|distributional|surface): Improving POS Tagging for OOV Words in German CMC and Web Data	Aug 1, 2016	Language Modeling, Language Modelling	Unverified
Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks	Feb 3, 2019	Data Augmentation, LEMMA	Unverified
Universal Morphologies for the Caucasus region	May 1, 2018	Lemmatization	Unverified
USF at SemEval-2019 Task 6: Offensive Language Detection Using LSTM With Word Embeddings	Jun 1, 2019	General Classification, Lemmatization	Unverified
Using longest common subsequence and character models to predict word forms	Aug 1, 2016	Lemmatization, Morphological Inflection	Unverified
Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages	May 1, 2014	Lemmatization, Morphological Analysis	Unverified
Utilizing Subword Entities in Character-Level Sequence-to-Sequence Lemmatization Models	Dec 1, 2020	Decoder, LEMMA	Unverified
UZH in BioNLP 2013	Aug 1, 2013	Chunking, Dependency Parsing	Unverified
UZH@SMM4H: System Descriptions	Oct 1, 2018	Document Classification, General Classification	Unverified
Vacaspati: A Diverse Corpus of Bangla Literature	Jul 11, 2023	Lemmatization, POS	Unverified
Very Large-Scale Lexical Resources to Enhance Chinese and Japanese Machine Translation	May 1, 2018	Lemmatization, Machine Translation	Unverified
Voting for POS tagging of Latin texts: Using the flair of FLAIR to better Ensemble Classifiers by Example of Latin	May 1, 2020	Lemmatization, Part-Of-Speech Tagging	Unverified
"Vreselijk mooi!" (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives.	May 1, 2012	BIG-bench Machine Learning, Lemmatization	Unverified
Weighting Finite-State Transductions With Neural Context	Jun 1, 2016	Lemmatization, Structured Prediction	Unverified
Word-Formation Network for Czech	May 1, 2014	Lemmatization, Machine Translation	Unverified
WSD for n-best reranking and local language modeling in SMT	Jul 1, 2012	Language Modeling, Language Modelling	Unverified
YAMAMA: Yet Another Multi-Dialect Arabic Morphological Analyzer	Dec 1, 2016	Lemmatization, Morphological Analysis	Unverified
ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus	Jun 1, 2022	Lemmatization, Part-Of-Speech Tagging	Unverified
Exploring the Use of Foundation Models for Named Entity Recognition and Lemmatization Tasks in Slavic Languages	Apr 11, 2023	Lemmatization, named-entity-recognition	Unverified
Facilitating Multi-Lingual Sense Annotation: Human Mediated Lemmatizer	Jan 1, 2014	Lemmatization, Word Sense Disambiguation	Unverified
Factored Machine Translation Systems for Russian-English	Aug 1, 2013	Lemmatization, Machine Translation	Unverified
Fast and Accurate Decision Trees for Natural Language Processing Tasks	Sep 1, 2017	Attribute, BIG-bench Machine Learning	Unverified
Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings	Jun 1, 2018	Information Retrieval, Lemmatization	Unverified
Few-Shot and Zero-Shot Learning for Historical Text Normalization	Mar 12, 2019	Lemmatization, Multi-Task Learning	Unverified
First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin	May 1, 2012	Information Retrieval, Lemmatization	Unverified
FOLK-Gold ― A Gold Standard for Part-of-Speech-Tagging of Spoken German	May 1, 2016	Lemmatization, Part-Of-Speech Tagging	Unverified
Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style	Apr 1, 2017	Gender Classification, General Classification	Unverified
Generating a Gold Standard for a Swedish Sentiment Lexicon	May 1, 2018	Lemmatization, Machine Translation	Unverified
GliLem: Leveraging GliNER for Contextualized Lemmatization in Estonian	Dec 29, 2024	Information Retrieval, LEMMA	Unverified
H2-Golden-Retriever: Methodology and Tool for an Evidence-Based Hydrogen Research Grantsmanship	Nov 16, 2022	Lemmatization, named-entity-recognition	Unverified
Handling Unknown Words in Arabic FST Morphology	Jul 1, 2012	Lemmatization	Unverified
Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin	Aug 1, 2019	LEMMA, Lemmatization	Unverified
HHU at SemEval-2016 Task 1: Multiple Approaches to Measuring Semantic Textual Similarity	Jun 1, 2016	Lemmatization, Named Entity Recognition (NER)	Unverified
Holaaa!! writin like u talk is kewl but kinda hard 4 NLP	May 1, 2012	Domain Adaptation, Language Modelling	Unverified
How low is too low? A monolingual take on lemmatisation in Indian languages	Jun 1, 2021	Data Augmentation, Lemmatization	Unverified
Illinois-LH: A Denotational and Distributional Approach to Semantics	Aug 1, 2014	Lemmatization, Natural Language Inference	Unverified
Impact of Feature Selection on Micro-Text Classification	Aug 27, 2017	Classification, Clustering	Unverified
Improving Neural Translation Models with Linguistic Factors	Dec 1, 2016	Constituency Parsing, Dependency Parsing	Unverified
Improving the Morphological Analysis of Classical Sanskrit	Dec 1, 2016	BIG-bench Machine Learning, Lemmatization	Unverified
Indexation libre et contrôlée d'articles scientifiques. Présentation et résultats du défi fouille de textes DEFT2012 (Controlled and free indexing of scientific papers. Presentation and results of the DEFT2012 text-mining challenge) [in French]	Jun 1, 2012	Lemmatization	Unverified
Investigating Sub-Word Embedding Strategies for the Morphologically Rich and Free Phrase-Order Hungarian	Aug 1, 2019	Lemmatization, Morphological Analysis	Unverified
Iula2Standoff: a tool for creating standoff documents for the IULACT	May 1, 2012	Lemmatization, POS	Unverified
IWNLP: Inverse Wiktionary for Natural Language Processing	Jul 1, 2015	Lemmatization, Part-Of-Speech Tagging	Unverified
JAIST: Combining multiple features for Answer Selection in Community Question Answering	Jun 1, 2015	Answer Selection, Community Question Answering	Unverified

