Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling previously unseen words at inference time and disambiguating surface forms that, depending on context, can be inflected variants of several different base forms.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
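
The two difficulties named above can be illustrated with a minimal sketch. It is not the sequence-to-sequence method of the cited paper: it is a toy lookup lemmatizer, and the lexicon entries, the suffix rules, and the function name lemmatize are all hypothetical, chosen only for illustration. A part-of-speech tag stands in for context when disambiguating, and a naive suffix-stripping fallback handles words missing from the lexicon.

```python
# Toy lexicon (hypothetical): maps (surface form, POS tag) to a lemma.
# The POS tag stands in for the context needed to disambiguate.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("running", "VERB"): "run",
}

# Naive suffix rules (hypothetical), used only as a fallback for unseen words.
# Each pair is (suffix to strip, replacement); the first match wins.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(form, pos):
    """Return a lemma for `form`, using `pos` to pick among candidates."""
    word = form.lower()
    # Known form: the POS tag selects the right base form.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    # Unseen form: fall back to suffix stripping.
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least three characters to limit over-stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    # No rule applies: return the lowercased form unchanged.
    return word


if __name__ == "__main__":
    print(lemmatize("saw", "VERB"))        # ambiguous form, verb reading
    print(lemmatize("saw", "NOUN"))        # same form, noun reading
    print(lemmatize("treebanks", "NOUN"))  # unseen word, rule happens to work
    print(lemmatize("parsed", "VERB"))     # unseen word, rule yields "pars"
```

The last call shows why the fallback is fragile: stripping "ed" from "parsed" gives "pars" rather than "parse". Errors of this kind on unseen words are one reason to prefer learned, context-sensitive lemmatizers over hand-written rules.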

Papers


Showing 151–175 of 351 papers

Title	Date	Tasks	Status
Corpora and Processing Tools for Non-standard Contemporary and Diachronic Balkan Slavic	Sep 1, 2019	Lemmatization, POS	Unverified
Indexation libre et contrôlée d'articles scientifiques. Présentation et résultats du défi fouille de textes DEFT2012 (Controlled and free indexing of scientific papers. Presentation and results of the DEFT2012 text-mining challenge) [in French]	Jun 1, 2012	Lemmatization	Unverified
Improving the Morphological Analysis of Classical Sanskrit	Dec 1, 2016	BIG-bench Machine Learning, Lemmatization	Unverified
CoRoLa --- The Reference Corpus of Contemporary Romanian Language	May 1, 2014	Lemmatization, Sentence	Unverified
Improving Neural Translation Models with Linguistic Factors	Dec 1, 2016	Constituency Parsing, Dependency Parsing	Unverified
Coreference Resolution in FreeLing 4.0	May 1, 2018	Constituency Parsing, coreference-resolution	Unverified
ASOBEK at SemEval-2016 Task 1: Sentence Representation with Character N-gram Embeddings for Semantic Textual Similarity	Jun 1, 2016	Language Modeling, Language Modelling	Unverified
Iula2Standoff: a tool for creating standoff documents for the IULACT	May 1, 2012	Lemmatization, POS	Unverified
Analysing cross-lingual transfer in lemmatisation for Indian languages	Dec 1, 2020	Cross-Lingual Transfer, Lemmatization	Unverified
JAIST: Combining multiple features for Answer Selection in Community Question Answering	Jun 1, 2015	Answer Selection, Community Question Answering	Unverified
JHUBC's Submission to LT4HALA EvaLatin 2020	May 1, 2020	Decoder, Lemmatization	Unverified
Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging	Oct 5, 2019	Lemmatization, Morphological Tagging	Unverified
Impact of Feature Selection on Micro-Text Classification	Aug 27, 2017	Classification, Clustering	Unverified
Context Sensitive Neural Lemmatization with Lematus	Jun 1, 2018	Decoder, Lemmatization	Unverified
Illinois-LH: A Denotational and Distributional Approach to Semantics	Aug 1, 2014	Lemmatization, Natural Language Inference	Unverified
Context Sensitive Lemmatization Using Two Successive Bidirectional Gated Recurrent Networks	Jul 1, 2017	Attribute, LEMMA	Unverified
KLUE-CORE: A regression model of semantic textual similarity	Jun 1, 2013	Lemmatization, Question Answering	Unverified
A Simple Joint Model for Improved Contextual Neural Lemmatization	Apr 4, 2019	LEMMA, Lemmatization	Unverified
Korp --- the corpus infrastructure of Spr	May 1, 2012	Lemmatization	Unverified
LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network	Aug 1, 2017	Articles, General Classification	Unverified
LAMB: A Good Shepherd of Morphologically Rich Languages	Nov 1, 2016	Lemmatization	Unverified
LatinCy: Synthetic Trained Pipelines for Latin NLP	May 7, 2023	Lemmatization, Morphological Tagging	Unverified
How low is too low? A monolingual take on lemmatisation in Indian languages	Jun 1, 2021	Data Augmentation, Lemmatization	Unverified
Learning Representations for Text-level Discourse Parsing	Jul 1, 2015	Discourse Parsing, Lemmatization	Unverified
Context based lemmatizer for Polish language	Jul 23, 2022	LEMMA, Lemmatization	Unverified

