Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. Especially for languages with rich morphology, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling words that were not seen during training when they appear at inference time, and disambiguating surface forms that can be inflected variants of several different base forms depending on the context.
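The two difficulties named above, ambiguous surface forms and unseen words, can be illustrated with a toy lemmatizer. This is a minimal sketch under stated assumptions, not the sequence-to-sequence method of the Universal Lemmatizer: the class name `ToyLemmatizer`, the tiny training list, and the use of a part-of-speech tag as the only "context" are all hypothetical choices made for illustration. Known forms are looked up by (form, tag), so the context picks between lemmas. Unseen forms fall back to a suffix-rewrite rule (a simple edit script) learned from training pairs.

```python
from collections import Counter, defaultdict


def suffix_rule(form: str, lemma: str) -> tuple[str, str]:
    """Return (strip, append): remove `strip` from the end of form, then add `append`."""
    # Length of the longest common prefix of form and lemma.
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


class ToyLemmatizer:
    """Hypothetical lexicon + suffix-rule lemmatizer (illustration only)."""

    def __init__(self, training: list[tuple[str, str, str]]):
        # (form, pos) -> lemma: the POS tag disambiguates forms such as
        # English "saw" (VERB -> "see", NOUN -> "saw").
        self.lexicon = {}
        # (pos, suffix of a training form) -> counts of rules seen with it.
        self.rules = defaultdict(Counter)
        for form, pos, lemma in training:
            self.lexicon[(form, pos)] = lemma
            rule = suffix_rule(form, lemma)
            # Index the rule under every suffix of the form that is at least
            # as long as the part the rule strips off.
            for k in range(len(rule[0]), len(form) + 1):
                self.rules[(pos, form[len(form) - k:])][rule] += 1

    def lemmatize(self, form: str, pos: str) -> str:
        # Known word: direct lookup, disambiguated by the tag.
        if (form, pos) in self.lexicon:
            return self.lexicon[(form, pos)]
        # Unseen word: apply the most frequent rule of the longest known suffix.
        for k in range(len(form), -1, -1):
            counts = self.rules.get((pos, form[len(form) - k:]))
            if counts:
                strip, append = counts.most_common(1)[0][0]
                if form.endswith(strip):
                    return form[: len(form) - len(strip)] + append
        return form  # no applicable rule: fall back to the surface form


training = [
    ("walked", "VERB", "walk"),
    ("talked", "VERB", "talk"),
    ("saw", "VERB", "see"),
    ("saw", "NOUN", "saw"),
    ("cats", "NOUN", "cat"),
    ("dogs", "NOUN", "dog"),
]
lem = ToyLemmatizer(training)
print(lem.lemmatize("saw", "VERB"))     # see  (ambiguous form, resolved by tag)
print(lem.lemmatize("saw", "NOUN"))     # saw
print(lem.lemmatize("jumped", "VERB"))  # jump (unseen form, "-ed" rule)
```

Choosing the rule by longest matching suffix is only a simple heuristic. Many of the neural approaches in the listing below instead learn the transformation from character sequences and wider sentence context, which handles irregular and previously unseen forms more robustly.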

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks

Papers

Showing 151–175 of 351 papers

Title	Date	Tasks	Status
Attention-free encoder decoder for morphological processing	Oct 1, 2018	Decoder, Lemmatization	Unverified
NLP-Cube: End-to-End Raw Text Processing With Neural Networks	Oct 1, 2018	Lemmatization, Sentence	Code available
LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs	Oct 1, 2018	Lemmatization, Machine Translation	Code available
Building a Lemmatizer and a Spell-checker for Sorani Kurdish	Sep 27, 2018	Language Modeling, Language Modelling	Unverified
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks	Sep 10, 2018	Dependency Parsing, Lemmatization	Code available
Imitation Learning for Neural Morphological String Transduction	Aug 31, 2018	Imitation Learning, Lemmatization	Code available
LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs	Aug 10, 2018	Lemmatization, Part-Of-Speech Tagging	Code available
Neural Transition-based String Transduction for Limited-Resource Setting in Morphology	Aug 1, 2018	Lemmatization, Machine Translation	Code available
From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources	Aug 1, 2018	Coreference Resolution, Lemmatization	Code available
Local String Transduction as Sequence Labeling	Aug 1, 2018	Lemmatization, Machine Translation	Unverified
An Evaluation of Lexicon-based Sentiment Analysis Techniques for the Plays of Gotthold Ephraim Lessing	Aug 1, 2018	Lemmatization, Sentiment Analysis	Unverified
Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora	Jul 26, 2018	Lemmatization, named-entity-recognition	Code available
Character-level Supervision for Low-resource POS Tagging	Jul 1, 2018	Feature Engineering, LEMMA	Unverified
Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings	Jun 1, 2018	Information Retrieval, Lemmatization	Unverified
IUCM at SemEval-2018 Task 11: Similar-Topic Texts as a Comprehension Knowledge Source	Jun 1, 2018	Clustering, Lemmatization	Code available
Context Sensitive Neural Lemmatization with Lematus	Jun 1, 2018	Decoder, Lemmatization	Unverified
Tw-StAR at SemEval-2018 Task 1: Preprocessing Impact on Multi-label Emotion Classification	Jun 1, 2018	Classification, Emotion Classification	Unverified
Robustness of sentence length measures in written texts	May 2, 2018	Lemmatization, Sentence	Unverified
The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners	May 1, 2018	Language Acquisition, Lemmatization	Unverified
Parser combinators for Tigrinya and Oromo morphology	May 1, 2018	Lemmatization, Machine Translation	Unverified
Universal Morphologies for the Caucasus region	May 1, 2018	Lemmatization	Unverified
Developing New Linguistic Resources and Tools for the Galician Language	May 1, 2018	Lemmatization, Named Entity Recognition (NER)	Unverified
Moving TIGER beyond Sentence-Level	May 1, 2018	Boundary Detection, Dependency Parsing	Unverified
Sudachi: a Japanese Tokenizer for Business	May 1, 2018	Chunking, Lemmatization	Code available
Generating a Gold Standard for a Swedish Sentiment Lexicon	May 1, 2018	Lemmatization, Machine Translation	Unverified

No leaderboard results yet.