Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms depending on context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
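
The two difficulties named in the description, ambiguous surface forms and unseen words, can be illustrated with a minimal sketch. This is a toy lookup-plus-rules lemmatizer, not the sequence-to-sequence model of the cited paper. The lexicon entries, the suffix rules, and the function name `lemmatize` are all assumptions made for illustration, and a part-of-speech tag stands in for "context".

```python
# Toy lexicon mapping (surface form, POS tag) -> lemma.
# Entries are illustrative assumptions, not taken from any treebank.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical suffix-stripping rules, tried in order, used only as a
# fallback for words that are missing from the lexicon.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(token, pos):
    """Return a lemma for `token`, using `pos` to resolve ambiguity."""
    word = token.lower()

    # Ambiguous forms: the POS tag selects among candidate lemmas.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]

    # Unseen words: fall back to crude suffix rules, skipping rules
    # that would leave fewer than three characters of stem.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement

    # No rule applies: return the lowercased form unchanged.
    return word
```

For example, `lemmatize("saw", "VERB")` and `lemmatize("saw", "NOUN")` resolve to different lemmas, while an out-of-lexicon word such as "treebanks" goes through the suffix fallback. Hand-written rules like these break quickly on morphologically rich languages, which is one reason learned models are used instead.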

Papers


Showing 176–200 of 351 papers

Title	Date	Tasks	Status
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations	May 1, 2018	Lemmatization	Unverified
A Morphologically Annotated Corpus of Emirati Arabic	May 1, 2018	Lemmatization, Machine Translation	Unverified
Moving TIGER beyond Sentence-Level	May 1, 2018	Boundary Detection, Dependency Parsing	Unverified
BioRo: The Biomedical Corpus for the Romanian Language	May 1, 2018	Lemmatization	Unverified
Parser combinators for Tigrinya and Oromo morphology	May 1, 2018	Lemmatization, Machine Translation	Unverified
SentiArabic: A Sentiment Analyzer for Standard Arabic	May 1, 2018	Arabic Sentiment Analysis, Lemmatization	Unverified
Sudachi: a Japanese Tokenizer for Business	May 1, 2018	Chunking, Lemmatization	Code Available
Automatic Categorization of Tagalog Documents Using Support Vector Machines	Nov 1, 2017	Document Classification, General Classification	Unverified
Build Fast and Accurate Lemmatization for Arabic	Oct 18, 2017	Information Retrieval, Lemmatization	Unverified
Adapting the TTL Romanian POS Tagger to the Biomedical Domain	Sep 1, 2017	Chunking, Domain Adaptation	Unverified
Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary	Sep 1, 2017	Language Modeling, Language Modelling	Unverified
Fast and Accurate Decision Trees for Natural Language Processing Tasks	Sep 1, 2017	Attribute, BIG-bench Machine Learning	Unverified
Automatically Acquired Lexical Knowledge Improves Japanese Joint Morphological and Dependency Analysis	Sep 1, 2017	Lemmatization, Morphological Analysis	Unverified
bleu2vec: the Painfully Familiar Metric on Continuous Vector Space Steroids	Sep 1, 2017	Lemmatization, Machine Translation	Unverified
An Extensible Multilingual Open Source Lemmatizer	Sep 1, 2017	Information Retrieval, LEMMA	Unverified
Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish	Sep 1, 2017	Lemmatization	Unverified
Impact of Feature Selection on Micro-Text Classification	Aug 27, 2017	Classification, Clustering	Unverified
KeyXtract Twitter Model - An Essential Keywords Extraction Model for Twitter Designed using NLP Tools	Aug 9, 2017	Lemmatization, model	Unverified
Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe	Aug 1, 2017	Dependency Parsing, Lemmatization	Unverified
Lexical Correction of Polish Twitter Political Data	Aug 1, 2017	Entity Extraction using GAN, Lemmatization	Unverified
LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network	Aug 1, 2017	Articles, General Classification	Unverified
DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output	Aug 1, 2017	Lemmatization, Semantic Similarity	Unverified
ECNU at SemEval-2017 Task 4: Evaluating Effective Features on Machine Learning Methods for Twitter Message Polarity Classification	Aug 1, 2017	BIG-bench Machine Learning, Feature Engineering	Unverified
RACAI's Natural Language Processing pipeline for Universal Dependencies	Aug 1, 2017	Lemmatization, Sentence	Unverified
QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings	Aug 1, 2017	Lemmatization, Semantic Textual Similarity	Unverified

