Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for supporting applications such as search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating ambiguous surface forms, which can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
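To make the two difficulties concrete, here is a minimal Python sketch of a classic non-neural baseline, assuming the part-of-speech tag of each word is already known (in practice it would come from a tagger). A lexicon keyed on (surface form, POS tag) resolves ambiguity such as English "saw" (verb "see" vs. noun "saw"), and hand-written suffix-stripping rules act as a fallback for unseen words. The lexicon, the rules, and the `lemmatize` function are hypothetical illustrations; this is not the method of the Universal Lemmatizer cited above, which is a sequence-to-sequence model.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping (surface form, coarse POS tag) -> lemma.
# Keying on the POS tag lets one ambiguous surface form map to different lemmas.
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("went", "VERB"): "go",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical fallback rules for words missing from the lexicon:
# (POS tag, suffix to strip, replacement). Rules are tried in order,
# so longer suffixes such as "ies" are listed before "s".
SUFFIX_RULES: List[Tuple[str, str, str]] = [
    ("NOUN", "ies", "y"),
    ("NOUN", "s", ""),
    ("VERB", "ied", "y"),
    ("VERB", "ing", ""),
    ("VERB", "ed", ""),
    ("VERB", "s", ""),
]

MIN_STEM = 2  # do not strip a suffix if fewer than 2 characters would remain


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form`, given its POS tag (assumed to come from a tagger)."""
    word = form.lower()
    # 1. Known word: the POS tag selects among the candidate lemmas.
    lemma = LEXICON.get((word, pos))
    if lemma is not None:
        return lemma
    # 2. Unseen word: apply the first matching suffix rule for this POS.
    for rule_pos, suffix, replacement in SUFFIX_RULES:
        if rule_pos == pos and word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)] + replacement
    # 3. No rule applies: fall back to the lowercased surface form itself.
    return word


if __name__ == "__main__":
    print(lemmatize("saw", "VERB"))     # see
    print(lemmatize("saw", "NOUN"))     # saw
    print(lemmatize("leaves", "NOUN"))  # leaf
    print(lemmatize("ponies", "NOUN"))  # pony (unseen word, suffix rule)
    print(lemmatize("making", "VERB"))  # mak (naive rule gets it wrong)
```

The last example shows why hand-written fallback rules are brittle: "making" becomes "mak" instead of "make". Learned approaches, such as the sequence-to-sequence model in the source above, aim to generalize to unseen forms without relying on such rules.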

Papers


Showing 151–200 of 351 papers

Title	Date	Tasks	Status
Attention-free encoder decoder for morphological processing	Oct 1, 2018	Decoder, Lemmatization	Unverified
NLP-Cube: End-to-End Raw Text Processing With Neural Networks	Oct 1, 2018	Lemmatization, Sentence	Code available
UZH@SMM4H: System Descriptions	Oct 1, 2018	Document Classification, General Classification	Unverified
Building a Lemmatizer and a Spell-checker for Sorani Kurdish	Sep 27, 2018	Language Modeling, Language Modelling	Unverified
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks	Sep 10, 2018	Dependency Parsing, Lemmatization	Code available
Imitation Learning for Neural Morphological String Transduction	Aug 31, 2018	Imitation Learning, Lemmatization	Code available
LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs	Aug 10, 2018	Lemmatization, Part-Of-Speech Tagging	Code available
Local String Transduction as Sequence Labeling	Aug 1, 2018	Lemmatization, Machine Translation	Unverified
An Evaluation of Lexicon-based Sentiment Analysis Techniques for the Plays of Gotthold Ephraim Lessing	Aug 1, 2018	Lemmatization, Sentiment Analysis	Unverified
From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources	Aug 1, 2018	Coreference Resolution, Lemmatization	Code available
Neural Transition-based String Transduction for Limited-Resource Setting in Morphology	Aug 1, 2018	Lemmatization, Machine Translation	Code available
Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora	Jul 26, 2018	Lemmatization, named-entity-recognition	Code available
Character-level Supervision for Low-resource POS Tagging	Jul 1, 2018	Feature Engineering, LEMMA	Unverified
Fast Query Expansion on an Accounting Corpus using Sub-Word Embeddings	Jun 1, 2018	Information Retrieval, Lemmatization	Unverified
IUCM at SemEval-2018 Task 11: Similar-Topic Texts as a Comprehension Knowledge Source	Jun 1, 2018	Clustering, Lemmatization	Code available
Tw-StAR at SemEval-2018 Task 1: Preprocessing Impact on Multi-label Emotion Classification	Jun 1, 2018	Classification, Emotion Classification	Unverified
Context Sensitive Neural Lemmatization with Lematus	Jun 1, 2018	Decoder, Lemmatization	Unverified
Robustness of sentence length measures in written texts	May 2, 2018	Lemmatization, Sentence	Unverified
The Use of Text Alignment in Semi-Automatic Error Analysis: Use Case in the Development of the Corpus of the Latvian Language Learners	May 1, 2018	Language Acquisition, Lemmatization	Unverified
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts	May 1, 2018	Domain Adaptation, Lemmatization	Code available
Developing New Linguistic Resources and Tools for the Galician Language	May 1, 2018	Lemmatization, Named Entity Recognition (NER)	Unverified
Very Large-Scale Lexical Resources to Enhance Chinese and Japanese Machine Translation	May 1, 2018	Lemmatization, Machine Translation	Unverified
Universal Morphologies for the Caucasus region	May 1, 2018	Lemmatization	Unverified
Generating a Gold Standard for a Swedish Sentiment Lexicon	May 1, 2018	Lemmatization, Machine Translation	Unverified
Coreference Resolution in FreeLing 4.0	May 1, 2018	Constituency Parsing, coreference-resolution	Unverified
TreeAnnotator: Versatile Visual Annotation of Hierarchical Text Relations	May 1, 2018	Lemmatization	Unverified
A Morphologically Annotated Corpus of Emirati Arabic	May 1, 2018	Lemmatization, Machine Translation	Unverified
Moving TIGER beyond Sentence-Level	May 1, 2018	Boundary Detection, Dependency Parsing	Unverified
BioRo: The Biomedical Corpus for the Romanian Language	May 1, 2018	Lemmatization	Unverified
Parser combinators for Tigrinya and Oromo morphology	May 1, 2018	Lemmatization, Machine Translation	Unverified
SentiArabic: A Sentiment Analyzer for Standard Arabic	May 1, 2018	Arabic Sentiment Analysis, Lemmatization	Unverified
Sudachi: a Japanese Tokenizer for Business	May 1, 2018	Chunking, Lemmatization	Code available
Automatic Categorization of Tagalog Documents Using Support Vector Machines	Nov 1, 2017	Document Classification, General Classification	Unverified
Build Fast and Accurate Lemmatization for Arabic	Oct 18, 2017	Information Retrieval, Lemmatization	Unverified
Adapting the TTL Romanian POS Tagger to the Biomedical Domain	Sep 1, 2017	Chunking, Domain Adaptation	Unverified
Evaluation of Finite State Morphological Analyzers Based on Paradigm Extraction from Wiktionary	Sep 1, 2017	Language Modeling, Language Modelling	Unverified
Fast and Accurate Decision Trees for Natural Language Processing Tasks	Sep 1, 2017	Attribute, BIG-bench Machine Learning	Unverified
Automatically Acquired Lexical Knowledge Improves Japanese Joint Morphological and Dependency Analysis	Sep 1, 2017	Lemmatization, Morphological Analysis	Unverified
bleu2vec: the Painfully Familiar Metric on Continuous Vector Space Steroids	Sep 1, 2017	Lemmatization, Machine Translation	Unverified
An Extensible Multilingual Open Source Lemmatizer	Sep 1, 2017	Information Retrieval, LEMMA	Unverified
Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish	Sep 1, 2017	Lemmatization	Unverified
Impact of Feature Selection on Micro-Text Classification	Aug 27, 2017	Classification, Clustering	Unverified
KeyXtract Twitter Model - An Essential Keywords Extraction Model for Twitter Designed using NLP Tools	Aug 9, 2017	Lemmatization, model	Unverified
Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe	Aug 1, 2017	Dependency Parsing, Lemmatization	Unverified
Lexical Correction of Polish Twitter Political Data	Aug 1, 2017	Entity Extraction using GAN, Lemmatization	Unverified
LABDA at SemEval-2017 Task 10: Relation Classification between keyphrases via Convolutional Neural Network	Aug 1, 2017	Articles, General Classification	Unverified
DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output	Aug 1, 2017	Lemmatization, Semantic Similarity	Unverified
ECNU at SemEval-2017 Task 4: Evaluating Effective Features on Machine Learning Methods for Twitter Message Polarity Classification	Aug 1, 2017	BIG-bench Machine Learning, Feature Engineering	Unverified
RACAI's Natural Language Processing pipeline for Universal Dependencies	Aug 1, 2017	Lemmatization, Sentence	Unverified
QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings	Aug 1, 2017	Lemmatization, Semantic Textual Similarity	Unverified


No leaderboard results yet.