Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling words that were not seen during training when they appear at inference time, and disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
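
The two difficulties named above, ambiguous surface forms and previously unseen words, can be illustrated with a minimal sketch. It is not the method of the cited paper (which describes a sequence-to-sequence model); the lexicon entries, the suffix rules, and the `lemmatize` function are hypothetical and chosen only for illustration. The part-of-speech tag stands in for "context" when disambiguating, and a crude suffix-stripping fallback handles words missing from the lexicon.

```python
# Toy lexicon (hypothetical data): (surface form, POS tag) -> lemma.
# The POS tag is a stand-in for context: "saw" maps to different lemmas
# depending on whether it is used as a verb or as a noun.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Crude suffix rules (an assumption of this sketch), used only as a
# fallback for words that are not in the lexicon.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(token, pos):
    """Return a lemma for `token`, using `pos` to pick among candidates."""
    word = token.lower()
    # 1. Known form: the (form, POS) pair resolves the ambiguity.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    # 2. Unseen form: apply the first matching suffix rule, but only if
    #    at least three characters of the word remain as a stem.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    # 3. Nothing applies: return the lowercased form unchanged.
    return word


print(lemmatize("saw", "VERB"))        # see
print(lemmatize("saw", "NOUN"))        # saw
print(lemmatize("treebanks", "NOUN"))  # treebank
```

The fallback is deliberately naive: `lemmatize("parsing", "VERB")` returns `pars` rather than `parse`, which shows why unseen words are hard for hand-written rules.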

Papers


Showing 301–350 of 351 papers

Title	Date	Tasks	Status
Cross-Language Plagiarism Detection Methods	Sep 1, 2013	Lemmatization, Machine Translation	Unverified
A unified lexical processing framework based on the Margin Infused Relaxed Algorithm. A case study on the Romanian Language	Sep 1, 2013	Lemmatization, Speech Synthesis	Unverified
Chimera – Three Heads for English-to-Czech Translation	Aug 1, 2013	Lemmatization, Machine Translation	Unverified
Factored Machine Translation Systems for Russian-English	Aug 1, 2013	Lemmatization, Machine Translation	Unverified
Lemmatization and Morphosyntactic Tagging of Croatian and Serbian	Aug 1, 2013	Lemmatization, Part-Of-Speech Tagging	Unverified
Modernizing historical Slovene words with character-based SMT	Aug 1, 2013	Lemmatization, Machine Translation	Unverified
UZH in BioNLP 2013	Aug 1, 2013	Chunking, Dependency Parsing	Unverified
DKPro Similarity: An Open Source Framework for Text Similarity	Aug 1, 2013	Lemmatization, Semantic Textual Similarity	Unverified
An extended morphological analyzer of German handling verbal forms with separated separable particles (Un analyseur morphologique étendu de l'allemand traitant les formes verbales à particule séparée) [in French]	Jun 1, 2013	Lemmatization, Morphological Analysis	Unverified
Towards an automatic identification of chiasmus of words (Vers une identification automatique du chiasme de mots) [in French]	Jun 1, 2013	Lemmatization	Unverified
CNGL-CORE: Referential Translation Machines for Measuring Semantic Similarity	Jun 1, 2013	Lemmatization, Machine Translation	Unverified
UBC_UOS-TYPED: Regression for typed-similarity	Jun 1, 2013	Lemmatization, Named Entity Recognition (NER)	Unverified
[LVIC-LIMSI]: Using Syntactic Features and Multi-polarity Words for Sentiment Analysis in Twitter	Jun 1, 2013	General Classification, Lemmatization	Unverified
NRC: A Machine Translation Approach to Cross-Lingual Word Sense Disambiguation (SemEval-2013 Task 10)	Jun 1, 2013	Lemmatization, Machine Translation	Unverified
SSA-UO: Unsupervised Sentiment Analysis in Twitter	Jun 1, 2013	Lemmatization, Product Recommendation	Unverified
KLUE-CORE: A regression model of semantic textual similarity	Jun 1, 2013	Lemmatization, Question Answering	Unverified
Simultaneous Word-Morpheme Alignment for Statistical Machine Translation	Jun 1, 2013	Lemmatization, Machine Translation	Unverified
Morphological Analysis and Disambiguation for Dialectal Arabic	Jun 1, 2013	Lemmatization, Machine Translation	Unverified
Development of a Hindi Lemmatizer	May 24, 2013	Lemmatization, Machine Translation	Code Available
Lexical Categories for Improved Parsing of Web Data	Dec 1, 2012	Dependency Parsing, Lemmatization	Unverified
The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the Detection and Lemmatization of Unknown Words	Dec 1, 2012	Lemmatization	Unverified
Enhancing Lemmatization for Mongolian and its Application to Statistical Machine Translation	Dec 1, 2012	Information Retrieval, Lemmatization	Unverified
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization	Sep 14, 2012	Lemmatization, Text Summarization	Code Available
Statistical Parsing of Spanish and Data Driven Lemmatization	Jul 1, 2012	Lemmatization, Part-Of-Speech Tagging	Unverified
Handling Unknown Words in Arabic FST Morphology	Jul 1, 2012	Lemmatization	Unverified
WSD for n-best reranking and local language modeling in SMT	Jul 1, 2012	Language Modeling, Language Modelling	Unverified
SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media	Jul 1, 2012	Lemmatization, Sentiment Analysis	Unverified
Probabilistic Lexical Generalization for French Dependency Parsing	Jul 1, 2012	Dependency Parsing, Lemmatization	Unverified
ANNLOR: A Naïve Notation-system for Lexical Outputs Ranking	Jul 1, 2012	Lemmatization, Lexical Simplification	Unverified
CELI: An Experiment with Cross Language Textual Entailment	Jul 1, 2012	Lemmatization, Named Entity Recognition (NER)	Unverified
Enrichir et raisonner sur des espaces sémantiques pour l'attribution de mots-clés (Enriching and reasoning on semantic spaces for keyword extraction) [in French]	Jun 1, 2012	Chunking, Keyword Extraction	Unverified
Indexation libre et contrôlée d'articles scientifiques. Présentation et résultats du défi fouille de textes DEFT2012 (Controlled and free indexing of scientific papers. Presentation and results of the DEFT2012 text-mining challenge) [in French]	Jun 1, 2012	Lemmatization	Unverified
Mining wisdom	Jun 1, 2012	Lemmatization, Text Classification	Unverified
NeoTag: a POS Tagger for Grammatical Neologism Detection	May 1, 2012	Lemmatization, POS	Unverified
Analyzing and Aligning German compound nouns	May 1, 2012	Lemmatization, Translation	Unverified
The Political Speech Corpus of Bulgarian	May 1, 2012	Lemmatization, Morphological Analysis	Unverified
Adapting and evaluating a generic term extraction tool	May 1, 2012	Lemmatization, Term Extraction	Unverified
Linguistic Analysis Processing Line for Bulgarian	May 1, 2012	Language Modelling, Lemmatization	Unverified
The Netlog Corpus. A Resource for the Study of Flemish Dutch Internet Language	May 1, 2012	Lemmatization, POS	Unverified
Ubiquitous Usage of a Broad Coverage French Corpus: Processing the Est Republicain corpus	May 1, 2012	Articles, Dependency Parsing	Unverified
"Vreselijk mooi!" (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives.	May 1, 2012	BIG-bench Machine Learning, Lemmatization	Unverified
The goo300k corpus of historical Slovene	May 1, 2012	LEMMA, Lemmatization	Unverified
The annotation of the C-ORAL-BRASIL oral through the implementation of the Palavras Parser	May 1, 2012	Lemmatization	Unverified
ROMBAC: The Romanian Balanced Annotated Corpus	May 1, 2012	Chunking, Lemmatization	Unverified
Building a multilingual parallel corpus for human users	May 1, 2012	Lemmatization	Unverified
Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages	May 1, 2012	Lemmatization	Unverified
Korp --- the corpus infrastructure of Spr	May 1, 2012	Lemmatization	Unverified
Iula2Standoff: a tool for creating standoff documents for the IULACT	May 1, 2012	Lemmatization, POS	Unverified
Holaaa!! writin like u talk is kewl but kinda hard 4 NLP	May 1, 2012	Domain Adaptation, Language Modelling	Unverified
First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin	May 1, 2012	Information Retrieval, Lemmatization	Unverified
