Lemmatization

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling previously unseen words at inference time and disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
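
The two difficulties named above can be illustrated with a minimal sketch. It is a toy lookup-based lemmatizer, not the sequence-to-sequence model of the cited paper. The lexicon entries, the suffix rules, and the names `LEXICON`, `SUFFIX_RULES`, and `lemmatize` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon (illustrative assumption): (surface form, POS tag) -> lemma.
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical fallback rules for unseen words: (suffix, replacement),
# tried in order, so longer suffixes come first.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def lemmatize(form: str, pos: Optional[str] = None) -> str:
    """Return a lemma for `form`, using `pos` as a stand-in for context."""
    form = form.lower()

    # 1. Exact match on (form, POS): the context resolves any ambiguity.
    if pos is not None and (form, pos) in LEXICON:
        return LEXICON[(form, pos)]

    # 2. No usable context: collect every lemma the lexicon lists for the form.
    candidates = sorted({lemma for (f, _), lemma in LEXICON.items() if f == form})
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        # Ambiguous surface form: several base forms are possible.
        raise ValueError(f"ambiguous form {form!r}: candidates {candidates}")

    # 3. Unseen word: fall back to crude suffix stripping.
    for suffix, replacement in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) > len(suffix) + 2:
            return form[: -len(suffix)] + replacement

    # 4. Nothing applies: return the form unchanged.
    return form
```

Calling `lemmatize("saw", "VERB")` and `lemmatize("saw", "NOUN")` gives different lemmas for the same surface form, while `lemmatize("saw")` without a tag raises an error because the form cannot be disambiguated. An unseen word such as `"treebanks"` falls through to the suffix rules, which is where a hand-written approach like this one is most fragile.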

Papers

Showing 51–60 of 351 papers

Title	Date	Tasks	Status	Score
Enhancing Sequence-to-Sequence Neural Lemmatization with External Resources	Jan 28, 2021	Data Augmentation, Decoder	Code Available	5
Transformers on Multilingual Clause-Level Morphology	Nov 3, 2022	Data Augmentation, Language Modelling	Code Available	5
Cross-lingual Named Entity Corpus for Slavic Languages	Mar 30, 2024	LEMMA, Lemmatization	Code Available	5
Training Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text	Apr 2, 2019	Data Augmentation, LEMMA	Code Available	5
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology	Jul 23, 2019	LEMMA, Lemmatization	Code Available	5
Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning	Aug 1, 2019	Lemmatization, Morphological Analysis	Code Available	5
DBTagger: Multi-Task Learning for Keyword Mapping in NLIDBs Using Bi-Directional Recurrent Neural Networks	Jan 11, 2021	Lemmatization, Multi-Task Learning	Code Available	5
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts	May 1, 2018	Domain Adaptation, Lemmatization	Code Available	5
Sudachi: a Japanese Tokenizer for Business	May 1, 2018	Chunking, Lemmatization	Code Available	5
From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources	Aug 1, 2018	Coreference Resolution, Lemmatization	Code Available	5

No leaderboard results yet.