What Is Lemmatization in NLP?

Nature

Lemmatization is a text pre-processing technique used in natural language processing (NLP) to reduce a word to its base form, so that different forms of the same word can be recognized as equivalent. It is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word's lemma, or dictionary form. The goal of lemmatization is to reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form.

Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document. A stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words that have different meanings depending on part of speech. A lemmatizer, by contrast, consults a dictionary and the word's part of speech before reducing the word to its lemma: "saw" used as a verb becomes "see," while "saw" used as a noun stays "saw."

Lemmatization is commonly applied in artificial intelligence (AI), big data analytics, chatbots, machine learning (ML), and NLP.
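
The contrast between stemming and lemmatization can be illustrated with a minimal Python sketch. Everything in it is a hand-written assumption for illustration: the suffix list, the helper names `naive_stem` and `lemmatize`, and the tiny `LEXICON` are hypothetical, and the part-of-speech tags are supplied by hand rather than produced by a tagger. It is not a real stemming algorithm or a real lemmatization library.

```python
# Toy comparison of stemming and lemmatization.
# SUFFIXES, LEXICON, naive_stem and lemmatize are hypothetical, hand-written
# illustrations; they are not a real stemmer or a real dictionary.

# Naive stemmer: strips the first matching suffix, with no knowledge of
# context, part of speech, or whether the result is a real word.
SUFFIXES = ("ies", "ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least a two-character stem so short words like "is" are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w


# Toy lemmatizer: looks up (word, part of speech) in a small lexicon,
# so the same surface form can map to different lemmas.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
    ("studies", "VERB"): "study",
    ("studies", "NOUN"): "study",
    ("better", "ADJ"): "good",
    ("was", "VERB"): "be",
}


def lemmatize(word: str, pos: str) -> str:
    # Fall back to the lowercased word when the (word, pos) pair is unknown.
    return LEXICON.get((word.lower(), pos), word.lower())


if __name__ == "__main__":
    # Hand-tagged tokens; a real pipeline would get these tags from a POS tagger.
    tagged = [("saw", "VERB"), ("saw", "NOUN"), ("meeting", "VERB"),
              ("meeting", "NOUN"), ("studies", "NOUN"), ("better", "ADJ")]
    for word, pos in tagged:
        print(f"{word:8} {pos:5} stem={naive_stem(word):8} lemma={lemmatize(word, pos)}")
```

Run as a script, the sketch shows the stemmer reducing "meeting" to "meet" whether it is a noun or a verb and cutting "studies" to the non-word "stud", while the lookup keyed on part of speech maps the verb "saw" to "see", keeps the noun "saw" unchanged, and maps "better" to "good". A practical system would replace the toy lexicon with a full dictionary such as WordNet and the hand-supplied tags with a part-of-speech tagger.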
