Page 347 - AI Computer 10
Finally, the words have been converted to numbers. These numbers are the values of each word for each
document. Because the example uses very little data, common words such as 'and' also have large values. But when we work
with a large amount of data, such a word appears in almost every document, so its IDF falls towards 1 and the value of that word decreases towards 0.
For example, if the total number of documents is 10 and the word “and” occurs in all 10 documents, then:
IDF(and) = 10/10 = 1
Thus, the value of “and” = log(IDF) = log(1) = 0
On the other hand, if the word “pollution” occurs in 3 documents, then:
IDF(pollution) = 10/3 = 3.3333…
Thus, the value of “pollution” = log(IDF) = log(3.3333) ≈ 0.523 (using base-10 logarithms)
Thus, the word “pollution” has considerable value, whereas the word “and” has no value in the corpus.
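The two IDF calculations above can be reproduced with a few lines of Python. This is a minimal sketch that assumes base-10 logarithms (the base that gives log(1) = 0 and log(3.3333) ≈ 0.52); the helper names `idf` and `idf_weight` are illustrative and not taken from the text.

```python
import math

def idf(total_docs: int, docs_with_word: int) -> float:
    """IDF as defined in the example: total documents / documents containing the word."""
    return total_docs / docs_with_word

def idf_weight(total_docs: int, docs_with_word: int) -> float:
    """Log of the IDF ratio; assumption: base-10 logarithm, as in the worked example."""
    return math.log10(idf(total_docs, docs_with_word))

# The two cases from the example: a corpus of 10 documents.
weight_and = idf_weight(10, 10)       # "and" occurs in all 10 documents
weight_pollution = idf_weight(10, 3)  # "pollution" occurs in 3 documents

print(f"and: {weight_and:.3f}")
print(f"pollution: {weight_pollution:.3f}")
```

Running it prints `and: 0.000` and `pollution: 0.523`, matching the hand calculation.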
The outcome of TFIDF can be summarised as follows:
• Stopwords are words that occur in most documents with high frequency, and thus have low
values.
• Words that have a high term frequency but a low document frequency are important for the document in which
they occur, but not for the corpus as a whole.
• The TFIDF values help an NLP model decide which words to consider while processing natural
language. The greater the value of a word, the more important that word is to its document within the corpus.
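To see these outcomes on actual numbers, the sketch below computes TFIDF for a tiny, made-up corpus of three documents. It assumes TFIDF(W) = TF(W) × log(IDF(W)), with TF as the raw count of the word in a document, IDF as in the example above and base-10 logarithms; the corpus and the function name `tfidf` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical corpus; each document is already lower-cased and split into words.
corpus = [
    ["the", "river", "is", "polluted", "and", "the", "air", "is", "polluted"],
    ["the", "air", "and", "the", "water", "are", "clean"],
    ["the", "city", "and", "the", "river", "are", "busy"],
]

def tfidf(docs):
    """Return one {word: TFIDF value} dictionary per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        scores.append({
            # Assumption: TF x log10(total documents / document frequency).
            word: count * math.log10(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores

scores = tfidf(corpus)
for word in ("the", "and", "polluted", "river"):
    print(word, round(scores[0][word], 3))
```

Here "the" occurs twice in the first document but still scores 0.0 because it appears in every document, while "polluted" (high term frequency, low document frequency) gets the highest value in that document, 0.954; "river", which appears in two of the three documents, scores 0.176.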
Applications of TFIDF
TFIDF is commonly used in the Natural Language Processing domain. Some of its applications are:
• Document Classification: It helps classify a document by type and genre.
• Topic Modelling: It helps predict the topics of a corpus.
• Information Retrieval System: It is used to extract the most relevant information from a corpus.
• Stopword filtering: It helps remove unnecessary words from a body of text.
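As an illustration of the stopword-filtering application, the sketch below treats any word whose IDF weight is at or below a threshold as a stopword for that corpus and strips it from a document. This is one possible approach under stated assumptions: the corpus, the threshold of 0 and the helper names `corpus_stopwords` and `remove_stopwords` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical corpus used only for illustration.
corpus = [
    ["the", "river", "is", "polluted", "and", "the", "air", "is", "polluted"],
    ["the", "air", "and", "the", "water", "are", "clean"],
    ["the", "city", "and", "the", "river", "are", "busy"],
]

def corpus_stopwords(docs, threshold=0.0):
    """Return words whose IDF weight log10(N / DF) is at or below `threshold`.

    Assumption: a threshold of 0 flags only words that occur in every document.
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {word for word, freq in df.items() if math.log10(n_docs / freq) <= threshold}

def remove_stopwords(doc, stopwords):
    """Keep only the words that are not in the stopword set, preserving their order."""
    return [word for word in doc if word not in stopwords]

stop = corpus_stopwords(corpus)
print(sorted(stop))
print(remove_stopwords(corpus[0], stop))
```

With such a small corpus only "and" and "the" are removed; "is" survives because it appears in just one document, which echoes the earlier point that small amounts of data give common words inflated values.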
NO-CODE NLP TOOLS
No-code Natural Language Processing (NLP) tools allow users to create and implement NLP applications without
requiring programming skills. These platforms provide user-friendly interfaces, enabling people from various
backgrounds, such as business analysts, marketers, and educators, to harness the power of NLP for text analysis,
sentiment analysis, keyword extraction, chatbots, and more.
Some popular No-Code NLP tools are:
• MonkeyLearn: MonkeyLearn offers an easy-to-use interface for text analysis, including sentiment analysis,
keyword extraction, classification and categorization. Users can also build custom
models with it and visualize the results dynamically.
213