Page 344 - AI Computer 10
Step II: Create Dictionary
In this step, we list every word that occurs anywhere in the three documents:
Subin and Sohan are friends went to school park
While creating the dictionary, remember that a word repeated in different documents is written only
once, because the dictionary contains only unique words.
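The dictionary step can be sketched in a few lines of Python. This is only an illustration: the three documents below are an assumption, reconstructed from the vectors shown in this section, and `build_dictionary` is a hypothetical helper name.

```python
# Assumed corpus: reconstructed from the document vectors in this section,
# not quoted from the original documents.
documents = [
    "Subin and Sohan are friends",
    "Subin went to school",
    "Sohan went to park",
]


def build_dictionary(docs):
    """Return the unique words of the corpus in order of first appearance."""
    dictionary = []
    for doc in docs:
        for word in doc.split():
            # A word already seen in an earlier document is not added again.
            if word not in dictionary:
                dictionary.append(word)
    return dictionary


dictionary = build_dictionary(documents)
print(dictionary)
```

Although 'Subin', 'Sohan', 'went' and 'to' each occur in two documents, each of them appears only once in the result.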
Step III: Create Document Vector
In this step, we create a document vector by writing the vocabulary and, under each word, the number of
times it occurs in the document. For Document 1, this gives:
Subin  and  Sohan  are  friends  went  to  school  park
1      1    1      1    1        0     0   0       0
Here, we have written '1' for each word that occurs in Document 1 and '0' for each word that does not.
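A minimal sketch of this step is given below. The text of Document 1 is an assumption inferred from its vector, and `document_vector` is a hypothetical helper name.

```python
# Dictionary (vocabulary) of the corpus, as listed in Step II.
dictionary = ["Subin", "and", "Sohan", "are", "friends", "went", "to", "school", "park"]

# Assumed text of Document 1 (inferred from its vector, not quoted from the source).
document_1 = "Subin and Sohan are friends"


def document_vector(doc, vocab):
    """Count how many times each vocabulary word occurs in the document."""
    words = doc.split()
    return [words.count(term) for term in vocab]


vector_1 = document_vector(document_1, dictionary)
print(vector_1)  # [1, 1, 1, 1, 1, 0, 0, 0, 0]
```

The function counts occurrences, so a word used twice in a document would get the value 2 rather than 1.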
Step IV: Create Document Vector for all the Documents
In this step, we repeat the process for every document in the corpus:
             Subin  and  Sohan  are  friends  went  to  school  park
Document 1   1      1    1      1    1        0     0   0       0
Document 2   1      0    0      0    0        1     1   1       0
Document 3   0      0    1      0    0        1     1   0       1
Here, the header row contains the vocabulary of the corpus, and the three rows correspond to the three
documents. This is the final document vector table for our corpus. Each document is now represented by
word counts, but the counts alone do not show which words matter most. This leads us to the final step of
our algorithm: TF-IDF, a technique to extract important and relevant information from the corpus.
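The whole document vector table can be built in one pass, as in the sketch below. The three documents are assumed (reconstructed from the vectors above), and `document_vector_table` is a hypothetical helper name. Only the Python standard library is used.

```python
from collections import Counter

# Assumed corpus, reconstructed from the document vectors in this section.
documents = [
    "Subin and Sohan are friends",
    "Subin went to school",
    "Sohan went to park",
]

# Dictionary (vocabulary) of the corpus, as listed in Step II.
dictionary = ["Subin", "and", "Sohan", "are", "friends", "went", "to", "school", "park"]


def document_vector_table(docs, vocab):
    """Return one row of word counts per document, with columns ordered by vocab."""
    table = []
    for doc in docs:
        counts = Counter(doc.split())  # a missing word is reported as 0
        table.append([counts[term] for term in vocab])
    return table


table = document_vector_table(documents, dictionary)

# Print the header row followed by one row per document.
print(" ".join(f"{term:>7}" for term in dictionary))
for row in table:
    print(" ".join(f"{count:>7}" for count in row))
```

Each row of `table` is the document vector of one document, in the same order as the `documents` list.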
AI Activity
Google Cloud
Keyword Extraction in NLP involves automatically identifying and extracting the most important
words or phrases from a piece of text. These keywords represent the main topics or themes within the
text and are useful for tasks like document summarization, information retrieval, and content analysis.
The purpose of this activity is to learn how to use an API to perform keyword extraction on text taken
from a website.
Step 1: Open a web browser and type the given URL in the address bar:
https://cloud.google.com/natural-language
Step 2: Click the Demo option from the menu on the left side of the page that opens.