Page 344 - AI Computer 10

Step II: Create Dictionary

In this step, we make a list of every word that occurs in any of the three documents:
Subin and Sohan are friends went to school park

While creating a dictionary, remember that a word repeated across different documents is written
only once, because the dictionary contains only unique words.
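
The dictionary step can be sketched in Python as follows. The three document strings are an assumption: they are reconstructed from the document vectors shown on this page, and the function name `build_dictionary` is only illustrative.

```python
# Documents reconstructed from the vectors on this page (an assumption).
documents = [
    "Subin and Sohan are friends",
    "Subin went to school",
    "Sohan went to park",
]

def build_dictionary(docs):
    """Return the unique words of the corpus in order of first appearance."""
    vocabulary = []
    for doc in docs:
        for word in doc.split():
            # A repeated word is written only once.
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary

dictionary = build_dictionary(documents)
print(dictionary)
```

A list is used instead of a set so that the words keep the order in which they first appear, which matches the header row used in the later steps.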

Step III: Create Document Vector

In this step, we create a document vector by writing the vocabulary as a header row and, beneath
each word, the number of times that word occurs in the document. For Document 1:

Subin  and  Sohan  are  friends  went  to  school  park
1      1    1      1    1        0     0   0       0

Here, we have written '1' for each word that occurs in Document 1 and '0' for each word that does not.
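
A minimal sketch of this step in Python is shown here. The text of Document 1 is an assumption reconstructed from its vector, and `document_vector` is a hypothetical helper name.

```python
vocabulary = ["Subin", "and", "Sohan", "are", "friends",
              "went", "to", "school", "park"]

# Document 1, reconstructed from its vector (an assumption).
document_1 = "Subin and Sohan are friends"

def document_vector(document, vocab):
    """Count how many times each vocabulary word occurs in the document."""
    words = document.split()
    return [words.count(term) for term in vocab]

vector_1 = document_vector(document_1, vocabulary)
print(vector_1)
```

Because no word is repeated within Document 1, every count is either 1 or 0.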

Step IV: Create Document Vectors for All the Documents

In this step, we create a document vector for each of the documents:

            Subin  and  Sohan  are  friends  went  to  school  park
Document 1  1      1    1      1    1        0     0   0       0
Document 2  1      0    0      0    0        1     1   1       0
Document 3  0      0    1      0    0        1     1   0       1
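
The whole table can be built in a few lines of Python. This is only a sketch: the document strings are reconstructed from the rows above (an assumption), and the variable names are illustrative.

```python
# Documents reconstructed from the rows of the table (an assumption).
documents = [
    "Subin and Sohan are friends",
    "Subin went to school",
    "Sohan went to park",
]

# Build the vocabulary: unique words in order of first appearance.
vocabulary = []
for doc in documents:
    for word in doc.split():
        if word not in vocabulary:
            vocabulary.append(word)

# One row per document, one column per vocabulary word.
table = [[doc.split().count(term) for term in vocabulary]
         for doc in documents]

print(vocabulary)
for row in table:
    print(row)
```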

Here, the header row contains the vocabulary of the corpus, and the three rows correspond to the three
documents. This is the final document vector table for our corpus. However, the table only records how
often each word occurs; it does not yet show how important each word is. This leads us to the final step
of our algorithm: TF-IDF (Term Frequency-Inverse Document Frequency), a technique to extract important
and relevant information from the corpus.
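
TF-IDF is only named on this page, not defined, so the following is a rough sketch under one common formulation, which may differ from the formula used later in the book. It assumes that the term frequency is the raw count from the table and that the inverse document frequency is log10(N / DF), where N is the number of documents and DF is the number of documents containing the word. The function name `tf_idf` is hypothetical.

```python
import math

# Document vector table from Step IV.
vectors = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 1, 0, 1],
]

def tf_idf(rows):
    """Weight each count by log10(N / DF) (an assumed formulation)."""
    n_docs = len(rows)
    n_terms = len(rows[0])
    # DF: number of documents in which each word occurs at least once.
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log10(n_docs / df) if df else 0.0 for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in rows]

weights = tf_idf(vectors)
for row in weights:
    print([round(value, 3) for value in row])
```

Under this formulation, a word that occurs in only one document (such as "friends") receives a higher weight than a word shared by two documents (such as "Subin").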

AI Activity
Google Cloud

Keyword Extraction in NLP involves automatically identifying and extracting the most important
words or phrases from a piece of text. These keywords represent the main topics or themes within the
text and are useful for tasks like document summarization, information retrieval, and content analysis.
The purpose of this activity is to learn how to use an API to perform keyword extraction from a
website.
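
To make the idea concrete, here is a deliberately naive keyword extractor written in plain Python. It is not the method used by the Google Cloud Natural Language API; it simply counts words after dropping a small, hand-picked stop-word list. The list, the function name `extract_keywords`, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption, far from complete).
STOPWORDS = {"a", "an", "and", "are", "from", "in", "is", "of", "the", "to"}

def extract_keywords(text, top_k=3):
    """Return the top_k most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

sample = "Cats chase mice. Cats sleep. Mice hide from cats."
keywords = extract_keywords(sample, top_k=2)
print(keywords)
```

Real keyword extraction services use far richer signals than raw frequency, but the sketch shows the basic goal: reduce a text to the few words that best represent its topic.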
Step 1: Open a web browser and type the given URL in the address bar:
https://cloud.google.com/natural-language
Step 2: Click the Demo option from the menu on the left side of the page that opens.
