Page 344 - AI Computer 10

Step II: Create Dictionary

            In this step, we list every word that occurs anywhere in the three documents:
                     Subin      and       Sohan       are     friends     went        to      school      park

            While creating a dictionary, remember that a word repeated in different documents is written
            only once, because the dictionary contains only unique words.
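
            The dictionary step can be sketched in a few lines of Python. The three documents are not
            shown in this section, so the sentences below are an assumption reconstructed from the
            document vector table; the function name build_dictionary is also only illustrative.
            Words are split on spaces and kept exactly as written.

```python
# Assumed corpus: these sentences are reconstructed from the vector table,
# they are not quoted from the text.
documents = [
    "Subin and Sohan are friends",
    "Subin went to school",
    "Sohan went to park",
]


def build_dictionary(docs):
    """Return the unique words of the corpus in order of first appearance."""
    dictionary = []
    for doc in docs:
        for word in doc.split():
            # A repeated word is written only once.
            if word not in dictionary:
                dictionary.append(word)
    return dictionary


vocabulary = build_dictionary(documents)
print(vocabulary)
```

            Under these assumptions, the printed list has nine unique words, in the same order as the
            dictionary shown above.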

            Step III: Create Document Vector

            In this step, we create a document vector by writing the vocabulary and, below each word,
            its frequency in the document. Thus, for Document 1:
                     Subin      and       Sohan       are     friends     went        to       school     park
                       1         1          1          1         1          0         0          0          0

            Here, we have written '1' for each word that occurs in Document 1 and '0' for each word that does not.
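
            As a minimal sketch, the vector for Document 1 can be computed by counting each vocabulary
            word in the document. The text of Document 1 is an assumption inferred from the vector
            above, and document_vector is a hypothetical helper name.

```python
# Vocabulary as listed in the dictionary step.
vocabulary = ["Subin", "and", "Sohan", "are", "friends",
              "went", "to", "school", "park"]

# Assumed text of Document 1 (inferred from its vector, not quoted from the text).
document_1 = "Subin and Sohan are friends"


def document_vector(doc, vocab):
    """Count how many times each vocabulary word occurs in the document."""
    words = doc.split()
    return [words.count(term) for term in vocab]


vector_1 = document_vector(document_1, vocabulary)
print(vector_1)
```

            Because no word is repeated within Document 1, every count is either 1 or 0.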

            Step IV: Create Document Vector for all the Documents

            In this step, we create a document vector for each document in the corpus:

                    Subin       and      Sohan        are     friends     went        to       school     park
                       1         1          1          1         1          0          0         0          0
                       1         0          0          0         0          1          1         1          0
                       0         0          1          0         0          1          1         0          1

            Here, the header row contains the vocabulary of the corpus and the three rows below it correspond
            to the three documents. This is the final document vector table for our corpus. However, these raw
            counts do not yet show how important each word is. This leads us to the final step of our algorithm:
            TFIDF, a technique to extract important and relevant information from the corpus.
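
            The document vector table from Step IV can be reproduced with a short sketch that combines
            the dictionary and vector steps. The three sentences are assumptions reconstructed from the
            table itself, and the variable names are illustrative only.

```python
# Assumed corpus, reconstructed from the document vector table.
documents = [
    "Subin and Sohan are friends",
    "Subin went to school",
    "Sohan went to park",
]

# Dictionary: unique words in order of first appearance.
vocabulary = []
for doc in documents:
    for word in doc.split():
        if word not in vocabulary:
            vocabulary.append(word)

# One row per document, one column per vocabulary word.
table = [[doc.split().count(term) for term in vocabulary] for doc in documents]

# Print the header row followed by one row of counts per document.
print("".join(f"{term:>9}" for term in vocabulary))
for row in table:
    print("".join(f"{count:>9}" for count in row))
```

            Each row of table is the document vector of one document, in the order the documents are listed.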

                     AI Activity
                                                                                                      Google Cloud

              Keyword Extraction in NLP involves automatically identifying and extracting the most important
              words or phrases from a piece of text. These keywords represent the main topics or themes within the
              text and are useful for tasks like document summarization, information retrieval, and content analysis.
              The purpose of this activity is to learn how to use an API to perform keyword extraction from a
              website.
              Step 1:  Open a web browser and type the given URL in the address bar:
                       https://cloud.google.com/natural-language
              Step 2:  Click the Demo option from the menu on the left side of the page that opens.



















