Page 143 - AI Computer 10

DATA ACQUISITION

        An AI system depends entirely on data and needs a large volume of it. Thus, before starting an AI project,
        all the data requirements must be identified and the data gathered. This process of identifying data
        requirements and gathering the required data is known as Data Acquisition.

        An AI model works on the basis of the data fed to it by the programmer. It is trained to predict the desired
        output by being given both accurate and inaccurate datasets, so that it learns to identify the differences
        between the two. Let us understand the concept of datasets with the help of a simple example.

        Example: Suppose you want to develop an AI-enabled system which can tell the difference between a cat and
        a dog on the basis of images. To do this, you would need to feed hundreds of different images of both animals
        into the machine. These images act as a dataset for the AI system, which it uses to learn the differences
        between the two. The data with which the machine is trained is called Training Data.
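
        The cat and dog example can be sketched in Python as a small labelled dataset. This is only an
        illustration: the file names and the helper count_per_label are hypothetical, and a real project
        would hold hundreds of images of each animal rather than two.

```python
from collections import Counter

# Hypothetical image file names, each paired with its label.
# A real training dataset would contain hundreds of images per animal.
training_data = [
    ("images/cat_001.jpg", "cat"),
    ("images/cat_002.jpg", "cat"),
    ("images/dog_001.jpg", "dog"),
    ("images/dog_002.jpg", "dog"),
]


def count_per_label(dataset):
    """Count how many examples the dataset holds for each label."""
    return Counter(label for _, label in dataset)


label_counts = count_per_label(training_data)
print(label_counts)
```

        Counting the examples per label is a quick way to check that both animals are represented
        before any training begins.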

        Data Feature
        The term ‘Data Feature’ also plays an important role in the AI project life cycle. Data Feature refers to the type
        of data that you want to collect for the scoped problem. In the annual function example, the most appropriate
        data features would be facial characteristics, name, employee ID, date of joining, department, designation, etc.
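
        One way to picture data features is as the fields of a record. The sketch below is only an
        illustration: the class name EmployeeRecord, the field names, and the sample values are all
        hypothetical, and it assumes that facial characteristics are stored as a path to a photo file.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class EmployeeRecord:
    # Each field stands for one data feature named in the text.
    name: str
    employee_id: str
    date_of_joining: date
    department: str
    designation: str
    face_image_path: str  # assumption: facial characteristics come from a photo


# Made-up sample values, used only to show one filled-in record.
record = EmployeeRecord(
    name="Sample Employee",
    employee_id="E001",
    date_of_joining=date(2020, 1, 15),
    department="Finance",
    designation="Analyst",
    face_image_path="photos/E001.jpg",
)

feature_names = [f.name for f in fields(EmployeeRecord)]
print(feature_names)
```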

        Sources of Data
        Nowadays, data is present everywhere, but it is often hard to find reliable sources of data. The different
        types of data sources are like a treasure trove of information. AI systems rely on various data sources to
        learn, analyse, and make decisions. Some of the sources from which relevant data can be collected are:
         • Surveys: Surveys are a way of collecting data from a group of people in order to gain information and insights
             into various topics of interest. The process involves asking people for information through questionnaires,
             which can be conducted online or offline (for example, by telephone or in person). Surveys provide
             reliable primary data.

         • Web Scraping: Web scraping, or data scraping, is the method of extracting information from the Internet.
             It involves sifting through various websites (government, non-profit, or commercial) that publish data about
             various entities and downloading the required data, either textual or graphical.

         • Sensors: Sensors are devices that measure physical parameters like voltage, pressure, intensity of light, or
             temperature and convert these into electrical signals. Sensors collect live data and transmit it for further
             analysis and interpretation. Sensors include devices such as lenses, photo-sensors, motion-sensors,
             galvanometers, voltmeters, and thermostats.

         • Cameras: Cameras are used to collect data in the form of still images or videos. CCTV or surveillance cameras
             are sources of visual data that can be acquired from various places.

         • Observation: Data can be collected through human observation and can be used as a labelled
             dataset for training the model.

         • Application Programming Interfaces (APIs): An API is a set of functions and procedures that allows one
             application to connect to another. APIs are used to retrieve data from external services such as weather
             data, stock market information, and social media platforms.

        [Figure: Sources of data: Surveys, Web Scraping, Sensors, Cameras, Observation, API (Application Programming Interface)]
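
        Of the sources listed above, web scraping is the easiest to sketch in a few lines. The example
        below is only an illustration: it uses Python's built-in html.parser module on a small HTML page
        stored in a string instead of downloading a real website, and the page contents, the class name
        TableCellParser, and the temperature values are made up.

```python
from html.parser import HTMLParser

# A made-up page held in a string; a real scraper would download it first.
SAMPLE_PAGE = """
<html><body>
<table>
<tr><th>City</th><th>Temperature</th></tr>
<tr><td>Delhi</td><td>31</td></tr>
<tr><td>Mumbai</td><td>29</td></tr>
</table>
</body></html>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current_row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._current_row:
                self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        if self._in_cell and self._current_row is not None:
            self._current_row.append(data.strip())


parser = TableCellParser()
parser.feed(SAMPLE_PAGE)
scraped_rows = parser.rows
print(scraped_rows)
```

        The same idea applies to real pages, but a website's terms of use should be checked before
        its data is scraped.
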
        While extracting data, you should remember that only reliable sources can provide authentic and good quality
        data. Two examples of open and reliable websites are data.gov.in and india.gov.in.
