Page 143 - AI Computer 10
DATA ACQUISITION
An AI system is built entirely on data and needs a large volume of it. Thus, before initiating an AI project,
all the data requirements must be identified and the required data gathered. This process of identifying data
requirements and gathering the required data is known as Data Acquisition.
An AI model works on the basis of the data fed to it by the programmer. It is trained to predict the desired
output by being given both accurate and inaccurate datasets, from which it learns to identify the differences
between the two. Let us understand the concept of datasets with the help of a simple example.
Example: Suppose you want to develop an AI-enabled system that can tell the difference between a cat and
a dog on the basis of images. To do this, you would need to feed hundreds of different images of both animals
into the machine. These images act as a dataset, using which the AI system can learn the differences
between the two. The data with which the machine is trained is called Training Data.
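The idea of a labelled training dataset can be sketched in a few lines of Python. This is a minimal illustration, not a real training pipeline; the image file names are hypothetical.

```python
# A labelled training dataset: each entry pairs an input (an image file,
# hypothetical names here) with the correct output label.
training_data = [
    ("cat_001.jpg", "cat"),
    ("cat_002.jpg", "cat"),
    ("dog_001.jpg", "dog"),
    ("dog_002.jpg", "dog"),
]

# Separate the inputs (images) from the expected outputs (labels),
# which is the form most training algorithms expect.
images = [item[0] for item in training_data]
labels = [item[1] for item in training_data]

print(labels)  # ['cat', 'cat', 'dog', 'dog']
```

A real project would hold hundreds or thousands of such labelled examples, but the structure stays the same: inputs paired with the answers the model should learn to produce.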
Data Feature
The term ‘Data Feature’ also plays an important role in the AI project life cycle. Data Feature refers to the type
of data that you want to collect for the problem you have scoped. In the annual function example, the most
appropriate data features would be facial characteristics, name, employee ID, date of joining, department, designation, etc.
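One record of the annual function example could be represented as follows. This is a hedged sketch; the field values are hypothetical and only illustrate how the data features listed above map onto a structured record.

```python
# One data record for the (hypothetical) annual function example.
# Each key is a data feature identified during problem scoping.
record = {
    "name": "A. Kumar",               # hypothetical value
    "employee_id": "EMP042",          # hypothetical value
    "date_of_joining": "2019-06-01",
    "department": "Mathematics",
    "designation": "Teacher",
    "facial_characteristics": None,   # would hold face-encoding data
}

# The dataset's feature list is simply the set of keys every record shares.
feature_names = list(record.keys())
print(feature_names)
```

Choosing the right features at this stage matters: the model can only learn from the features you decide to collect.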
Sources of Data
Nowadays, data is present everywhere, but it is often hard to find reliable sources of it. The different
types of data sources are like a treasure trove of information. AI systems rely on various data sources to
learn, analyse, and make decisions.
There are various sources to collect relevant data. Some of these are:
• Surveys: Surveys are a way of collecting data from a group of people in order to gain information and insights
into various topics of interest. The process involves asking people for information through questionnaires
(telephonic or in-person), which can be online or offline. Surveys provide reliable primary data.
• Web Scraping: Web scraping or data scraping is the method of downloading information from the Internet.
It involves sifting through various websites (government, non-profit, or commercial) that gather data about
various entities and downloading the required data, either textual or graphical.
• Sensors: Sensors are devices that collect physical parameters like voltage, pressure, intensity of light, or
temperature and convert these into electrical impulses. Sensors collect live data and transmit it for further
analysis and interpretation. Sensors include devices such as lenses, photo-sensors, motion-sensors,
galvanometers, voltmeters, and thermostats.
• Cameras: Cameras are used to collect data in the form of still images or videos. CCTV or surveillance cameras
are sources of visual data that can be acquired from various places.
• Observation: Data can be collected through human observation and used as a labelled dataset for
training the model.
• Application Program Interfaces (APIs): APIs are a set of functions and procedures that allow one
application to connect to another. APIs retrieve data from external services such as weather data,
stock market information, and social media platforms.
[Figure: Sources of data — Surveys, Web Scraping, Sensors, Cameras, Observation, API (Application Program Interface)]
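Web scraping, one of the sources above, can be illustrated with the Python standard library alone. A real scraper would first download a web page (for example with `urllib`); here the HTML is embedded as a string so the sketch runs without a network connection, and the page content is invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical web page content; a real scraper would download this.
PAGE = """
<html><body>
  <h2>City Temperatures</h2>
  <ul>
    <li>Delhi: 31</li>
    <li>Mumbai: 29</li>
  </ul>
</body></html>
"""

class ListItemScraper(HTMLParser):
    """Collects the text found inside every <li> element."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False

    def handle_data(self, data):
        if self.in_li and data.strip():
            self.items.append(data.strip())

scraper = ListItemScraper()
scraper.feed(PAGE)
print(scraper.items)  # ['Delhi: 31', 'Mumbai: 29']
```

The scraped items could then be cleaned and stored as part of a dataset. Note that real-world scraping must respect each website's terms of use.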
While extracting data, you should remember that only reliable sources can provide authentic, good-quality
data. Some open and reliable data sources are the websites data.gov.in and india.gov.in.