Class IX - Chapter 2 AI Project Cycle Notes
Note: The data needs to be accurate and reliable as it ensures the efficiency of your
system.
Data Exploration:
Data exploration is the first step of data analysis. It is used to visualize data in
order to uncover early insights and to identify areas or patterns worth examining
further. It allows for a deeper and more detailed understanding of the data.
Modelling:
To implement your idea, you now look at different AI-enabled algorithms which
work on computer vision (since you are working on visual data).
You go through several models and select the ones which match your
requirements.
After choosing the model, you implement it. This is known as the modelling stage.
Evaluation:
As you move towards deploying your model in the real world, you test it in as many
ways as possible. The stage of testing the models is known as evaluation.
In this stage, we evaluate each and every model tried and choose the model which
gives the most efficient and reliable results.
Problem Scoping:
Data Acquisition
Data Acquisition: Data Acquisition consists of two words:
Data: Data refers to the raw facts, figures, information, or statistics.
Acquisition: Acquisition refers to acquiring data for the project.
Data
Data can be defined as a representation of facts or instructions about some entity
(students, school, sports, business, animals etc.) that can be processed or
communicated by humans or machines.
Data is a collection of facts such as numbers, words, pictures, audio clips, videos,
maps, measurements or even just descriptions of things.
Data may be represented with the help of characters such as letters (a to z, A to Z),
digits (0-9) or special characters (+, -, /, *, <, >, = etc.).
Data is classified into:
1. Structured Data
2. Unstructured Data
1. Structured data
Structured data is categorized as quantitative data.
It is the type of data most of us work with every day.
Structured data has predefined data types and formats so that it fits well in the
columns or fields of databases or spreadsheets.
They are highly organized and easily analyzed.
The data is structured in accurately defined fields.
The data is stored in relational databases or spreadsheets like Excel.
Examples of structured data are: name, age, address etc.
2. Unstructured data
Unstructured data is categorized as qualitative data.
It cannot be processed and analyzed using conventional relational
database (RDBMS) methods.
Unstructured data is difficult to deconstruct because it has no predefined model,
meaning it cannot be organized in relational databases.
Instead, non-relational (NoSQL) databases are the best fit for managing unstructured
data.
Examples of unstructured data include video, audio, mobile activity, social media
activity, satellite imagery and surveillance imagery, and the list goes on.
Data sets
A data set is a set or collection of data.
This set is normally presented in tabular form.
Every column describes a particular variable,
and each row corresponds to a given member of the data set as per the given question.
This is a part of data management.
The data set consists of one or more members, each corresponding to a row.
Data sets describe the values of each variable for quantities such as
height, weight, temperature, volume etc. of an object, or values of random
numbers. Each value in the set is known as a datum.
Training data
A training data set is a set of examples used during the learning process and
is used to fit the parameters of the model.
The largest part of the data set comes under training data (usually 80%).
Test Data
A test set is a set of examples used only to assess the performance of the fully
specified classifier.
A smaller part of the data set is used for test data (usually 20%).
Note: The training data and test data do not come from different sources; they are
usually divided from the same main data set in an 80-20 ratio.
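The 80-20 division described above can be sketched in a few lines of Python. This
is only an illustration: the function name split_dataset, the seed value and the
ten-number data set are assumptions made for the example, not part of the notes.
The rows are shuffled first so that the test part is not simply the last few rows.

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into training and test parts."""
    shuffled = list(rows)                  # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 10 examples (here just the numbers 1 to 10).
dataset = list(range(1, 11))
train_data, test_data = split_dataset(dataset)

print(len(train_data), len(test_data))  # 8 2
```

Every example ends up in exactly one of the two parts, so the model is tested on
data it has not seen during training.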
Data Features
A feature is a measurable piece of data that can be used for analysis.
In CSV and Excel files, features can be seen as columns.
Features are also sometimes referred to as variables or attributes.
Depending on what we are trying to analyse, the features we include in our data
set can vary widely.
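As a small illustration of features appearing as columns, the sketch below reads
a tiny CSV table with Python's built-in csv module. The student names, ages and
heights are made-up values used only for this example.

```python
import csv
import io

# Hypothetical CSV content: each column is a feature, each row is one student.
csv_text = """name,age,height_cm
Asha,14,152
Ravi,15,160
Meena,14,155
"""

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)                       # one dictionary per row

features = reader.fieldnames              # column headers are the feature names
ages = [int(row["age"]) for row in rows]  # all values of a single feature

print(features)  # ['name', 'age', 'height_cm']
print(ages)      # [14, 15, 14]
```

Choosing different columns (for example only age and height_cm) changes which
features the analysis uses.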
System Maps
A system map is a diagrammatic representation of a set of things working together.
It focuses on the components and boundaries of a system.
System maps help us to find relationships between different elements of the
problem which we have scoped.
It helps to find a solution to achieve the goal of our project.
Rules for system maps are:
Circles represent elements.
Arrows are used to represent relationships or interconnections.
The + or - signs are indicators of the nature of relationships. The arrowhead
depicts the direction of the effect and the sign (+ or -) shows the relationship.
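The rules above can also be written down as data. The sketch below stores each
arrow as (from element, to element, sign). The elements chosen here are invented
for illustration, and it assumes the common reading that + means both elements
change in the same direction and - means they change in opposite directions.

```python
# Each relationship is (from_element, to_element, sign).
# Assumption: "+" = same direction of change, "-" = opposite direction.
relationships = [
    ("study time", "marks", "+"),
    ("screen time", "study time", "-"),
    ("marks", "confidence", "+"),
]

def describe(relationships):
    """Turn each signed arrow into a plain sentence."""
    lines = []
    for source, target, sign in relationships:
        effect = "increases" if sign == "+" else "decreases"
        lines.append(f"More {source} {effect} {target}")
    return lines

# The elements (circles) are everything that appears at either end of an arrow.
elements = sorted({name for source, target, _ in relationships
                   for name in (source, target)})

for line in describe(relationships):
    print(line)
```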
DATA EXPLORATION:
Data exploration is the first step of data analysis. It is used to visualize data in
order to uncover early insights and to identify areas or patterns worth examining
further. It allows for a deeper and more detailed understanding of the data.
Data visualization is a part of this, where we visualize and present the data in the
form of tables, pie charts, bar graphs, line graphs, bubble maps, etc.
Goal: The goal of data exploration is to learn about characteristics and potential
problems of a data set without the need to formulate assumptions about the data
beforehand.
In statistics, data exploration is often referred to as "exploratory data analysis"
(EDA) and contrasts with traditional hypothesis testing.
Since its beginnings, EDA has been a very graphical approach. Typical plots include
histograms, box plots, scatter plots and many more, used to learn about
distributions, correlations, trends and other data characteristics.
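A first look at a distribution does not need a plotting tool. The sketch below
uses only Python's standard library to compute two summary values and print a
simple text histogram. The scores are made-up values, and histogram_lines is a
helper name introduced only for this example.

```python
from collections import Counter
import statistics

# Hypothetical test scores out of 10 for a class of 12 students.
scores = [7, 8, 6, 9, 7, 5, 8, 7, 10, 6, 7, 8]

def histogram_lines(values):
    """Return one text line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value:>2} | {'#' * counts[value]}" for value in sorted(counts)]

print("mean:  ", round(statistics.mean(scores), 2))
print("median:", statistics.median(scores))
for line in histogram_lines(scores):
    print(line)
```

The longest row of # marks shows the most common score, which gives a quick
sense of where the data is concentrated.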
Why Explore?
To analyse the data, we need to visualise it in some user-friendly format so
that we can:
Quickly get a sense of the trends, relationships and patterns contained within the data.
Define strategy for which model to use at a later stage.
Communicate the same to others effectively.
Data Visualization Techniques: