Data Analysis _Unit1
Data Analytics
Course Instructor
Dr. Himanshu Rai
Data Analytics (KIT-601)
Full Credit Course
4 Credit
150 marks
External-100
Internal - 50
Introduction
Data analytics is growing in importance in every sector, because
organizations now generate enormous quantities of data that can provide
useful insights into their field. Over the last ten years, this has led
to a surge in the data market.
To gain decision-making insights, the collection of data must be
supplemented by its analysis. Data analytics helps organizations and
businesses gain insight into the enormous amount of data they hold
and apply it to further production and growth.
What Is Data?
Data is a collection of facts, such
as numbers, words,
measurements, observations or
just descriptions of things.
Why?
Classification of Data
Structured Data
Structured data is data whose elements are addressable for
effective analysis.
It has been organized into a formatted repository that is
typically a database.
Today, structured data is the most commonly processed kind of data in
development and the simplest to manage. Example: relational
data.
Examples Of Structured Data
An 'Employee' table in a database
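As a minimal sketch of what such a table could look like, the snippet below builds an in-memory 'Employee' table with Python's standard sqlite3 module. The column names and rows are hypothetical and chosen only for illustration. The point is that every element can be addressed by row key and column name.

```python
import sqlite3

# Hypothetical 'Employee' table; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "department TEXT NOT NULL, salary REAL NOT NULL)"
)
rows = [
    (1, "Asha", "Finance", 52000.0),
    (2, "Ravi", "Admin", 41000.0),
    (3, "Meera", "Finance", 61000.0),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

# Because the schema is fixed, a query can address fields directly.
finance_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM Employee WHERE department = ? ORDER BY employee_id",
        ("Finance",),
    )
]
avg_salary = conn.execute("SELECT AVG(salary) FROM Employee").fetchone()[0]
print(finance_names, avg_salary)
conn.close()
```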
Semi-Structured data
Semi-structured data is a form of structured data that does
not obey the tabular structure of data models associated with
relational databases or other forms of data tables, but
nonetheless contains tags or other markers to separate
semantic elements and enforce hierarchies of records and
fields within the data.
With some processing, it can be stored in a relational
database.
Example: XML data.
Examples Of Semi-structured Data
Personal data stored in an XML file
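A small sketch of such a file, parsed with Python's standard xml.etree.ElementTree module, is given here. The tag names and values are hypothetical. Note that the two records carry different fields: the tags mark up the hierarchy, but no fixed tabular schema is enforced.

```python
import xml.etree.ElementTree as ET

# Hypothetical personal records; tag names and values are illustrative only.
xml_text = """
<people>
  <person id="1">
    <name>Asha</name>
    <age>34</age>
  </person>
  <person id="2">
    <name>Ravi</name>
    <email>ravi@example.com</email>
  </person>
</people>
"""

root = ET.fromstring(xml_text)
records = []
for person in root.findall("person"):
    # Start from the attribute, then add whatever child fields are present.
    record = {"id": person.get("id")}
    for field in person:
        record[field.tag] = field.text
    records.append(record)
print(records)
```

Flattening the records into dictionaries like this is one possible first step toward storing them in a relational table.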
Unstructured data
Unstructured data is data that is not organized in a
predefined manner or does not have a predefined data model.
Alternative platforms exist for storing and managing
unstructured data.
It is increasingly prevalent in IT systems and is used by
organizations in a variety of business intelligence and analytics
applications.
Examples: Word documents, PDFs, text files, media logs.
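Because there is no predefined model, any structure has to be imposed at analysis time. The following sketch, using a hypothetical block of free text, pulls out timestamps and word frequencies with regular expressions. It is only one simple way to do this, not a prescribed method.

```python
import re
from collections import Counter

# Hypothetical free-text log; there are no fields or columns to query.
text = (
    "Server restarted at 10:42 after a disk error.\n"
    "User reported a login error at 11:05.\n"
    "Backup finished at 23:30 without problems.\n"
)

# Impose structure after the fact: find HH:MM patterns and count words.
times = re.findall(r"\b\d{2}:\d{2}\b", text)
word_counts = Counter(re.findall(r"[a-z]+", text.lower()))
print(times, word_counts["error"])
```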
Differences
Characteristics of Data
The nine characteristics that define data are:
1. Accuracy and Precision
2. Completeness and Comprehensiveness
3. Reliability and Consistency
4. Relevance
5. Timeliness
6. Objectivity
7. Granularity
8. Availability and Accessibility
9. Confidentiality
Characteristics of Data
1. Accuracy: Data should be accurate, meaning that it is a true
representation of the real-world phenomenon it is intended to measure.
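As a rough illustration, accuracy and completeness can be expressed as simple ratios. The sketch below assumes a trusted reference value exists for each record and that a fixed tolerance defines "accurate". Both are assumptions made for this example, and all values are hypothetical.

```python
# Hypothetical measurements and trusted reference values (illustrative only).
recorded = {"A": 20.1, "B": None, "C": 35.0, "D": 18.4}
reference = {"A": 20.0, "B": 27.5, "C": 30.0, "D": 18.5}
tolerance = 0.5  # assumed acceptable deviation from the reference

# Completeness: share of records that actually have a value.
present = {key: value for key, value in recorded.items() if value is not None}
completeness = len(present) / len(recorded)

# Accuracy: share of present values that fall within tolerance of the reference.
accurate = [
    key for key, value in present.items()
    if abs(value - reference[key]) <= tolerance
]
accuracy = len(accurate) / len(present)
print(completeness, accurate, accuracy)
```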
Lifecycle
Phase 1: Discovery
1. Learning the Business Domain
2. Resources
3. Framing the Problem
4. Identifying Key Stakeholders
5. Interviewing the Analytics Sponsor
6. Developing Initial Hypotheses
7. Identifying Potential Data Sources
Phase 2: Data Preparation
Steps to explore, preprocess, and
condition data prior to modeling and
analysis.
It requires the presence of an analytic
sandbox, in which the team executes extract, load, and
transform (ELT) to get