Introduction to Data Analytics
Dr. Anil Kumar Dubey
Associate Professor,
Computer Science & Engineering Department,
ABES EC, Ghaziabad
Affiliated to Dr. A.P.J. Abdul Kalam Technical University,
Uttar Pradesh, Lucknow
Basic
Data analytics is the process of storing,
organizing, and analyzing raw data to answer
questions or gain important insights. Data
analytics is integral to business because it allows
leadership to create evidence-based strategy,
understand customers to better target marketing
initiatives, and increase overall productivity.
Data analytics is the collection, transformation,
and organization of data in order to draw
conclusions, make predictions, and drive informed
decision-making.
Basic
Data analytics is a multidisciplinary field that
employs a wide range of analysis techniques,
including math, statistics, and computer science,
to draw insights from data sets.
Data analytics is a broad term that includes
everything from simply analyzing data to
theorizing ways of collecting data and creating
the frameworks needed to store it.
Data analytics is the science of analyzing raw
data to make conclusions about that
information.
Basic
Data analytics helps a business optimize its
performance, operate more efficiently, maximize
profit, or make more strategically guided decisions.
The techniques and processes of data analytics have
been automated into mechanical processes and
algorithms that work over raw data for human
consumption.
Various approaches to data analytics include
descriptive analytics, diagnostic analytics, predictive
analytics, and prescriptive analytics.
Data analytics relies on a variety of software tools
including spreadsheets, data visualization, reporting
tools, data mining programs, and open-source languages.
There are four key types of data analytics: descriptive,
diagnostic, predictive, and prescriptive.
Together, these four types of data analytics can help an
organization make data-driven decisions.
◦ Descriptive analytics tells us what happened.
◦ Diagnostic analytics tells us why something
happened.
◦ Predictive analytics tells us what will likely happen in
the future.
◦ Prescriptive analytics tells us what to do to achieve a
desired outcome.
Types of Data Analytics
Four major types:
◦ Predictive (forecasting)
◦ Descriptive (business intelligence and data
mining)
◦ Prescriptive (optimization and simulation)
◦ Diagnostic analytics
Predictive analytics
Predictive analytics turns data into valuable,
actionable information. It uses data to determine
the probable outcome of an event or the
likelihood of a situation occurring.
Predictive analytics draws on a variety of statistical
techniques from modeling, machine learning (ML),
data mining, and game theory that analyze current
and historical facts to make predictions about
future events.
Predictive analytics
Techniques that are used for predictive analytics
are:
◦ Linear Regression
◦ Time Series Analysis and Forecasting
◦ Data Mining
Basic Cornerstones of Predictive Analytics
◦ Predictive modeling
◦ Decision Analysis and optimization
◦ Transaction profiling
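Linear regression, the first technique listed above, can be sketched in a few lines. This is a minimal illustration, not a production model: the monthly sales figures and the helper name fit_line are assumptions made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical historical data: sales for months 1 to 5 (illustrative only).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# Use the fitted line to predict a future value (month 6).
forecast_month_6 = slope * 6 + intercept
print(f"slope={slope}, intercept={intercept}, forecast={forecast_month_6}")
```

The fitted line summarizes the historical trend, and extending it one step ahead is the prediction. Real data is rarely this regular, so a real forecast would also report how well the line fits.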
Descriptive analytics
Descriptive analytics looks at data and analyzes
past events for insight into how to approach future
events.
It looks at past performance and mines historical
data to understand the causes of past success or
failure.
Almost all management reporting, such as sales,
marketing, operations, and finance, uses this type
of analysis.
Descriptive analytics
The descriptive model quantifies relationships in
data in a way that is often used to classify
customers or prospects into groups.
Unlike a predictive model, which focuses on predicting
the behavior of a single customer, descriptive
analytics identifies many different relationships
between customers and products.
Descriptive analytics
Common examples of descriptive analytics are
company reports that provide historical reviews, such as:
◦ Data Queries
◦ Reports
◦ Descriptive Statistics
◦ Data dashboard
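Descriptive statistics, one of the examples above, can be computed with Python's standard statistics module. A minimal sketch, assuming a made-up list of daily order counts:

```python
from statistics import mean, median, pstdev

# Hypothetical daily order counts for one week (illustrative only).
orders = [12, 15, 11, 15, 22]

# A small "what happened" summary of the past data.
summary = {
    "count": len(orders),
    "min": min(orders),
    "max": max(orders),
    "mean": mean(orders),
    "median": median(orders),
    "std_dev": pstdev(orders),  # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A report or dashboard is usually a collection of such summaries, recomputed as new data arrives.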
Prescriptive Analytics
Prescriptive analytics automatically synthesizes
big data, mathematical science, business rules,
and machine learning to make a prediction and
then suggest decision options that take
advantage of the prediction.
Prescriptive analytics goes beyond predicting
future outcomes by also suggesting actions that
benefit from the predictions and showing the
decision maker the implications of each decision
option.
Prescriptive Analytics
Prescriptive analytics anticipates not only what
will happen and when it will happen, but also why
it will happen. Further, it can suggest decision
options for taking advantage of a future
opportunity or mitigating a future risk, and
illustrate the implications of each decision
option.
For example, prescriptive analytics can benefit
healthcare strategic planning by combining
operational and usage data with data on external
factors such as economic data and population
demographics.
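The idea of turning a prediction into a recommended decision can be sketched as a tiny optimization. Everything here is hypothetical: the predicted visit count, the cost figures, and the function name total_cost are assumptions chosen only to illustrate the healthcare planning example.

```python
# Hypothetical output of a predictive model: expected patient visits per day.
predicted_visits = 120

# Hypothetical business rules (illustrative numbers only).
VISITS_PER_NURSE = 25      # visits one nurse can handle per day
COST_PER_NURSE = 300       # daily staffing cost per nurse
PENALTY_PER_UNSERVED = 50  # cost of each visit that cannot be handled


def total_cost(nurses, visits):
    """Staffing cost plus penalty for visits beyond capacity."""
    capacity = nurses * VISITS_PER_NURSE
    unserved = max(0, visits - capacity)
    return nurses * COST_PER_NURSE + unserved * PENALTY_PER_UNSERVED


# Evaluate every decision option and show the implication of each one.
costs = {n: total_cost(n, predicted_visits) for n in range(1, 9)}
for nurses, cost in costs.items():
    print(f"{nurses} nurses -> total cost {cost}")

# Recommend the option with the lowest total cost.
best = min(costs, key=costs.get)
print(f"recommended staffing: {best} nurses")
```

The table of costs is the "implication of each decision option", and the minimum is the suggested action. Real prescriptive systems replace the brute-force loop with optimization or simulation.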
Diagnostic Analytics
Diagnostic analytics generally relies on historical
data, in preference to other data, to answer a question
or solve a problem. We look for dependencies and
patterns in the historical data of the particular
problem.
For example, companies use this analysis because
it gives great insight into a problem, and they
keep detailed information at their disposal;
otherwise, data would have to be collected separately
for every problem, which would be very time-consuming.
Diagnostic Analytics
Common techniques used for Diagnostic Analytics
are:
◦ Data discovery
◦ Data mining
◦ Correlations
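Correlation, the last technique above, measures how strongly two variables move together. A minimal sketch of the Pearson correlation coefficient follows; the data series and the function name pearson are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical historical data (illustrative only).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]           # rises with ad spend
support_tickets = [5, 4, 3, 2, 1]  # falls as ad spend rises

r_sales = pearson(ad_spend, sales)
r_tickets = pearson(ad_spend, support_tickets)
print(f"ad spend vs sales: {r_sales:.2f}")
print(f"ad spend vs tickets: {r_tickets:.2f}")
```

A value near +1 or -1 points to a strong linear relationship worth investigating. Correlation alone does not show that one variable causes the other.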
Steps in Data Analysis
Define Data Requirements: This involves
determining how the data will be grouped or
categorized. Data can be segmented based on
various factors such as age, demographic,
income, or gender, and can consist of numerical
values or categorical data.
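Segmenting by a factor such as age can be sketched as follows. The age bands and the sample ages are assumptions chosen for illustration; a real project would take them from its own requirements.

```python
from collections import Counter


def age_band(age):
    """Map a numerical age to a categorical segment (hypothetical bands)."""
    if age < 18:
        return "under 18"
    if age < 35:
        return "18-34"
    if age < 55:
        return "35-54"
    return "55+"


# Hypothetical ages of survey respondents.
ages = [16, 22, 31, 40, 47, 58, 63]

# Count how many respondents fall into each segment.
band_counts = Counter(age_band(a) for a in ages)
for band, count in band_counts.items():
    print(f"{band}: {count}")
```

This turns a numerical value (age) into categorical data (age band), which is the kind of grouping decision this step settles before collection begins.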
Data Collection: Data is gathered from
different sources, including computers, online
platforms, cameras, environmental sensors, or
through human personnel.
Steps in Data Analysis
Data Organization: Once collected, data needs to
be organized in a structured format to facilitate
analysis. This could involve using spreadsheets or
specialized software designed for managing and
analyzing statistical data.
Data Cleaning: Before analysis, data undergoes a
cleaning process to ensure accuracy and reliability.
This involves identifying and removing any duplicate
or erroneous entries, as well as addressing any
missing or incomplete data. Cleaning data helps to
mitigate potential biases and errors that could affect
the analysis.
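The cleaning step can be sketched on a small list of records. The records, the valid age range, and the function name clean are assumptions for illustration. Dropping bad rows is only one possible policy; filling in missing values is another.

```python
# Hypothetical raw records with typical quality problems.
raw_records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},  # missing age
    {"id": 1, "age": 34, "income": 52000},    # duplicate of id 1
    {"id": 3, "age": -5, "income": 48000},    # erroneous age
    {"id": 4, "age": 45, "income": 75000},
]


def clean(records):
    """Drop duplicate ids, missing ages, and ages outside 0 to 120."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # duplicate entry
        seen_ids.add(rec["id"])
        age = rec["age"]
        if age is None or not (0 <= age <= 120):
            continue  # missing or erroneous value
        cleaned.append(rec)
    return cleaned


cleaned = clean(raw_records)
print(f"kept {len(cleaned)} of {len(raw_records)} records")
```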
Data Analytics Tool
Data Analytics Tool | How It’s Used
Artificial Intelligence | Makes decisions that can provide a plausible likelihood in achieving a goal
NoSQL Database | Delivers a method for accumulation and retrieval of data
R Programming | Assists data scientists in designing statistical software
Data Lakes | Accumulates data without transforming it into structured data
Predictive Analytics | Predicts future behavior via prior data
Apache Spark | Generates big data transformation via Python, R, Scala and Java
Prescriptive Analytics | Provides guidance about what to do to achieve a desired outcome
In-Memory Database | Saves time by omitting the requirements to access hard drives
Hadoop Ecosystem | Ingests, stores, analyzes and maintains large data sets
Blockchain | Distributed ledger technologies have proven valuable in managing data challenges
Sources and nature of data
Data collection is the process of acquiring,
extracting, and storing large volumes of data,
which may be structured or unstructured (text,
video, audio, XML files, records, or image files),
for use in later stages of data analysis.
In big data analysis, data collection is the
initial step, before any analysis of patterns or
useful information in the data. The data to be
analyzed must be collected from different valid
sources.
Sources and nature of data
Primary data
Data that is raw, original, and extracted
directly from official sources is known as
primary data. This type of data is collected
directly through techniques such as
questionnaires, interviews, and surveys.
The data collected must match the demands and
requirements of the target audience on which
the analysis is performed; otherwise it becomes
a burden during data processing. A few methods
of collecting primary data:
Secondary data
Secondary data is data that has already
been collected and is reused for some
valid purpose.
This type of data was previously recorded from
primary data, and it has two types of sources:
internal sources and external sources.
Classification of data
Big data is characterized by huge volume, high
velocity, and an extensible variety of data. It
is classified into three types:
◦ Structured
◦ Semi-structured
◦ Unstructured
Structured data
Structured data is data whose elements are addressable
for effective analysis.
It has been organized into a formatted repository,
typically a database.
It covers all data that can be stored in an SQL
database in tables with rows and columns.
Such data has relational keys and can easily be mapped
into pre-designed fields. Today, it is the most widely
processed kind of data in development and the simplest
way to manage information.
Example: Relational data.
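Rows, columns, and relational keys can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch: the table names, column names, and sample rows are assumptions made up for the example.

```python
import sqlite3

# An in-memory SQL database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two tables with pre-designed fields, linked by a relational key.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)

# Hypothetical sample rows.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0)])

# A structured query that joins the tables through the key.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount)"
    " FROM customers c JOIN orders o ON o.customer_id = c.id"
    " GROUP BY c.id, c.name ORDER BY c.name"
).fetchall()
conn.close()

for name, total in totals:
    print(f"{name}: {total}")
```

Because every value sits in a known column of a known table, a join like this one is straightforward to express.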
Semi-Structured data
Semi-structured data is information that does not reside
in a relational database but has some organizational
properties that make it easier to analyze.
With some processing, it can be stored in a relational
database (this can be very hard for some kinds of
semi-structured data), but semi-structured formats exist
to save space.
Example: XML data.
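The organizational properties of XML can be seen by parsing a small document with Python's standard xml.etree.ElementTree module. The XML content below is a made-up sample; note that the second customer has no email element, which a fixed table schema would not allow without a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured data: tags give structure, fields may vary.
xml_text = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ravi</name>
  </customer>
</customers>
"""

root = ET.fromstring(xml_text)

# Flatten each customer element into a row-like dictionary.
rows = []
for cust in root.findall("customer"):
    rows.append({
        "id": cust.get("id"),               # attribute value (a string)
        "name": cust.findtext("name"),
        "email": cust.findtext("email"),    # None when the tag is absent
    })

for row in rows:
    print(row)
```

Flattening into rows like this is the kind of processing needed before such data can be loaded into a relational table.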
Unstructured data
Unstructured data is data that is not organized
in a predefined manner or does not have a
predefined data model, so it is not a good fit for a
mainstream relational database.
Alternative platforms exist for storing and managing
unstructured data. It is increasingly prevalent in IT
systems and is used by organizations in a variety of
business intelligence and analytics applications.
Example: Word, PDF, Text, Media logs.
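With no schema to query, analysis of unstructured text usually starts from the text itself. A minimal sketch that counts word frequencies in a made-up free-text note:

```python
import re
from collections import Counter

# Hypothetical free-text customer service notes (no predefined fields).
log_text = """
Customer called about late delivery. Delivery was delayed by two days.
Customer requested a refund for the late order.
"""

# Lowercase the text and split it into alphabetic words.
words = re.findall(r"[a-z]+", log_text.lower())

# Count how often each word appears.
freq = Counter(words)
for word in ("delivery", "late", "refund"):
    print(f"{word}: {freq[word]}")
```

Counting terms is a crude first pass. It shows why unstructured data needs different tools from the row-and-column queries used for structured data.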
Diff types of Data
Properties | Structured data | Semi-structured data | Unstructured data
Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data
Transaction management | Matured transaction and various concurrency techniques | Transaction is adapted from DBMS, not matured | No transaction management and no concurrency
Version management | Versioning over tuples, rows, tables | Versioning over tuples or graph is possible | Versioned as a whole
Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is more flexible and there is absence of schema
Scalability | It is very difficult to scale DB schema | Its scaling is simpler than structured data | It is more scalable
Robustness | Very robust | New technology, not very widespread | -
Query performance | Structured queries allow complex joining | Queries over anonymous nodes are possible | Only textual queries are possible
Need of data analytics
Building data analytics into the business
model helps companies reduce costs by
identifying more efficient ways of doing
business.
A company can also use data analytics to
make better business decisions.
Thanks