What Is Data Analysis

Uploaded by

5grnkkyscz

Class 1

What is data analysis?

Data is precious in today’s digital environment. It goes through several life stages,
including creation, testing, processing, consumption, and reuse.

Data analysis is the process of examining, cleaning, transforming, and modeling data
with the goal of discovering useful information, informing conclusions, and
supporting decision-making. Data analysis is used in various fields such as business,
healthcare, finance, social sciences, and many others to make informed decisions,
identify opportunities, and solve problems.

It involves several key steps and techniques:

Data Collection: Gathering raw data from various sources, such as databases,
surveys, experiments, or sensors.

Data Cleaning: Preparing the data for analysis by removing errors, handling
missing values, and correcting inconsistencies.

Data Transformation: Converting the data into a suitable format or structure
for analysis. This might include normalizing data, aggregating data, or
creating new derived variables.

Exploratory Data Analysis (EDA): Using statistical and graphical techniques
to understand the main characteristics of the data. This can involve visualizing
data distributions, identifying patterns, and detecting outliers.

Statistical Analysis: Applying statistical tests and models to identify
relationships, trends, and patterns within the data. This can include hypothesis
testing, regression analysis, and time series analysis.

Data Visualization: Creating charts, graphs, and other visual representations
of data to help communicate insights and findings effectively.

Predictive Modeling: Using machine learning and statistical models to make
predictions or forecasts based on historical data. Techniques can include linear
regression, decision trees, neural networks, and clustering algorithms.

Interpretation and Reporting: Summarizing the results of the analysis and
presenting them in a clear and actionable way, often through reports,
dashboards, or presentations.
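
As a rough illustration of the first few steps (cleaning, transformation, and a basic exploratory summary), here is a minimal Python sketch. The records, field names, and helper functions are made up for this example and are not taken from any particular tool.

```python
from statistics import mean, median

# Hypothetical raw records; None stands in for a missing value.
raw_records = [
    {"region": "north", "sales": "120"},
    {"region": "North ", "sales": "80"},
    {"region": "south", "sales": None},
    {"region": "south", "sales": "100"},
    {"region": "south", "sales": "900"},
]


def clean(records):
    """Data cleaning: drop rows with missing sales, fix label and type inconsistencies."""
    cleaned = []
    for row in records:
        if row["sales"] in (None, ""):
            continue  # handle missing values by skipping the row
        cleaned.append(
            {"region": row["region"].strip().lower(), "sales": float(row["sales"])}
        )
    return cleaned


def aggregate_by_region(records):
    """Data transformation: aggregate sales into one total per region."""
    totals = {}
    for row in records:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["sales"]
    return totals


def summarize(records):
    """Exploratory step: basic summary statistics of the sales column."""
    values = [row["sales"] for row in records]
    return {"count": len(values), "mean": mean(values), "median": median(values)}


cleaned = clean(raw_records)
totals = aggregate_by_region(cleaned)
summary = summarize(cleaned)
print(totals)
print(summary)
```

In this toy data the mean is pulled well above the median by the single value of 900, which is the kind of outlier that exploratory analysis is meant to surface.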

What is the Data Analytics Life Cycle?

The life stages that data passes through are mapped out in the Data Analytics Life
Cycle, which guides professionals working on data analytics initiatives. Each stage
has its own significance and characteristics.

• Data Analytics Life Cycle Phases
  o Phase 1: Data Discovery and Formation
  o Phase 2: Data Preparation and Processing
  o Phase 3: Design a Model
  o Phase 4: Model Building
  o Phase 5: Result Communication and Publication
  o Phase 6: Measuring Effectiveness

The Data Analytics Life Cycle encompasses the process of producing, collecting,
processing, using, and analyzing data in order to meet corporate objectives. It offers a
systematic way of turning data into useful information that can help achieve
organizational or project goals. It also provides guidance and strategies for
extracting this information and moving in the right direction.

Because the Life Cycle is circular, data professionals can move forward or backward
through it. Based on new information, they can decide whether to continue with their
current research or abandon it and redo the entire analysis. Throughout the process,
they are guided by the Data Analytics Life Cycle.

Data Analytics Life Cycle Phases


The data analytics life cycle is a structured framework, modeled on the scientific
method, that divides data analytics work into six phases. The framework is simple
and cyclical: the phases are normally completed in sequence.

Notably, because the phases form a circle, a team may move either forwards or
backwards between them. Below are the six data analytics phases that serve as
fundamental processes in data science projects.

Phase 1: Data Discovery and Formation

Every good journey begins with a purpose in mind. In this phase, you identify your
data objectives and how best to attain them through the data analytics life cycle.
Evaluations and assessments should also be undertaken during this initial phase to
develop a basic hypothesis capable of solving business issues or problems.

In this initial step, the data is evaluated for its potential uses and demands, such as
where it comes from, what story you want it to tell, and how the incoming
information benefits your business.

As a data analyst, you will need to explore case studies that use similar data analytics
and, most crucially, examine current company trends. Then you must evaluate all
in-house infrastructure and resources, as well as time and technology requirements,
against the data gathered so far.

Following the completion of the evaluations, the team closes this stage with
hypotheses that will be tested using data later on. This is the first and most critical
step in the life cycle of big data analytics.

Key takeaways:

• The data science team investigates and learns about the challenge.
• Create context and understanding.
• Learn about the data sources that will be required and available for the
project.
• The team develops preliminary hypotheses that can later be tested with
data.

Phase 2: Data Preparation and Processing

Data preparation and processing involves gathering, sorting, processing, and cleaning
the collected information so that it can be used in the subsequent steps of analysis.
An important element of this step is making sure all necessary information is readily
accessible before processing it further.

The following are methods of data acquisition:

• Data Collection: Draw information from external sources.
• Data Entry: Within an organization, data entry refers to creating new
data points using either digital technologies or manual input
procedures.

An analytical sandbox is essential during the data preparation stage of the
data analytics life cycle.

This phase of the analytical cycle does not need to take place in any particular order;
rather it can take place as necessary and be repeated at later times as appropriate.

Phase 3: Design a Model

After you’ve defined your business goals and gathered a large amount of data
(formatted, unformatted, or semi-formatted), it’s time to create a model that uses the
data to achieve the goal. Model planning is the name given to this stage of the data
analytics process.

There are numerous methods for loading data into the system and starting to analyze
it:

• ETL (Extract, Transform, and Load) converts the information before
loading it into a system using a set of business rules.
• ELT (Extract, Load, and Transform) loads raw data into the sandbox
before transforming it.
• ETLT (Extract, Transform, Load, Transform) is a combination of two
layers of transformation.
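
The difference between ETL and ELT is mainly the order of the steps. The following Python sketch shows that ordering with plain lists and dictionaries standing in for the target system and the sandbox. The sample rows, the "business rules", and the function names are all hypothetical.

```python
def extract():
    """Extract: pretend these rows came from a source system (hypothetical data)."""
    return [{"name": " Alice ", "amount": "10.5"}, {"name": "BOB", "amount": "4"}]


def transform(rows):
    """Transform: apply simple, made-up business rules (tidy names, numeric amounts)."""
    return [
        {"name": row["name"].strip().lower(), "amount": float(row["amount"])}
        for row in rows
    ]


def etl(target):
    """ETL: transform first, so only conformed rows ever reach the target."""
    target.extend(transform(extract()))
    return target


def elt(sandbox):
    """ELT: load the raw rows into the sandbox, then transform them there."""
    sandbox["raw"] = extract()
    sandbox["clean"] = transform(sandbox["raw"])
    return sandbox


warehouse = etl([])
sandbox = elt({})
print(warehouse)
print(sandbox["raw"])
```

Both routes end with the same cleaned rows. The practical difference in this sketch is that ELT keeps the untouched raw rows available in the sandbox for later re-transformation.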

This step also involves teamwork to identify the approaches, techniques, and
workflow to be used in the succeeding phase to develop the model. The process of
developing a model begins with finding the relationship between data points to choose
the essential variables and, subsequently, create a suitable model.
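
One common way to look for relationships between data points is to compute a correlation coefficient between each candidate variable and the target. The sketch below uses Pearson correlation as one possible choice. The notes do not prescribe a specific method, and the variable names and numbers are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical target and candidate explanatory variables.
sales = [10, 20, 30, 40, 50]
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_temperature": [21, 19, 22, 20, 21],
}

# Rank candidates by the strength (absolute value) of their correlation with sales.
ranked = sorted(
    candidates,
    key=lambda name: abs(pearson(candidates[name], sales)),
    reverse=True,
)
for name in ranked:
    print(name, round(pearson(candidates[name], sales), 3))
```

A strong correlation only suggests that a variable is worth including in the model; it does not show that the variable causes the outcome.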

Phase 4: Model Building


This stage of the data analytics life cycle involves creating datasets for testing,
training, and production. The data analytics professionals build and run the
model they designed in the previous stage.

They use tools and methods to create and run the model. The experts also put the
model through a trial run to see whether it fits the datasets.

This trial run helps them determine whether their current tools will be enough to
execute the model or whether a more robust system is required for it to function
successfully.

Key takeaways:

• The team creates datasets for use in testing, training, and production.
• The team also examines whether its present tools will suffice for running the
models or whether a more robust environment is required for model execution.
• R and PL/R, Octave, and WEKA are examples of free or open-source
tools.
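
Creating the separate datasets can be as simple as shuffling the rows and cutting them into slices. The Python sketch below is one way to do it. The 60/20/20 proportions, the fixed seed, and treating the third slice as a stand-in for production data are assumptions made for this example only.

```python
import random


def split_dataset(rows, train_fraction=0.6, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into training, testing, and held-back slices."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    n_test = int(len(shuffled) * test_fraction)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    held_back = shuffled[n_train + n_test:]  # assumed stand-in for production-like data
    return train, test, held_back


rows = list(range(10))  # hypothetical row identifiers
train, test, held_back = split_dataset(rows)
print(len(train), len(test), len(held_back))
```

Keeping the slices disjoint matters: a model that is evaluated on rows it was trained on will look better than it really is.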

Phase 5: Result Communication and Publication

Recall the objective you set for your company in phase 1. Now is the time to see if the
tests you ran in the previous phase matched those criteria.

The communication process begins with cooperation with key stakeholders to decide
whether the project’s outcomes are successful or not.

The project team is responsible for identifying the major conclusions of the analysis,
calculating the business value associated with the outcome, and creating a narrative to
summarize and communicate the results to stakeholders.

Phase 6: Measuring Effectiveness

As your data analytics life cycle comes to an end, the final stage is to give
stakeholders a complete report that includes key results, code, briefings, and
technical papers or documents.

Furthermore, to assess the effectiveness of the study, the data is moved from the
sandbox to a live environment and monitored to see whether the results match the
desired business aim.

If the findings meet the objectives, the reports and outcomes are finalized. However,
if the conclusion differs from the purpose stated in phase 1, then you can go back in
the data analytics life cycle to any of the previous phases to adjust your input and get
a different result.

Types of Data

Qualitative Data

Qualitative data is descriptive and conceptual. It is often categorized based on
properties, attributes, labels, and other identifiers and cannot usually be measured
with numbers. Qualitative data can be further classified into:

1. Nominal Data: Data that can be categorized but not ordered. The categories are
distinct and mutually exclusive.

Examples: Gender (male, female), eye color (blue, green, brown), type of
cuisine (Italian, Chinese, Indian).

Characteristics: There is no inherent order or ranking among the categories.

2. Ordinal Data: Data that can be categorized and ordered, but the differences between
the categories are not quantifiable.

Examples: Education level (high school, bachelor's, master's, doctorate),
customer satisfaction ratings (satisfied, dissatisfied).

Characteristics: The order matters, but the intervals between the values are not
consistent or meaningful.

Quantitative Data

Quantitative data is numerical and can be measured. It quantifies the characteristics it
describes and can be classified into:

3. Interval Data: Data with ordered categories where the intervals between values are
consistent and meaningful. However, there is no true zero point.

Examples: Temperature in Celsius or Fahrenheit, IQ scores.

Characteristics: The difference between values is meaningful, but ratios are not (e.g.,
20°C is not twice as hot as 10°C).

4. Ratio Data: Data with ordered categories where both intervals and ratios are
meaningful, and there is a true zero point.

Examples: Height, weight, age, income.

Characteristics: Both differences and ratios are meaningful (e.g., 20 kg is twice as
heavy as 10 kg, and 10.3 kg is 0.3 kg more than 10 kg).
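
The four levels of measurement determine which calculations make sense. The short Python sketch below works through the temperature example and shows a typical summary for nominal and ordinal data. The sample values are invented, and the Kelvin conversion is added here only to show why the Celsius ratio is misleading.

```python
from statistics import mode


def celsius_to_kelvin(celsius):
    """Kelvin has a true zero, so ratios of Kelvin temperatures are meaningful."""
    return celsius + 273.15


# Interval data: the Celsius ratio looks like 2.0, but on a scale with a
# true zero the same two temperatures differ by only a few percent.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio data: weight has a true zero, so 20 kg really is twice 10 kg.
ratio_weight = 20 / 10

# Nominal data: categories have no order, so the mode is the usual summary.
eye_colors = ["brown", "blue", "brown", "green"]
most_common_eye_color = mode(eye_colors)

# Ordinal data: categories have an order, so a median category is meaningful,
# but an arithmetic mean of the labels is not.
education_order = ["high school", "bachelor's", "master's", "doctorate"]
education = ["bachelor's", "high school", "doctorate", "bachelor's", "master's"]
ranks = sorted(education_order.index(level) for level in education)
median_education = education_order[ranks[len(ranks) // 2]]

print(round(ratio_kelvin, 3), most_common_eye_color, median_education)
```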

Key Differences between Qualitative and Quantitative data

Understanding the difference between the two is crucial for selecting appropriate
statistical analyses and for accurately interpreting data in research.

Basis for comparison  Qualitative                 Quantitative

Nature of data        Descriptive, non-numeric    Numeric, measurable
                      (nominal and ordinal).      (interval and ratio).

Measurement           Nominal: Classification     Interval: Numeric with
                      without order.              consistent intervals, but no
                      Ordinal: Classification     true zero.
                      with order but no           Ratio: Numeric with
                      consistent interval.        consistent intervals and a
                                                  true zero.

Examples              Nominal: Types of plants,   Interval: Calendar years,
                      blood types, brands of      temperature scales.
                      cars.                       Ratio: Distance, duration,
                      Ordinal: Survey rankings,   energy consumption.
                      levels of satisfaction,
                      socioeconomic status.

Types of analysis     Descriptive statistics      Inferential statistics

Objectives of data analysis

Data analysis can be categorized into four primary types based on its nature and
objectives: descriptive, diagnostic, predictive, and prescriptive. Each type serves a
different purpose and uses different techniques to analyze data.

• Descriptive Analysis: What happened? (Past)
• Diagnostic Analysis: Why did it happen? (Understanding past)
• Predictive Analysis: What will happen? (Future)
• Prescriptive Analysis: What should we do? (Actionable insights)

            Descriptive Analysis      Diagnostic Analysis       Predictive Analysis       Prescriptive Analysis

Purpose     To understand what has    To understand why         To forecast future        To recommend actions or
            happened in the past.     something happened.       events or trends based    strategies based on data
                                                                on historical data.       insights.

Techniques  Summarization,            Drill-down, data mining,  Machine learning          Optimization algorithms,
            aggregation, and          and correlation           algorithms, statistical   simulation, and decision
            visualization of          analysis.                 models, and time series   analysis.
            historical data.          Use of root cause         analysis.                 Use of techniques like
            Use of statistical        analysis and              Techniques like           linear programming,
            measures like mean,       identification of         regression analysis,      scenario analysis, and
            median, mode, standard    patterns or anomalies.    classification,           expert systems.
            deviation, and frequency                            clustering, and neural
            counts.                                             networks.

Examples    Sales reports that show   Analysis to determine     Sales forecasting based   Recommending optimal
            total revenue, number of  why a specific marketing  on historical sales data  pricing strategies to
            units sold, and average   campaign led to           and market conditions.    maximize revenue.
            selling price.            increased sales.          Predicting customer       Suggesting inventory
            Website analytics         Investigation into the    churn by analyzing past   management practices to
            showing the number of     reasons behind a spike    customer behavior and     reduce stockouts and
            visitors, page views,     in customer complaints.   identifying risk          overstock.
            and average session                                 factors.
            duration.

Output      Reports, dashboards, and  Detailed reports that     Predictive models and     Actionable
            charts that provide       explain causes of         forecasts that provide    recommendations and
            insights into past        observed trends or        likely future outcomes,   decision support tools
            performance and trends.   issues, often including   often with associated     that guide specific
                                      visualizations that       probabilities or          actions to achieve
                                      highlight key drivers or  confidence intervals.     desired outcomes.
                                      correlations.

Each type of analysis builds on the previous ones, providing a comprehensive
approach to data-driven decision-making.
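
To make the four types concrete, the Python sketch below applies one simple technique of each kind to a small, invented sales dataset: summary statistics (descriptive), a drill-down by region (diagnostic), a least-squares trend line (predictive), and a search over candidate prices (prescriptive). The figures and the demand curve are hypothetical, and each technique is only one of many that could be used.

```python
from statistics import mean, median

months = [1, 2, 3, 4, 5, 6]
monthly_sales = [100, 120, 140, 160, 180, 200]  # hypothetical figures

# Descriptive: what happened?
description = {"mean": mean(monthly_sales), "median": median(monthly_sales)}

# Diagnostic: why did it happen? Drill down to see which region drove the change.
sales_by_region = {"north": [50, 50], "south": [50, 90]}  # previous vs. latest period
change_by_region = {region: new - old for region, (old, new) in sales_by_region.items()}
main_driver = max(change_by_region, key=change_by_region.get)


# Predictive: what will happen? Fit a straight line by ordinary least squares.
def fit_line(xs, ys):
    mean_x, mean_y = mean(xs), mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(months, monthly_sales)
forecast_month_7 = slope * 7 + intercept


# Prescriptive: what should we do? Pick the price with the highest expected revenue.
def expected_revenue(price):
    demand = 100 - 2 * price  # assumed demand curve, for illustration only
    return price * demand


candidate_prices = [10, 20, 25, 30, 40]
best_price = max(candidate_prices, key=expected_revenue)

print(description, main_driver, forecast_month_7, best_price)
```

The order of the steps mirrors the progression described above: the descriptive summary frames the question, the drill-down explains it, the forecast looks ahead, and the price search turns the result into a recommendation.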
