_unit2 DATA SCIENCE

The document outlines the data analytics life cycle, which consists of six phases: Discovery, Data Preparation, Model Planning, Model Building, Communication Results, and Operationalize, each detailing specific tasks and tools used. It also distinguishes between data science and data analytics, highlighting their differences in scope, goals, and required skills. Additionally, it discusses the pivotal role of data analytics in enhancing operations and decision-making across industries through techniques like data mining, management, statistical analysis, and presentation.

Uploaded by

ragavihr131211
Copyright
© All Rights Reserved

UNIT -2

BASICS OF DATA ANALYTICS

Data analytics life cycle:


● The data analytics life cycle is designed for Big Data problems and data science projects. The cycle is iterative, reflecting how real projects proceed.

● For Big Data, a step-by-step methodology is needed to organize the activities and tasks involved with acquiring, processing, analyzing, and repurposing data.

Phase 1: Discovery

● The data science team learns about and investigates the problem.

● The team develops context and understanding.

● The team identifies the data sources needed and available for the project.

● The team formulates an initial hypothesis that can later be tested with data.

Phase 2: Data Preparation

● This phase covers the steps to explore, preprocess, and condition data before modeling and analysis.

● It requires an analytic sandbox. The team extracts, loads, and transforms data to get it into the sandbox.

● Data preparation tasks are likely to be performed multiple times and not in a predefined order.

● Several tools commonly used for this phase are – Hadoop, Alpine Miner, OpenRefine, etc.
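The conditioning steps described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the workflow of any of the tools named above. The column names and sample values are made up, and dropping incomplete rows is just one possible policy.

```python
import csv
import io

# Hypothetical raw extract; column names and values are invented for illustration.
RAW_CSV = """customer_id,age,monthly_spend
101, 34 ,120.50
102,,80.00
103,29,not_available
101, 34 ,120.50
104,41,95.25
"""

def prepare(raw_text):
    """Return cleaned rows: trimmed, typed, de-duplicated, incomplete rows dropped."""
    cleaned = []
    seen_ids = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Condition: strip stray whitespace from every field.
        row = {key: value.strip() for key, value in row.items()}
        try:
            record = {
                "customer_id": int(row["customer_id"]),
                "age": int(row["age"]),
                "monthly_spend": float(row["monthly_spend"]),
            }
        except ValueError:
            # Missing or non-numeric value: this sketch simply drops the row.
            continue
        if record["customer_id"] in seen_ids:
            continue  # drop duplicate customers
        seen_ids.add(record["customer_id"])
        cleaned.append(record)
    return cleaned

rows = prepare(RAW_CSV)
for record in rows:
    print(record)
```

In a real project these steps are usually repeated several times as new data problems are discovered, which matches the point that data preparation does not follow a predefined order.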

Phase 3: Model Planning

● The team explores the data to learn about the relationships between variables and subsequently selects the key variables and the most suitable models.

● Several tools commonly used for this phase are – Matlab and STATISTICA.

Phase 4: Model Building

● The team develops datasets for training, testing, and production purposes.

● The team builds and executes models based on the work done in the model planning phase.

● The team also considers whether its existing tools will suffice for running the models or whether it needs a more robust environment for executing them.

● Free or open-source tools – R and PL/R, Octave, WEKA.

● Commercial tools – Matlab and STATISTICA.
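As a rough illustration of how model planning and model building fit together, the sketch below first checks which candidate variable is related to the target, then splits the data into training and test sets and fits a simple line. The data is synthetic and the variable names (`visits`, `spend`) are hypothetical. Pearson correlation and least squares are used here only as simple stand-ins for whatever methods a real team would choose.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Synthetic data (made up): spend depends on visits plus some noise.
rng = random.Random(0)
visits = [rng.uniform(1, 20) for _ in range(100)]
unrelated = [rng.uniform(0, 1) for _ in range(100)]  # a variable with no real effect
spend = [5.0 * v + 10.0 + rng.gauss(0, 2) for v in visits]

# Model planning: explore relationships to select the key variable.
corr_visits = pearson(visits, spend)
corr_unrelated = pearson(unrelated, spend)

# Model building: hold out 20% of the rows for testing, fit on the rest.
indices = list(range(100))
rng.shuffle(indices)
train, test = indices[:80], indices[80:]
slope, intercept = fit_line([visits[i] for i in train], [spend[i] for i in train])

# Evaluate on the held-out rows with mean absolute error.
mae = sum(abs(spend[i] - (slope * visits[i] + intercept)) for i in test) / len(test)
print(f"corr(visits)={corr_visits:.2f}, corr(unrelated)={corr_unrelated:.2f}")
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mae:.2f}")
```

Keeping the test rows out of the fitting step is what makes the error estimate meaningful for data the model has not seen.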

Phase 5: Communicate Results

● After executing the model, the team needs to compare the outcomes of modeling against the criteria established for success and failure.

● The team considers how best to articulate findings and outcomes to the various team members and stakeholders, taking into account caveats and assumptions.

● The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey the findings to stakeholders.
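Comparing outcomes against success criteria can be made explicit in code. The following is a small sketch only. The metric names, targets, and results are hypothetical and would in practice come from the criteria agreed in the discovery phase.

```python
# Hypothetical success criteria (illustrative values, not from the source).
criteria = {
    "accuracy": {"target": 0.80, "higher_is_better": True},
    "mean_abs_error": {"target": 5.0, "higher_is_better": False},
}

# Hypothetical results from the model building phase.
results = {"accuracy": 0.86, "mean_abs_error": 6.2}

def evaluate(results, criteria):
    """Return a pass/fail verdict for each metric."""
    verdicts = {}
    for metric, rule in criteria.items():
        value = results[metric]
        if rule["higher_is_better"]:
            verdicts[metric] = value >= rule["target"]
        else:
            verdicts[metric] = value <= rule["target"]
    return verdicts

verdicts = evaluate(results, criteria)
for metric, passed in verdicts.items():
    status = "met" if passed else "not met"
    print(f"{metric}: {results[metric]} (target {criteria[metric]['target']}) -> {status}")
```

A table like the printed one gives stakeholders a direct view of which criteria were met before the narrative explains why.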

Phase 6: Operationalize

● The team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening it to the full enterprise of users.

● This approach enables the team to learn about the performance and related constraints of the model in a production environment on a small scale, and to make adjustments before full deployment.

● The team delivers final reports, briefings, and code.

● Free or open-source tools – Octave, WEKA, SQL, MADlib.


Review of data:

Difference between data science and data analytics:

Feature | Data science | Data analytics

Coding language | Python is the most commonly used language for data science, along with other languages such as C++, Java, and Perl. | Knowledge of Python and R is essential for data analytics.

Programming skills | In-depth knowledge of programming is required for data science. | Basic programming skills are necessary for data analytics.

Use of machine learning | Data science makes use of machine learning algorithms to get insights. | Data analytics generally does not rely on machine learning to get insights from data.

Other skills | Data science makes use of data mining activities to get meaningful insights. | Hadoop-based analysis is used to draw conclusions from raw data.

Scope | The scope of data science is large. | The scope of data analytics is micro, i.e., small.

Goals | Data science deals with exploration and new innovations. | Data analytics makes use of existing resources.

Data type | Data science mostly deals with unstructured data. | Data analytics deals with structured data.

Statistical skills | Statistical skills are necessary in the field of data science. | Data analytics needs less advanced statistical skill, though basic statistical analysis remains part of the process.

The Role of Data Analytics


Data analytics plays a pivotal role in enhancing operations, efficiency, and
performance across various industries by uncovering valuable patterns and
insights.

Implementing data analytics techniques can provide companies with a competitive advantage. The process typically involves four fundamental steps:

❖ Data Mining: This step involves gathering data and information from diverse sources and transforming them into a standardized format for subsequent analysis.

Data mining can be time-intensive compared to the other steps, but it is crucial for obtaining a comprehensive dataset.

❖ Data Management: Once collected, data needs to be stored, managed, and made accessible. Creating a database is essential for managing the vast amounts of information collected during the mining process.

SQL (Structured Query Language) remains a widely used tool for database management, facilitating efficient querying and analysis of relational databases.

❖ Statistical Analysis: In this step, the gathered data is subjected to statistical analysis to identify trends and patterns. Statistical modeling is used to interpret the data and make predictions about future trends.

Open-source programming languages like Python, as well as specialized tools like R, are commonly used for statistical analysis and graphical modeling.

❖ Data Presentation: The insights derived from data analytics need to be effectively communicated to stakeholders.

This final step involves formatting the results in a manner that is accessible and understandable to various stakeholders, including decision-makers, analysts, and shareholders. Clear and concise data presentation is essential for driving informed decision-making and business growth.
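The four steps can be walked through end to end in a short Python sketch. This is an illustration under simple assumptions, not a production pipeline. The two data sources, the `sales` table, and all values are hypothetical, and an in-memory SQLite database stands in for a real database server.

```python
import sqlite3
import statistics

# 1. Data mining: gather records from two hypothetical sources and
#    standardize them into (region, month, sales) tuples.
source_a = [{"region": "North", "month": 1, "sales": "120"},
            {"region": "North", "month": 2, "sales": "135"}]
source_b = [("south", 1, 90.0), ("south", 2, 110.0)]

records = [(r["region"].title(), r["month"], float(r["sales"])) for r in source_a]
records += [(region.title(), month, float(sales)) for region, month, sales in source_b]

# 2. Data management: store the standardized data in a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, sales REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)

# 3. Statistical analysis: query with SQL, then summarize each region.
regions = [row[0] for row in
           conn.execute("SELECT DISTINCT region FROM sales ORDER BY region")]
summary = {}
for region in regions:
    values = [row[0] for row in conn.execute(
        "SELECT sales FROM sales WHERE region = ? ORDER BY month", (region,))]
    summary[region] = {
        "mean": statistics.mean(values),
        "growth": values[-1] - values[0],  # change from first to last month
    }
conn.close()

# 4. Data presentation: format the results for a non-technical reader.
for region, stats in summary.items():
    print(f"{region}: average sales {stats['mean']:.1f}, change {stats['growth']:+.1f}")
```

Standardizing the region names and numeric types in step 1 is what lets the later SQL query group both sources consistently.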

Usage of Data Analytics


Data analytics has played a vital role in several key domains and strategic planning activities:

● Improved Decision-Making – If we have data supporting a decision, we can implement it with a higher probability of success. For example, if a certain decision or plan has led to better outcomes, there will be no doubt about implementing it again.

● Better Customer Service – Churn modeling is a good example of this: we try to predict or identify what leads to customer churn and change those things accordingly, so that customer attrition is as low as possible, which is an important factor in any organization.

● Efficient Operations – Data analytics can help us understand what a situation demands and what should be done to get better results. We can then streamline our processes, which in turn leads to efficient operations.

● Effective Marketing – Market segmentation techniques are applied here: we look for the marketing techniques that will help us increase sales, which leads to effective marketing strategies.
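The churn and segmentation ideas above can be illustrated with a very small example. This sketch only computes churn rates per customer segment. It is not a predictive churn model, and the segment names and records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical customer records: (segment, churned). Values are made up.
customers = [
    ("monthly_plan", True), ("monthly_plan", True),
    ("monthly_plan", False), ("monthly_plan", False),
    ("annual_plan", False), ("annual_plan", False),
    ("annual_plan", True), ("annual_plan", False),
]

def churn_rate_by_segment(records):
    """Return the fraction of churned customers in each segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, has_churned in records:
        totals[segment] += 1
        if has_churned:
            churned[segment] += 1
    return {segment: churned[segment] / totals[segment] for segment in totals}

rates = churn_rate_by_segment(customers)

# The segment with the highest churn rate is the first candidate for action.
worst_segment = max(rates, key=rates.get)
for segment, rate in rates.items():
    print(f"{segment}: churn rate {rate:.0%}")
print(f"Highest churn: {worst_segment}")
```

Breaking the rate down by segment is a simple first step toward finding what leads to churn, since it shows where attrition is concentrated.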
