UNIT - 2
BASICS OF DATA ANALYTICS
Data analytics life cycle:
● The data analytics life cycle is designed for Big Data problems and
data science projects. The cycle is iterative, to reflect how real
projects proceed.
● For Big Data, a step-by-step methodology is needed to organize the
activities and tasks involved with acquiring, processing, analyzing,
and repurposing data.
Phase 1: Discovery
● The data science team learns and investigates the problem.
● The team develops context and understanding.
● The team identifies the data sources needed and available for the
project.
● The team formulates initial hypotheses that can later be tested
with data.
Phase 2: Data Preparation
● This phase covers the steps to explore, preprocess, and condition
data before modeling and analysis.
● It requires an analytic sandbox; the team performs extract, load,
and transform (ELT) steps to get data into the sandbox.
● Data preparation tasks are likely to be performed multiple times
and not in a predefined order.
● Several tools commonly used for this phase are – Hadoop, Alpine
Miner, OpenRefine, etc.
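As a minimal sketch of the kind of conditioning done in this phase, the Python snippet below cleans a small list of raw records: it trims and normalizes text, drops duplicates, and fills a missing numeric value with the median of the known values. The field names (`customer`, `age`) and the sample rows are hypothetical, and median imputation is only one of several reasonable choices.

```python
from statistics import median

# Hypothetical raw records as they might arrive in the analytic sandbox.
raw_rows = [
    {"customer": " Asha ", "age": "34"},
    {"customer": "Ravi", "age": ""},       # missing age
    {"customer": "asha", "age": "34"},     # duplicate of the first row
    {"customer": "Meena", "age": "41"},
]


def prepare(rows):
    """Normalize text, remove duplicate customers, and impute missing ages."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        age = int(row["age"]) if row["age"].strip() else None
        if name in seen:  # keep only the first record per customer
            continue
        seen.add(name)
        cleaned.append({"customer": name, "age": age})

    # Fill missing ages with the median of the known ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = median(known)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


prepared = prepare(raw_rows)
for record in prepared:
    print(record)
```

In practice these steps are repeated and reordered as the team learns more about the data, which matches the non-linear nature of this phase.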
Phase 3: Model Planning
● The team explores data to learn about relationships between
variables and subsequently, selects key variables and the most
suitable models.
● The models chosen here are built and executed in the next phase,
model building, where the team also develops data sets for training,
testing, and production purposes.
● Several tools commonly used for this phase are – Matlab and
STATISTICA.
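One common way to explore relationships between variables is to compute a correlation coefficient between each candidate variable and the target. The sketch below does this with Pearson correlation in plain Python. The variable names (`ad_spend`, `store_visits`, `sales`) and the numbers are hypothetical, and correlation is only one possible screening technique, not necessarily the one a given team would use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations; "sales" is the variable we want to explain.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "store_visits": [5, 3, 6, 2, 4],
    "sales": [25, 45, 65, 85, 105],
}
target = "sales"

# Correlation of every candidate variable with the target.
scores = {
    name: pearson(values, data[target])
    for name, values in data.items()
    if name != target
}

# Rank candidates by the strength (absolute value) of the relationship.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:+.2f}")
```

Variables at the top of the ranking are candidates for the key variables carried into model building; a weak linear correlation does not rule out a non-linear relationship.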
Phase 4: Model Building
● Team develops datasets for testing, training, and production
purposes, then builds and executes models based on the work done in
the model planning phase.
● Team also considers whether its existing tools will suffice for
running the models or whether it needs a more robust environment
for executing them.
● Free or open-source tools – R and PL/R, Octave, WEKA.
● Commercial tools – Matlab and STATISTICA.
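The idea of separate training and testing datasets can be sketched in a few lines of Python. The example below splits hypothetical data 80/20, fits a least-squares line on the training part, and measures the error on the held-out part. The data, the split ratio, and the choice of a straight-line model are all assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the noise and the split are repeatable

# Hypothetical data: y is roughly 3*x + 2 with a little uniform noise.
points = [(x, 3 * x + 2 + rng.uniform(-1, 1)) for x in range(20)]

# Shuffle, then keep 80% for training and 20% for testing.
rng.shuffle(points)
split = int(0.8 * len(points))
train, test = points[:split], points[split:]


def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(pairs, slope, intercept):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


slope, intercept = fit_line(train)
test_error = mean_abs_error(test, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={test_error:.2f}")
```

Evaluating on data the model has not seen gives a more honest estimate of how it will behave in production than the training error does.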
Phase 5: Communicate Results
● After executing the model, the team needs to compare the outcomes
of modeling to the criteria established for success and failure.
● The team considers how best to articulate findings and outcomes to
the various team members and stakeholders, taking into account
caveats and assumptions.
● The team should identify key findings, quantify the business value,
and develop a narrative to summarize and convey the findings to
stakeholders.
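Comparing modeling outcomes against the success criteria can be as simple as a table of metrics and thresholds. The sketch below assumes hypothetical metric names (`accuracy`, `recall`), thresholds, and measured values; real projects would use whatever criteria were agreed on earlier in the life cycle.

```python
# Hypothetical success criteria (minimum acceptable value per metric).
criteria = {"accuracy": 0.80, "recall": 0.70}

# Hypothetical outcomes measured after executing the model.
outcomes = {"accuracy": 0.86, "recall": 0.64}


def compare(outcomes, criteria):
    """Return, for each criterion, the measured value and whether it was met."""
    report = {}
    for metric, threshold in criteria.items():
        value = outcomes[metric]
        report[metric] = {
            "value": value,
            "threshold": threshold,
            "met": value >= threshold,
        }
    return report


report = compare(outcomes, criteria)
for metric, result in report.items():
    status = "met" if result["met"] else "NOT met"
    print(f"{metric}: {result['value']:.2f} vs target {result['threshold']:.2f} -> {status}")
```

A summary like this gives the narrative for stakeholders a concrete starting point: which criteria were satisfied, which were not, and by how much.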
Phase 6: Operationalize
● The team communicates the benefits of the project more broadly and
sets up a pilot project to deploy the work in a controlled way before
broadening it to the full enterprise of users.
● This approach enables the team to learn about the performance and
related constraints of the model in a production environment on a
small scale and to make adjustments before full deployment.
● The team delivers final reports, briefings, and code.
● Free or open-source tools – Octave, WEKA, SQL, MADlib.
Review of data:
Difference between data science and data analytics:
Feature | Data science | Data analytics
--------|--------------|---------------
Coding language | Python is the most commonly used language for data science, along with other languages such as C++, Java, and Perl. | Knowledge of Python and R is essential for data analytics.
Programming skills | In-depth knowledge of programming is required for data science. | Basic programming skills are necessary for data analytics.
Use of machine learning | Data science makes use of machine learning algorithms to get insights. | Data analytics generally does not rely on machine learning to gain insight from data.
Other skills | Data science makes use of data mining activities to get meaningful insights. | Hadoop-based analysis is used to draw conclusions from raw data.
Scope | The scope of data science is large. | The scope of data analytics is micro, i.e., small.
Goals | Data science deals with exploration and new innovations. | Data analytics makes use of existing resources.
Data type | Data science mostly deals with unstructured data. | Data analytics deals with structured data.
Statistical skills | Statistical skills are necessary in the field of data science. | Data analytics needs less advanced statistics, though basic statistical analysis is still one of its core steps.
The Role of Data Analytics
Data analytics plays a pivotal role in enhancing operations, efficiency, and
performance across various industries by uncovering valuable patterns and
insights.
Implementing data analytics techniques can provide companies with a
competitive advantage. The process typically involves four fundamental
steps:
❖Data Mining :
This step involves gathering data and information from diverse sources
and transforming them into a standardized format for subsequent
analysis.
Data mining can be a time-intensive process compared to other steps
but is crucial for obtaining a comprehensive dataset.
❖Data Management : Once collected, data needs to be stored,
managed, and made accessible. Creating a database is essential for
managing the vast amounts of information collected during the
mining process.
SQL (Structured Query Language) remains a widely used tool for
database management, facilitating efficient querying and analysis of
relational databases.
❖Statistical Analysis : In this step, the gathered data is subjected to
statistical analysis to identify trends and patterns. Statistical
modeling is used to interpret the data and make predictions about
future trends.
Open-source programming languages such as Python and R are commonly
used for statistical analysis and graphical modeling.
❖Data Presentation : The insights derived from data analytics need
to be effectively communicated to stakeholders.
This final step involves formatting the results in a manner that is
accessible and understandable to various stakeholders, including
decision-makers, analysts, and shareholders. Clear and concise data
presentation is essential for supporting informed decision-making and
driving business growth.
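The four steps above can be sketched end to end with Python's standard library. The example gathers records from two differently shaped sources into one standard format, stores them in an in-memory SQLite database, computes simple statistics per region, and prints a small summary table. The sources, field names (`region`, `area`, `revenue`, and so on), and figures are hypothetical; a real pipeline would read from files, APIs, or production databases.

```python
import csv
import io
import json
import sqlite3
from statistics import mean

# Step 1 (gathering): two hypothetical sources with different layouts.
csv_source = "region,month,sales\nNorth,1,100\nNorth,2,120\nSouth,1,80\n"
json_source = '[{"area": "South", "period": 2, "revenue": 95}]'


def gather():
    """Return rows from both sources as (region, month, sales) tuples."""
    rows = [
        (r["region"], int(r["month"]), float(r["sales"]))
        for r in csv.DictReader(io.StringIO(csv_source))
    ]
    rows += [
        (r["area"], int(r["period"]), float(r["revenue"]))
        for r in json.loads(json_source)
    ]
    return rows


# Step 2 (management): store the standardized rows in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, sales REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", gather())

# Step 3 (statistical analysis): average sales and first-to-last change.
summary = {}
regions = [r for (r,) in conn.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region")]
for region in regions:
    series = [s for (s,) in conn.execute(
        "SELECT sales FROM sales WHERE region = ? ORDER BY month", (region,))]
    summary[region] = {"average": mean(series), "change": series[-1] - series[0]}

# Step 4 (presentation): a compact table for stakeholders.
print(f"{'Region':<8}{'Average':>10}{'Change':>10}")
for region, stats in summary.items():
    print(f"{region:<8}{stats['average']:>10.1f}{stats['change']:>+10.1f}")
```

Keeping the steps separate, as here, makes it easier to swap one part (for example, the storage layer) without touching the others.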
Usage of Data Analytics
Data analytics plays a vital role in several key domains and
strategic planning activities:
● Improved Decision-Making – If we have data supporting a decision,
we can implement it with a higher probability of success.
For example, if a certain decision or plan has led to better outcomes
in the past, the data gives us good reason to implement it again.
● Better Customer Service – Churn modeling is a good example: we
try to predict or identify what leads to customer churn and change
those things accordingly, so that customer attrition, an important
factor in any organization, stays as low as possible.
● Efficient Operations – Data analytics can help us understand what
a situation demands and what should be done to get better results.
We can then streamline our processes, which in turn leads to
efficient operations.
● Effective Marketing – Market segmentation techniques are used to
find the marketing approaches that will help us increase our sales
and lead to effective marketing strategies.
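A first, descriptive step toward the churn analysis mentioned above is to compare churn rates across customer groups. The sketch below is not a predictive churn model; it only counts churned customers per group. The `plan` and `churned` fields and the sample customers are hypothetical.

```python
from collections import defaultdict

# Hypothetical customer records.
customers = [
    {"plan": "monthly", "churned": True},
    {"plan": "monthly", "churned": True},
    {"plan": "monthly", "churned": False},
    {"plan": "monthly", "churned": False},
    {"plan": "annual", "churned": False},
    {"plan": "annual", "churned": False},
    {"plan": "annual", "churned": False},
    {"plan": "annual", "churned": True},
]


def churn_rate_by(records, key):
    """Fraction of churned customers within each value of `key`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        churned[record[key]] += record["churned"]  # True counts as 1
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by(customers, "plan")
for plan, rate in rates.items():
    print(f"{plan}: {rate:.0%} churn")
```

The same grouping idea applies to market segmentation: splitting customers by an attribute and comparing an outcome across the groups shows where a targeted change is most likely to pay off.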