Lesson 6 Data Life Cycle Part 2

The document discusses the key phases and activities involved in model planning for data analytics projects. These include exploring the data, selecting relevant variables, identifying candidate models based on the project goals and data structure, and developing datasets for training and testing the models. Iteration between model planning and building phases is common to refine the models. The overall goal is to select the right analytical techniques and variables to address the business objectives.

Uploaded by

Neerom Baldemoro


Introduction to Data Science
DATA ANALYTICS LIFE CYCLE
PART 2
Module Objectives
At the end of this module, students must be able to:
1. describe the processes involved in model planning, such as data
exploration and variable and model selection;
2. enumerate the key decisions needed to finalize the model as well as
the tools available for model building;
3. discuss the importance of communicating the results obtained to key
stakeholders; and
4. describe the steps in operationalizing the results.
Recap:
From the previous discussion, we learned about:
1. an overview of the data analytics life cycle;
2. the seven key roles in an analytics project;
3. the discovery phase (Phase 1), where the data science team learns about
the business domain, assesses the resources available, and formulates
initial hypotheses to test while learning about the data; and
4. the data preparation phase (Phase 2), which covers preparing the analytic
sandbox, performing ETLT, data conditioning, etc.
Phase 3 – Model Planning

 In Phase 3, the data science team identifies candidate models to apply to
the data for clustering, classifying, or finding relationships in the data
depending on the goal of the project, as shown.

 It is during this phase that the team refers to the hypotheses developed
in Phase 1, when it first became acquainted with the data and came to
understand the business problems or domain area.

 These hypotheses help the team frame the analytics to execute in Phase 4
and select the right methods to achieve its objectives.
*Text taken from Data Science and Big Data Analytics by EMC Education Services
Phase 3 – Model Planning
Some of the activities to consider in this phase include the following:
 Assess the structure of the datasets. The structure of the datasets is one
factor that dictates the tools and analytical techniques for the next phase.
Depending on whether the team plans to analyze textual data or
transactional data, for example, different tools and approaches are required.

 Ensure that the analytical techniques enable the team to meet the business
objectives and accept or reject the working hypotheses.

 Determine if the situation warrants a single model or a series of techniques
as part of a larger analytic workflow.
Phase 3 – Model Planning

 In addition to the considerations just listed, it is useful to research and
understand how other analysts generally approach a specific kind of
problem.

 Given the kind of data and resources that are available, evaluate whether
similar, existing approaches will work or if the team will need to create
something new. Many times teams can get ideas from analogous problems
that other people have solved in different industry verticals or domain areas.

Phase 3 – Model Planning
 Table 2-2 summarizes the results of
an exercise of this type, involving
several domain areas and the types of
models previously used in a
classification type of problem after
conducting research on churn models
in multiple industry verticals.

 Performing this sort of diligence gives the team ideas of how others have
solved similar problems and presents the team with a list of candidate
models to try as part of the model planning phase.
Model Planning - Data Exploration and Variable Selection

 In Phase 3, the objective of the data exploration is to understand the
relationships among the variables to inform selection of the variables and
methods and to understand the problem domain. As with earlier phases of
the Data Analytics Lifecycle, it is important to spend time and focus
attention on this preparatory work to make the subsequent phases of model
selection and execution easier and more efficient.

 A common way to conduct this step involves using tools to perform data
visualizations. Approaching the data exploration in this way aids the team in
previewing the data and assessing relationships between variables at a high
level.

Model Planning - Data Exploration and Variable Selection

 As the team begins to question assumptions and test initial ideas of the
project sponsors and stakeholders, it needs to consider the inputs and data
that will be needed, and then it must examine whether these inputs are
actually correlated with the outcomes that the team plans to predict or
analyze.

 Some methods and types of models will handle correlated variables better
than others. Depending on what the team is attempting to solve, it may
need to consider an alternate method, reduce the number of data inputs, or
transform the inputs to allow the team to use the best method for a given
business problem.
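
As a rough illustration of checking whether candidate inputs are actually correlated with the outcome, the sketch below ranks inputs by the absolute value of their Pearson correlation with the outcome. The variable names (monthly_charges, support_calls) and all values are hypothetical toy data, not taken from the source text, and Pearson correlation is only one of several possible screening measures.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_inputs(inputs, outcome):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, outcome) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: two candidate inputs and the outcome to predict.
inputs = {
    "monthly_charges": [20, 35, 50, 65, 80, 95],
    "support_calls": [5, 1, 4, 2, 6, 3],
}
outcome = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]

ranking = rank_inputs(inputs, outcome)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

In this toy data the first input tracks the outcome closely while the second barely does, which is the kind of signal the team would weigh when deciding which inputs to keep, transform, or drop.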

Model Planning - Data Exploration and Variable Selection

 The key to this approach is to aim for capturing the most essential predictors and variables
rather than considering every possible variable that people think may influence the
outcome.

 Approaching the problem in this manner requires iterations and testing to identify the most
essential variables for the intended analyses. The team should plan to test a range of
variables to include in the model and then focus on the most important and influential
variables.

 If the team plans to run regression analyses, identify the candidate predictors and outcome
variables of the model. Plan to create variables that determine outcomes and that demonstrate
a strong relationship to the outcome rather than to the other input variables. This includes
remaining vigilant for problems such as serial correlation, multicollinearity, and other typical
data modeling challenges that interfere with the validity of these models.
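
The two problems named above can be screened for with simple statistics. The sketch below is one possible approach, not a prescribed one: it flags pairs of inputs whose pairwise correlation exceeds a threshold (a basic multicollinearity check) and computes the Durbin-Watson statistic on a list of residuals (a common first-order serial correlation check). The feature names, the 0.9 threshold, and the residual values are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def collinear_pairs(inputs, threshold=0.9):
    """Return input pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(inputs.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

def durbin_watson(residuals):
    """Durbin-Watson statistic on model residuals.

    Values near 2 suggest no first-order serial correlation; values toward 0
    suggest positive, and values toward 4 negative, serial correlation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical inputs: tenure_days is nearly a rescaling of tenure_months.
inputs = {
    "tenure_months": [1, 2, 3, 4, 5, 6],
    "tenure_days": [30, 61, 91, 122, 152, 183],
    "support_calls": [5, 1, 4, 2, 6, 3],
}
flagged = collinear_pairs(inputs)

# Hypothetical residual sequences from two fitted models.
dw_trending = durbin_watson([-3, -2, -1, 1, 2, 3])      # residuals drift steadily
dw_alternating = durbin_watson([1, -1, 1, -1, 1, -1])   # residuals flip sign each step
```

A flagged pair suggests dropping or combining one of the two inputs; a Durbin-Watson value far from 2 suggests the residuals are not independent and the model form may need revisiting.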
Model Planning – Model Selection
 In the model selection subphase, the team’s main goal is to choose an analytical
technique, or a short list of candidate techniques, based on the end goal of the
project.

 Here, a model is discussed in general terms. In this case, a
model simply refers to an abstraction from reality. One observes events
happening in a real-world situation or with live data and attempts to construct models
that emulate this behavior with a set of rules and conditions.

 In the case of machine learning and data mining, these rules and conditions are
grouped into several general sets of techniques, such as classification, association
rules, and clustering. When reviewing this list of types of potential models, the team
can winnow down the list to several viable models to try to address a given problem.
Model Planning – Model Selection
 An additional consideration in this area for dealing with Big Data involves
determining if the team will be using techniques that are best suited for
structured data, unstructured data, or a hybrid approach.

 Lastly, the team should take care to identify and document the modeling
assumptions it is making as it chooses and constructs preliminary models.

 Typically, teams create the initial models using a statistical software package
such as R, SAS, or Matlab. Although these tools are designed for data mining
and machine learning algorithms, they may have limitations when applying
the models to very large datasets, as is common with Big Data.

Phase 4 – Model Building

 In Phase 4, the data science team needs to develop datasets for training,
testing, and production purposes. These datasets enable the data scientist
to develop the analytical model and train it (“training data”), while holding
aside some of the data (“hold-out data” or “test data”) for testing the model.

 During this process, it is critical to ensure that the training and test datasets
are sufficiently robust for the model and analytical techniques. A simple way
to think of these datasets is to view the training dataset as the data for
conducting the initial experiments and the test sets as the data for validating
an approach once the initial experiments and models have been run.
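
A minimal sketch of holding aside test data is shown below. The function name train_test_split, the 20% hold-out fraction, and the fixed seed are assumptions chosen for illustration; the source does not prescribe a split ratio or method.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 100 record identifiers.
records = list(range(100))
train, test = train_test_split(records, test_fraction=0.2)
print(len(train), len(test))
```

Shuffling before splitting avoids a hold-out set that only reflects the order in which the records were collected; for time-ordered data a chronological split may be more appropriate.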

Phase 4 – Model Building
 In the model building phase, as
shown, an analytical model is
developed and fit on the training
data and evaluated (scored)
against the test data.

 The phases of model planning and model building can overlap quite a
bit, and in practice one can iterate back and forth between the two
phases for a while before settling on a final model.
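
To make "fit on the training data and score against the test data" concrete, the sketch below fits a one-variable least-squares line on a training set and reports the root mean squared error on both sets. The data values are hypothetical, and simple linear regression stands in for whatever model the team actually selects.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given data."""
    intercept, slope = model
    squared_errors = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data following roughly y = 2x + 1, already split into two sets.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x, test_y = [7, 8], [15.2, 16.9]

model = fit_line(train_x, train_y)         # fit on training data only
train_error = rmse(model, train_x, train_y)
test_error = rmse(model, test_x, test_y)   # score on held-out data
print(f"intercept={model[0]:.2f} slope={model[1]:.2f}")
print(f"train RMSE={train_error:.3f} test RMSE={test_error:.3f}")
```

A test error much larger than the training error would be one reason to iterate back to model planning.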
Phase 4 – Model Building

 Although the modeling techniques and logic required to develop models can
be highly complex, the actual duration of this phase can be short compared
to the time spent preparing the data and defining the approaches.

 In general, plan to spend more time preparing and learning the data
(Phases 1–2) and crafting a presentation of the findings (Phase 5). Phases
3 and 4 tend to move more quickly, although they are more complex from a
conceptual standpoint. As part of this phase, the data science team needs
to execute the models defined in Phase 3.

Phase 4 – Model Building

 During this phase, users run models from analytical software packages, such as R or SAS, on file
extracts and small datasets for testing purposes. In addition, we assess the validity of the model and its
results as well as determine if the model accounts for most of the data and has robust predictive power.

 Also, at this point, we refine the models to optimize the results, such as by modifying variable inputs or
reducing correlated variables where appropriate. In Phase 3, the team may have had some knowledge
of correlated variables or problematic data attributes, which will be confirmed or denied once the models
are actually executed.

 When immersed in the details of constructing models and transforming data, many small decisions are
often made about the data and the approach for the modeling. These details can be easily forgotten
once the project is completed. Therefore, it is vital to record the results and logic of the model during
this phase. In addition, one must take care to record any operating assumptions that were made in the
modeling process regarding the data or the context.

Phase 4 – Model Building
Creating robust models that are suitable to a specific situation requires
thoughtful consideration to ensure the models being developed ultimately
meet the objectives outlined in Phase 1. Questions to consider include these:

 Does the model appear valid and accurate on the test data?
 Does the model output/behavior make sense to the domain experts? That
is, does it appear as if the model is giving answers that make sense in this
context?
 Do the parameter values of the fitted model make sense in the context of
the domain?
 Is the model sufficiently accurate to meet the goal?
Phase 4 – Model Building

 Does the model avoid intolerable mistakes?

 Are more data or more inputs needed? Do any of the inputs need to be
transformed or eliminated?
 Will the kind of model chosen support the runtime requirements?
 Is a different form of the model required to address the business problem? If
so, go back to the model planning phase and revise the modeling approach.

 Once the data science team can determine whether the model is sufficiently
robust to solve the problem or whether the team has failed, it can move to the next
phase in the Data Analytics Lifecycle.
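
Two of the questions above, whether the model is sufficiently accurate and whether it avoids intolerable mistakes, can be checked separately. The sketch below assumes a binary classifier where a false negative is the intolerable mistake; the labels, the 0.7 accuracy goal, and the 0.25 false-negative limit are hypothetical values chosen for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def evaluate(actual, predicted, min_accuracy, max_false_negative_rate):
    """Check overall accuracy and the rate of the costliest error type."""
    c = confusion_counts(actual, predicted)
    accuracy = (c["tp"] + c["tn"]) / len(actual)
    positives = c["tp"] + c["fn"]
    fnr = c["fn"] / positives if positives else 0.0
    return {
        "accuracy": accuracy,
        "false_negative_rate": fnr,
        "meets_goal": accuracy >= min_accuracy and fnr <= max_false_negative_rate,
    }

# Hypothetical test-set labels and model predictions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
report = evaluate(actual, predicted, min_accuracy=0.7, max_false_negative_rate=0.25)
print(report)
```

In this toy case the accuracy goal is met, yet half of the true positives are missed, so the model would still fail the "intolerable mistakes" question.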
Phase 4 – Model Building
There are many tools available to assist in this phase, focused primarily on
statistical analysis or data mining software. Common tools in this space
include, but are not limited to, the following:

Commercial Tools:
1. SAS Enterprise Miner
2. SPSS Modeler
3. Matlab
4. Alpine Miner
5. STATISTICA
6. Mathematica

Free or Open Source Tools:
1. R and PL/R
2. Octave
3. WEKA
4. Python
5. SQL

Phase 5 – Communicate the Results
 After executing the model, the team needs to
compare the outcomes of the modeling to the
criteria established for success and failure.

 In Phase 5, as shown, the team considers how best to articulate the
findings and outcomes to the various team members and stakeholders,
taking into account caveats, assumptions, and any limitations of the
results.

 Because the presentation is often circulated within an organization, it is
critical to articulate the results properly and position the findings
in a way that is appropriate for the audience.

Phase 5 – Communicate the Results
 As part of Phase 5, the team needs to determine if it succeeded or failed in its objectives.
Many times people do not want to admit to failing, but in this instance failure should not be
considered as a true failure, but rather as a failure of the data to accept or reject a given
hypothesis adequately.

 This concept can be counterintuitive for those who have been told their whole careers not to
fail. However, the key is to remember that the team must be rigorous enough with the data to
determine whether it will prove or disprove the hypotheses outlined in Phase 1 (discovery).

 Sometimes teams have only done a superficial analysis, which is not robust enough to
accept or reject a hypothesis. Other times, teams perform very robust analysis and are
searching for ways to show results, even when results may not be there. It is important to
strike a balance between these two extremes when it comes to analyzing data and being
pragmatic in terms of showing real-world results.

Phase 5 – Communicate the Results
 When conducting this assessment, determine if the results are statistically
significant and valid. If they are, identify the aspects of the results that stand
out and may provide salient findings when it comes time to communicate them.

 If the results are not valid, think about adjustments that can be made to refine
and iterate on the model to make it valid. During this step, assess the results
and identify which data points may have been surprising and which were in line
with the hypotheses that were developed in Phase 1.

 Comparing the actual results to the ideas formulated early on produces
additional ideas and insights that would have been missed if the team had not
taken time to formulate initial hypotheses early in the process.
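
One way to check whether an observed difference is statistically significant is an exact permutation test on the difference in group means, sketched below. The source does not name a specific test, so the choice of test is an assumption, as are the group names and values; the exact enumeration used here is only practical for small samples.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way of relabeling the pooled observations and returns the
    share of relabelings whose mean difference is at least as large as observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(a_sum):
        return a_sum / n_a - (total - a_sum) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = count = 0
    for indices in combinations(range(len(pooled)), n_a):
        count += 1
        a_sum = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(a_sum)) >= observed - 1e-12:
            extreme += 1
    return extreme / count

# Hypothetical measurements for a control group and a treated group.
control = [12, 14, 11, 13, 12]
treatment = [18, 20, 17, 19, 21]
p_value = permutation_p_value(control, treatment)
print(f"p = {p_value:.4f}")
```

A small p-value indicates that a difference this large would rarely arise from random relabeling alone; whether the result is also valid and meaningful for the business still requires judgment.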
Phase 5 – Communicate the Results
 By this time, the team should have determined which model or models
address the analytical challenge in the most appropriate way. In addition,
the team should have ideas of some of the findings as a result of the
project. The best practice in this phase is to record all the findings and then
select the three most significant ones that can be shared with the
stakeholders.

 In addition, the team needs to reflect on the implications of these findings
and measure the business value. Depending on what emerged as a result
of the model, the team may need to spend time quantifying the business
impact of the results to help prepare for the presentation and demonstrate
the value of the findings.
Phase 5 – Communicate the Results
 Now that the team has run the model, completed a thorough discovery
phase, and learned a great deal about the datasets, reflect on the project
and consider what obstacles were in the project and what can be improved
in the future.

 Make recommendations for future work or improvements to existing
processes, and consider what each of the team members and stakeholders
needs to fulfill their responsibilities. For instance, sponsors must champion
the project. Stakeholders must understand how the model affects their
processes.

Phase 5 – Communicate the Results
 For example, if the team has created a model to predict customer churn, the
Marketing team must understand how to use the churn model predictions in
planning their interventions.

 Production engineers need to operationalize the work that has been done.
In addition, this is the phase to underscore the business benefits of the work
and begin making the case to implement the logic into a live production
environment.

Phase 6 - Operationalize

 In the final phase, the team communicates the benefits of the project more broadly and sets
up a pilot project to deploy the work in a controlled way before broadening the work to a full
enterprise or ecosystem of users.

 Phase 6 represents the first time that most analytics teams approach deploying the new
analytical methods or models in a production environment. Rather than deploying these
models immediately on a wide-scale basis, the team can manage the risk more effectively and
learn by undertaking a small-scope pilot deployment before a wide-scale rollout.

 This approach enables the team to learn about the performance and related constraints of
the model in a production environment on a small scale and make adjustments before a full
deployment.

Phase 6 - Operationalize

 Be aware that this phase can bring in a new set of team members—usually
the engineers responsible for the production environment who have a new
set of issues and concerns beyond those of the core project team.

 This technical group needs to ensure that running the model fits smoothly
into the production environment and that the model can be integrated into
related business processes.

Phase 6 - Operationalize
 Part of the operationalizing phase includes creating a mechanism for
performing ongoing monitoring of model accuracy and, if accuracy
degrades, finding ways to retrain the model.

 If feasible, design alerts for when the model is operating “out-of-bounds.”
This includes situations when the inputs are beyond the range that the
model was trained on, which may cause the outputs of the model to be
inaccurate or invalid. If this begins to happen regularly, the model needs to
be retrained on new data.
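
The monitoring ideas above can be sketched as a small helper that flags inputs outside the ranges seen in training and tracks accuracy over a rolling window. The class name ModelMonitor, the feature names, the window size, and the 0.8 accuracy floor are all hypothetical; a production system would typically route these signals to an alerting tool rather than return them directly.

```python
from collections import deque

class ModelMonitor:
    """Tracks out-of-range inputs and rolling accuracy for a deployed model."""

    def __init__(self, training_ranges, window=100, min_accuracy=0.8):
        self.training_ranges = training_ranges   # {feature: (min, max)} seen in training
        self.outcomes = deque(maxlen=window)     # 1 = correct prediction, 0 = incorrect
        self.min_accuracy = min_accuracy

    def out_of_bounds(self, features):
        """Return the names of features outside the range seen during training."""
        flagged = []
        for name, value in features.items():
            if name in self.training_ranges:
                low, high = self.training_ranges[name]
                if not low <= value <= high:
                    flagged.append(name)
        return flagged

    def record(self, predicted, actual):
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        """Accuracy over the most recent window, or None if nothing is recorded."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        """True when rolling accuracy has dropped below the configured floor."""
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.min_accuracy

# Hypothetical training ranges and a short stream of scored predictions.
monitor = ModelMonitor({"age": (18, 70), "monthly_charges": (10.0, 120.0)}, window=5)
alerts = monitor.out_of_bounds({"age": 82, "monthly_charges": 45.0})
for predicted, actual in [(1, 1), (0, 1), (1, 0), (0, 0), (1, 0)]:
    monitor.record(predicted, actual)
print(alerts, monitor.rolling_accuracy(), monitor.needs_retraining())
```

Keeping the range check separate from the accuracy check matters because true outcomes often arrive late, while out-of-range inputs can be detected at prediction time.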

Phase 6 - Operationalize
Although many roles represent many interests within a project, these interests
usually overlap, and most of them can be met with four main deliverables.
 Presentation for project sponsors: This contains high-level takeaways for
executive-level stakeholders, with a few key messages to aid their
decision-making process. Focus on clean, easy visuals for the presenter to
explain and for the viewer to grasp.
 Presentation for analysts: This describes business process changes and
reporting changes. Fellow data scientists will want the details and are
comfortable with technical graphs such as Receiver Operating Characteristic
(ROC) curves, density plots, and histograms.
 Code for technical people.
 Technical specifications for implementing the code.
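
For readers less familiar with the ROC curves mentioned in the analyst presentation, the sketch below shows one way the points of such a curve and the area under it can be computed from classifier scores. The labels and scores are hypothetical, and the function names are illustrative rather than taken from any particular library.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs for an ROC curve.

    Thresholds are swept from the highest score down; labels use 1 for the
    positive class and 0 for the negative class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical true labels and model scores for six records.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(f"AUC = {area:.3f}")
```

An area near 1 means positives are consistently scored above negatives, while an area near 0.5 is what random scoring would produce.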
Phase 6 - Operationalize

 As a general rule, the more executive the audience, the more succinct the
presentation needs to be. Most executive sponsors attend many briefings in
the course of a day or a week. Ensure that the presentation gets to the point
quickly and frames the results in terms of value to the sponsor’s
organization.

 When presenting to other audiences with more quantitative backgrounds,
focus more time on the methodology and findings. In these instances, the
team can be more expansive in describing the outcomes, methodology, and
analytical experiment with a peer group.

