Unit V


Data Explosion

The rapid, often exponential increase in the amount of data generated and stored in computing systems, reaching a level where data management becomes difficult, is called a "Data Explosion".

Three characteristics define Big Data: volume, variety, and velocity.

Volume: Big data sets comprise millions of unstructured, low-density data points. Companies that use big data may keep anything from dozens of terabytes to hundreds of petabytes of user data, and cloud computing now gives them access to zettabytes. All data is saved regardless of its apparent importance, because big data specialists argue that the answers to business questions can sometimes lie in unexpected data.

Velocity: Velocity refers to the speed at which big data is generated and applied. Big data is received, analyzed, and interpreted in quick succession to provide the most up-to-date findings. Many big data platforms even record and interpret data in real time.
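
To make velocity concrete, here is a toy Python sketch of streaming aggregation. The event source is simulated (a real system would read from a platform such as Kafka); the point is that each event updates a running statistic the moment it arrives, rather than in a later batch job.

```python
import random
import time

def event_stream(n=5):
    """Simulated real-time event source (a stand-in for a platform like Kafka)."""
    for _ in range(n):
        yield {"value": random.random(), "ts": time.time()}
        time.sleep(0.1)

# The running aggregate is updated the moment each event arrives,
# not in a nightly batch.
count, total = 0, 0.0
for event in event_stream():
    count += 1
    total += event["value"]
    print(f"events={count} running_mean={total / count:.3f}")
```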

Variety: Big data sets contain many different types of data within the same unstructured database. Traditional data management systems use structured relational databases that hold specific data types with set relationships to other data types. Big data analytics programs instead draw on many different types of unstructured data to find correlations across all of them, which often leads to a more complete picture of how each factor is related.
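
As a small illustration of variety, the Python sketch below (with hypothetical column names and records) combines a structured table with semi-structured JSON records whose fields vary per record; it assumes the pandas library is available.

```python
import json
import pandas as pd

# Structured data: fixed columns and types, as in a relational table.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [19.99, 5.00, 42.50],
})

# Semi-structured data: JSON records whose fields can vary per record.
raw_reviews = [
    '{"order_id": 1, "text": "great product", "rating": 5}',
    '{"order_id": 3, "text": "slow shipping"}',  # no rating field
]
reviews = pd.DataFrame([json.loads(r) for r in raw_reviews])

# Joining both sources gives a more complete picture of each order.
combined = orders.merge(reviews, on="order_id", how="left")
print(combined)
```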
Life Cycle of Data Analytics
The data analytics lifecycle was designed to address Big Data problems and data science projects. The process is iterative, reflecting the way real projects unfold. To meet the specific demands of conducting analysis on Big Data, a step-by-step methodology is needed to plan the various tasks associated with acquiring, processing, analysing, and repurposing data.

Phase 1: Discovery -

The data science team learns the business domain and researches the problem.

The team frames the problem to create context and gain understanding.

It identifies the data sources that are needed and accessible to the project.

The team formulates an initial hypothesis that can later be tested against the data.

Phase 2: Data Preparation -

The team investigates options for pre-processing, analysing, and preparing the data before analysis and modelling.

The team performs extract, load, and transform (ELT) operations to bring data into the analytic sandbox.

Data preparation tasks may be repeated, and they do not follow a predetermined sequence.

Some of the tools commonly used for this process include Hadoop, Alpine Miner, and OpenRefine.
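
A minimal data-preparation sketch in Python with pandas, assuming a hypothetical raw file and column names; it loads data into the sandbox, then cleans and stages it. In practice these steps are revisited as new issues surface.

```python
import pandas as pd

# Load raw data into the analytic sandbox (file and columns are hypothetical).
df = pd.read_csv("raw_customers.csv")

# Typical preparation steps, applied and repeated in no fixed order:
df = df.drop_duplicates()                          # remove duplicate rows
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].median())   # impute missing ages
df = df[df["age"].between(0, 120)]                 # drop implausible values

# Stage the cleaned table for the modelling phases.
df.to_parquet("customers_clean.parquet")
```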

Phase 3: Model Planning -

The team studies the data to discover the relationships between variables, then selects the most significant variables and the most suitable models.

The team also plans how data sets will be divided for training, testing, and production purposes.

The models themselves are built and executed in the next phase, based on the work completed during model planning.

Some of the tools commonly used for this stage are MATLAB and Statistica.
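
As one example of model planning, the sketch below uses pandas to rank numeric variables by their correlation with a 0/1 target column; the file and the target name "churned" are hypothetical carry-overs from the preparation sketch.

```python
import pandas as pd

# Continue from the prepared sandbox table (file and target name are hypothetical).
df = pd.read_parquet("customers_clean.parquet")

# Correlation of each numeric variable with a 0/1 target column.
numeric = df.select_dtypes("number")
correlations = numeric.corr()["churned"].drop("churned").abs()

# Shortlist the most strongly related variables for the model-building phase.
print(correlations.sort_values(ascending=False).head(5))
```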

Phase 4: Model Building -

The team creates datasets for training, testing, and production use.

The team also evaluates whether its current tools are sufficient to run the models or whether a more robust environment is required.

Free or open-source tools include R and PL/R, Octave, and WEKA.

Commercial tools include MATLAB and Statistica.
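
A minimal model-building sketch using scikit-learn, an open-source alternative to the tools listed above; the data here is synthetic stand-in data, so the exact features and model are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data; in practice this would come from the prepared sandbox tables.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# Separate datasets for training and testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```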

Phase 5: Communicate Results -

Following the execution of the model, the team evaluates its outcomes against the criteria established for the success or failure of the project.

The team considers how best to present findings and outcomes to the various team members and other stakeholders, taking caveats and assumptions into account.

The team should identify the most important findings, quantify their value to the business, and develop a narrative that summarizes and presents the findings to all stakeholders.
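
Quantifying success or failure criteria might look like the following sketch, which reuses the hypothetical model, X_test, and y_test from the model-building sketch and checks each metric against an illustrative threshold agreed with stakeholders.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# `model`, `X_test`, and `y_test` carry over from the model-building sketch.
y_pred = model.predict(X_test)

# Success criteria agreed with stakeholders; the 0.80 threshold is illustrative.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
for name, value in metrics.items():
    status = "pass" if value >= 0.80 else "fail"
    print(f"{name}: {value:.3f} ({status})")
```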

Phase 6: Operationalize -

The team communicates the benefits of the project to a wider audience. It sets up a pilot project to deploy the work in a controlled manner before expanding it to the full enterprise of users.

This approach allows the team to learn about the performance and constraints of the model in a small-scale production setting and to make the necessary adjustments before full deployment.

The team also produces the final reports, presentations, and code.

Free or open-source tools used in this phase include Octave, WEKA, SQL, and MADlib.
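
One common way to run a small pilot is to serialize the trained model and expose a simple scoring function. The sketch below uses joblib for this; the model name, file name, and feature values are hypothetical.

```python
import joblib

# Persist the trained model so the pilot deployment can load it.
joblib.dump(model, "churn_model.joblib")  # `model` from the model-building sketch

def score(features):
    """Score a single record in the pilot environment."""
    pilot_model = joblib.load("churn_model.joblib")
    return pilot_model.predict_proba([features])[0, 1]

# Example pilot call with one record's feature values (illustrative).
print(score([0.1, -1.2, 0.5, 2.0, 0.3]))
```

Keeping the scoring path this small makes it easy to monitor the model's behaviour during the pilot and to adjust or retrain before the full rollout.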
