Unit V
A "Data Explosion" is the rapid, exponential growth of data generated from sources such as
social media, mobile devices, sensors, IoT devices, and digital platforms, to the point where
storing and managing that data in computing systems becomes difficult. The major sources
contributing to this explosion include:
Social media: Platforms like Facebook, Twitter, and Instagram generate large volumes of
user-generated data, including text, images, and videos.
IoT Devices and Sensors: Smart devices, wearables, and industrial sensors constantly
produce data in real-time, used in industries from healthcare to manufacturing.
Transactional Data: Financial and retail transactions, including e-commerce, banking, and
point-of-sale systems, generate structured data on purchases and customer behaviour.
Web and Mobile Applications: User interactions on websites and mobile apps produce data
such as clickstreams, browsing history, and app usage patterns.
Machine-generated Data: System logs, network traffic, and automated machine data in
sectors like telecommunications, transportation, and cybersecurity contribute to big data.
Public Data and Open Data: Government records, scientific research data, and open
databases provide data across fields like healthcare, environment, and demographics.
Big data is typically characterized by three V's: volume, velocity, and variety.
1. Volume: This refers to the vast amounts of data generated from various sources. Big data
systems handle terabytes to petabytes of data, making traditional data storage and
processing methods insufficient. The volume aspect emphasizes the scale and capacity
required to manage and analyze such large datasets.
2. Velocity: This represents the speed at which data is generated, collected, and processed.
With real-time data sources like social media, IoT sensors, and financial transactions, big data
needs fast, continuous processing to provide timely insights and decision-making.
3. Variety: Big data comes in diverse formats, including structured data (like databases), semi-
structured data (like JSON files), and unstructured data (like images, videos, and texts). This
variety demands flexible tools and models to integrate and analyze different types of data
effectively and efficiently.
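To make the variety dimension concrete, the short Python sketch below loads a structured CSV table, a semi-structured JSON payload, and a piece of unstructured text. The sample records and field names are purely illustrative assumptions, not data from the text.

```python
# A minimal sketch of the three broad data formats described above.
import io
import json
import pandas as pd

# Structured data: a fixed schema, e.g. rows from a relational table or CSV.
csv_text = "order_id,amount\n1,19.99\n2,5.50\n"
orders = pd.read_csv(io.StringIO(csv_text))

# Semi-structured data: JSON with nested and optional fields.
json_text = '[{"user": "a", "tags": ["sports"]}, {"user": "b"}]'
events = pd.json_normalize(json.loads(json_text))

# Unstructured data: raw text (or images/video) with no predefined schema.
tweet = "Loving the new phone! #happy"
word_count = len(tweet.split())

print(orders.dtypes)   # typed columns inferred from the fixed schema
print(events.columns)  # columns derived from whatever keys appear
print(word_count)      # unstructured data needs custom parsing or NLP
```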
4. What do you mean by the Data Analytics Lifecycle?
The Data Analytics Lifecycle is a systematic approach to managing and executing data analysis
projects, guiding teams from defining objectives to delivering actionable insights. It was designed to
address Big Data problems and the specific demands of conducting analysis on Big Data.
It typically consists of six stages:
1. Discovery: Understanding the business problem, objectives, and resources (such as data,
tools, and team skills) available for analysis. This stage sets the project’s foundation.
2. Data Preparation: Gathering, cleaning, transforming, and exploring data to ensure it’s ready
for analysis. This often involves handling missing values, identifying outliers, and combining
data sources.
3. Model Planning: Determining the analytical techniques, algorithms, and tools to apply.
Teams often use data exploration and hypothesis testing to decide the best modeling
approach.
4. Model Building: Constructing, training, and validating models based on the planned
approach. This involves iterating to refine models for improved accuracy and reliability.
5. Communicate Results: Presenting findings to stakeholders in a clear, actionable format,
typically through visualizations, dashboards, or reports tailored to the audience.
6. Operationalize: Deploying the model or insights into production for real-world application,
which could involve setting up monitoring systems and training teams to ensure long-term
effectiveness.
The phases, along with commonly used tools, are described in more detail below.
1) Discovery: The team studies the business problem, defines the objectives, and assesses the
resources available for analysis, such as data, tools, and team skills. This stage sets the
project's foundation.
2) Data Preparation: In this phase, data is collected, cleaned, and transformed to prepare it for
analysis. This involves handling missing data, removing inconsistencies, combining data from
multiple sources, and conducting exploratory analysis to understand data characteristics and
ensure quality.
Commonly used tools for this phase include Hadoop, Alpine Miner, and OpenRefine.
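As a hedged illustration of these steps, the following Python/pandas sketch fills a missing value, flags an outlier with a simple IQR rule, and joins two sources on a shared key. The tables, column names, and thresholds are assumptions made for the example, not part of the lifecycle itself.

```python
# A minimal data-preparation sketch: missing values, outliers, and joining sources.
import pandas as pd

sales = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [120.0, None, 95.0, 10_000.0],  # one missing value, one outlier
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", None],
})

# Handle missing values: fill numeric gaps with the median, drop unknown regions.
sales["amount"] = sales["amount"].fillna(sales["amount"].median())
customers = customers.dropna(subset=["region"])

# Identify outliers with a simple IQR rule and flag them for review.
q1, q3 = sales["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
sales["is_outlier"] = (sales["amount"] < q1 - 1.5 * iqr) | (sales["amount"] > q3 + 1.5 * iqr)

# Combine data from multiple sources on a shared key.
prepared = sales.merge(customers, on="customer_id", how="inner")
print(prepared)
```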
3) Model Planning: Here, the team identifies suitable analytical methods and algorithms based on
the data and objectives. Techniques like regression, clustering, and classification may be selected
depending on the problem. This phase often includes creating a preliminary model structure and
strategy for testing hypotheses.
Commonly used tools for this stage include MATLAB and Statistica.
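One common way to carry out model planning in practice is to benchmark a few candidate techniques on the same data before committing to one. The Python/scikit-learn sketch below compares a logistic regression and a decision tree using cross-validation; the dataset and the two candidates are illustrative assumptions.

```python
# A minimal model-planning sketch: compare candidate techniques with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Cross-validated accuracy gives a quick, comparable estimate for each
# candidate and guides which approach to carry into model building.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```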
4) Model Building: This phase involves developing and training models based on the chosen
methods. The team iterates on the model, adjusting parameters and features to improve
accuracy and reliability, and uses training data to refine the model’s predictive capability.
Free or open-source tools: R and PL/R, Octave, and WEKA.
Commercial tools: MATLAB and Statistica.
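The sketch below, again using Python/scikit-learn as an assumed toolchain, shows the core loop of this phase: train on one split, validate on a held-out split, and iterate over a hyperparameter to refine the model. The dataset and the parameter values tried are illustrative.

```python
# A minimal model-building sketch: train, validate, and iterate on a hyperparameter.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

best_score, best_model = 0.0, None
for n_trees in (50, 100, 200):  # iterate to refine the model
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    score = accuracy_score(y_val, model.predict(X_val))
    if score > best_score:
        best_score, best_model = score, model

print(f"best validation accuracy: {best_score:.3f}")
```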
5) Communication of Results: In this phase, the team presents findings to stakeholders in a clear
and actionable format. The insights are often communicated through visualizations, dashboards,
or reports tailored to the audience’s needs, making complex data understandable. This phase
focuses on ensuring that stakeholders understand how the insights address the original business
problem and how they can use them to make informed decisions.
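As a small, assumed example of turning results into something stakeholders can read, the Python/matplotlib sketch below renders model output as a bar chart that could be dropped into a report or dashboard; the segment names and figures are made up for illustration.

```python
# A minimal results-communication sketch: model output as a simple chart.
import matplotlib.pyplot as plt

segments = ["North", "South", "East", "West"]
predicted_churn_rate = [0.12, 0.27, 0.08, 0.19]  # assumed model output

fig, ax = plt.subplots()
ax.bar(segments, predicted_churn_rate)
ax.set_ylabel("Predicted churn rate")
ax.set_title("Churn risk by region")
fig.savefig("churn_by_region.png")  # attach to a report or dashboard
```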
6) Operationalize: This final phase involves integrating the model, insights, or analytics system into
the business’s operational processes for long-term use. This could include deploying the model
into production, setting up monitoring systems, and establishing maintenance workflows to
ensure its continued accuracy and relevance. Operationalizing also includes training teams and
setting up processes to ensure the model or analytics solution is effectively utilized over time.
Open-source or free tools used in this phase include WEKA, SQL, MADlib, and Octave.
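To illustrate what operationalizing can look like at the smallest scale, the Python sketch below persists a trained model with joblib, reloads it as a production process would, and logs each prediction so the model's behaviour can be monitored over time. The file name, dataset, and logging setup are assumptions for the example.

```python
# A minimal operationalization sketch: persist, reload, and monitor a model.
import logging

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

logging.basicConfig(level=logging.INFO)

# Train and persist the model (done once, offline).
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)
joblib.dump(model, "model.joblib")

# In production: reload the artifact and score incoming records,
# logging each prediction so drift and errors can be monitored over time.
deployed = joblib.load("model.joblib")
for record in X[:3]:
    prob = deployed.predict_proba(record.reshape(1, -1))[0, 1]
    logging.info("prediction=%.3f", prob)
```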