Chapter 1 Introduction To Big Data

This document provides an introduction to big data, including definitions and characteristics. It defines big data as massive datasets from various sources that are analyzed to reveal patterns and optimize decision making. Key characteristics of big data include volume, velocity, and variety. There are three main types of data: structured, unstructured, and semi-structured. Traditional data management stored structured data in data warehouses, while big data tools like Hadoop can handle large volumes and varieties of data more efficiently. The document provides examples of how insurance companies, manufacturers, hotels, and public services can benefit from big data analytics.

Uploaded by

shubham.ojha2102


Chapter 1

Introduction to Big Data

Introduction

1. What is Big Data?

2. Big Data characteristics

3. Types of Big Data

4. Traditional vs. Big Data business approach

5. Case studies of Big Data solutions

What is Big Data?

• Massive datasets.

• Collected from a variety of data sources.

• E-business and social media create about 2.5 exabytes (1 EB = 10^18 bytes) of data per day.

• Analyzed to reveal new insights for optimized decision making.

• Stored for analysis that reveals hidden correlations and patterns; this analysis is called Big Data analytics.
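
To give a feel for the scale of the 2.5 exabytes per day quoted above, the short sketch below converts it to bytes per day and terabytes per second. It assumes decimal (SI) units, where 1 EB = 10^18 bytes and 1 TB = 10^12 bytes; the variable names are illustrative only.

```python
# Decimal (SI) unit sizes in bytes (an assumption; binary units would differ).
EXABYTE = 10**18
TERABYTE = 10**12

SECONDS_PER_DAY = 24 * 60 * 60

# Daily volume quoted in the slide: 2.5 EB per day.
daily_bytes = 2.5 * EXABYTE

# Average rate if the data arrived evenly over the day.
bytes_per_second = daily_bytes / SECONDS_PER_DAY
tb_per_second = bytes_per_second / TERABYTE

print(f"{daily_bytes:.2e} bytes per day")
print(f"about {tb_per_second:.1f} TB per second")
```

Spread evenly over a day, the quoted figure works out to an average of roughly 29 TB every second.
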

Trends of Data Generation

• 2006: 10 ZB
• 2010: 20 ZB
• 2017: 30 ZB
• 2020: 50 ZB

Big Data: Result of Three Computing Trends

• Social networks
• Cloud computing
• Mobile computing

Volume of Big Data

Figure: data volume by source, from smallest to largest.

• ERP (megabytes)
• CRM (gigabytes)
• Web (terabytes)
• Big Data (petabytes)

Labels in the figure: transactions, operations, customer segmentation, support, offer history, dynamic pricing, behavior, weblogs, sensors, RFID, user clicks, mobile web.

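Characteristics of Big Data

1. Volume

2. Velocity

3. Variety

Five V's of Big Data

The three characteristics above are commonly extended with two more: veracity and value.

Types of Big Data

1. Structured
2. Unstructured
3. Semi-structured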

What is Structured Data?

• Structured data usually resides in relational databases (RDBMS).

• Even variable-length text strings, such as names, are stored in record fields, which makes them simple to search.

• Data may be human- or machine-generated, as long as it is created within an RDBMS structure.

• This format is highly searchable, both with human-written queries and with algorithms that use field names and data types such as alphabetic, numeric, currency, or date.

• Common relational database applications with structured data include airline reservation systems, inventory control, sales transactions, and ATM activity.

• Structured Query Language (SQL) enables queries on this type of structured data within relational databases.
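
As a minimal sketch of how typed, named fields make structured data easy to query, the example below builds a small in-memory table with Python's standard `sqlite3` module and runs one SQL aggregation. The `sales` table, its columns, and the rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, sold_on TEXT)"
)
conn.executemany(
    "INSERT INTO sales (customer, amount, sold_on) VALUES (?, ?, ?)",
    [
        ("Asha", 120.50, "2024-01-05"),
        ("Ravi", 80.00, "2024-01-06"),
        ("Asha", 42.25, "2024-01-09"),
    ],
)

# Named, typed fields let SQL group and total the records directly.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()

for customer, total in rows:
    print(customer, total)

conn.close()
```

Because every record has the same fields, the query needs no parsing step: the database already knows which values are names, amounts, and dates.
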

What is Unstructured Data?

• Unstructured data has internal structure but is not organized by a predefined data model or schema.

• It may be textual or non-textual, and human- or machine-generated.

• It may also be stored in a non-relational (NoSQL) database.

• Typical human-generated unstructured data includes:

1. Text files: word processing documents, spreadsheets, presentations, email, logs.
2. Social media: data from Facebook, Twitter, LinkedIn.
3. Websites: YouTube, Instagram, photo sharing sites.
4. Mobile data: text messages, locations.
5. Communications: chat, IM, phone recordings, collaboration software.
6. Media: MP3, digital photos, audio and video files.
7. Business applications: MS Office documents, productivity applications.

• Typical machine-generated unstructured data includes:

1. Satellite imagery: weather data, land forms, military movements.
2. Scientific data: oil and gas exploration, space exploration, seismic imagery, atmospheric data.
3. Digital surveillance: surveillance photos and video.
4. Sensor data: traffic, weather, oceanographic sensors.

What is Semi-structured Data?

• Semi-structured data contains internal tags and markings that identify separate data elements, which enables information grouping and hierarchies.

• Both documents and databases can be semi-structured.

• Email is a very common example of a semi-structured data type: header fields such as sender, recipient, and date are structured, while the message body is free text.

• Examples of semi-structured data:

1. Markup language XML: XML is a set of document encoding rules that defines a human- and machine-readable format.
2. Open standard JSON (JavaScript Object Notation): its structure consists of name/value pairs (an object, hash table, etc.) and ordered value lists (an array, sequence, or list).
3. NoSQL: NoSQL databases differ from relational databases because they do not separate the organization (schema) from the data. This also allows for easier data exchange between databases. Some newer NoSQL databases, such as MongoDB and Couchbase, store documents in JSON or JSON-like formats.
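
The tags and name/value pairs described above can be seen in a small sketch that parses the same record from JSON and from XML using only Python's standard library. The customer record and its field names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in two semi-structured formats.
json_text = '{"name": "Asha", "tags": ["vip", "newsletter"]}'
xml_text = (
    "<customer><name>Asha</name>"
    "<tag>vip</tag><tag>newsletter</tag></customer>"
)

# JSON: name/value pairs become a dict, the ordered list becomes a list.
record = json.loads(json_text)
json_name = record["name"]
json_tags = record["tags"]

# XML: element tags identify each data element and its place in the hierarchy.
root = ET.fromstring(xml_text)
xml_name = root.findtext("name")
xml_tags = [tag.text for tag in root.findall("tag")]

print(json_name, json_tags)
print(xml_name, xml_tags)
```

Neither format requires a schema to be declared in advance: the structure travels with the data, which is what distinguishes semi-structured from structured data.
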

Traditional Data Management Approach

• Traditional data management stores structured data in data marts and data warehouses, which are distributed throughout the organization.

• Copying all the data from each of these systems to a centralized location and keeping it up to date is not an easy task.

• Moreover, sampling the data does not serve the purpose of extracting the required information.

• This approach could handle a large volume of transactions, but only up to a point.

Big Data Approach

• Many IT tools are available for Big Data projects.

• Hadoop: addresses the storage requirement.

• Apache Spark: stream processing.

• When used, these tools can dramatically reduce time-to-value, in most cases from more than 2 years to less than 4 months.

Advantages of using Hadoop:

1. Scalability
2. No pre-processing of data required
3. Handles unstructured data
4. No fixed limit on data volume or retention time
5. Protection against hardware failure

Beneficial Domains

• Insurance companies: to assess the likelihood of fraud by accessing internal and external data while processing claims.

• Manufacturers and distributors: to detect supply chain issues earlier, so they can choose different logistical approaches and avoid the additional costs of material delays, overstock, or stock-outs.

• Hotels and telecommunications companies: to serve customers better by gaining a clearer picture of customer needs.

• Public services: services such as traffic management, ambulances, and transportation can optimize their delivery mechanisms.

• Smart cities: to make cities more efficient and sustainable and improve the lives of citizens.

Case Study

1. Clickstream analytics
2. Feedback analysis using word count
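
The second case study can be sketched in plain Python. The example below is not Hadoop code; it only mirrors the map and reduce phases of a word count on a few made-up feedback strings, and the function names `map_words` and `reduce_counts` are hypothetical.

```python
import re
from collections import Counter

# Made-up customer feedback, for illustration only.
feedback = [
    "Great service and great food",
    "Slow service",
    "Food was great",
]


def map_words(text):
    """Map phase: emit a (word, 1) pair for every word in one feedback entry."""
    return [(word, 1) for word in re.findall(r"[a-z']+", text.lower())]


def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


# Run the map phase on every entry, then reduce all pairs together.
all_pairs = [pair for text in feedback for pair in map_words(text)]
counts = reduce_counts(all_pairs)

for word, n in counts.most_common(3):
    print(word, n)
```

On a real cluster the map phase would run in parallel on different blocks of the input and the framework would group pairs by word before reducing; here both phases run in a single process.
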
Thank you
