
Chapter 1 Introduction To Big Data

This document provides an introduction to big data, including definitions and characteristics. It defines big data as massive datasets from various sources that are analyzed to reveal patterns and optimize decision making. Key characteristics of big data include volume, velocity, and variety. There are three main types of data: structured, unstructured, and semi-structured. Traditional data management stored structured data in data warehouses, while big data tools like Hadoop can handle large volumes and varieties of data more efficiently. The document provides examples of how insurance companies, manufacturers, hotels, and public services can benefit from big data analytics.

Uploaded by shubham.ojha2102
Copyright © All Rights Reserved

Chapter 1

Introduction to Big Data


Introduction

1. What is Big Data?

2. Big Data Characteristics

3. Types of Big Data

4. Traditional vs. Big Data Business Approach

5. Case Studies of Big Data Solutions


What is Big Data?

• Massive datasets

• Collected from a variety of data sources

• E-business and social media create 2.5 exabytes (1 EB = 10^18 bytes) of data per day.

• Analyzed to reveal new insights for optimized decision making.

• Stored for analysis to reveal hidden correlations and patterns, which is called “BIG DATA ANALYTICS”.
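To put the 2.5 exabytes per day figure in perspective, the short Python sketch below converts it into an average per-second rate. It assumes decimal (SI) units and an even spread over 24 hours; the constant and function names are illustrative, not from the original slides.

```python
# Back-of-the-envelope arithmetic for the "2.5 exabytes per day" figure,
# assuming decimal (SI) units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
EXABYTE = 10**18
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def daily_to_per_second(bytes_per_day: float) -> float:
    """Convert a daily data volume into an average per-second rate."""
    return bytes_per_day / SECONDS_PER_DAY


daily_bytes = 2.5 * EXABYTE
per_second = daily_to_per_second(daily_bytes)
print(f"{per_second / TERABYTE:.1f} TB per second on average")
```

With these assumptions it prints `28.9 TB per second on average`.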
Trends of Data Generation

• 2006: 10 ZB
• 2010: 20 ZB
• 2017: 30 ZB
• 2020: 50 ZB
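As a rough reading of these figures, the sketch below computes the compound annual growth rate (CAGR) implied by the 2006 and 2020 values. CAGR is a summary statistic chosen here for illustration, not something the original slide states, and the result is only as accurate as the chart values it uses.

```python
# Values read from the "Trends of Data Generation" chart (zettabytes).
data_zb = {2006: 10, 2010: 20, 2017: 30, 2020: 50}


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


first, last = min(data_zb), max(data_zb)
rate = cagr(data_zb[first], data_zb[last], last - first)
print(f"{first}-{last}: {rate:.1%} per year")
```

With these values it prints `2006-2020: 12.2% per year`.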
Big Data: Result of Three Computing Trends

• Social networks
• Mobile computing
• Cloud computing
Volume of Big Data

• ERP (in megabytes): transactions, operations
• CRM (in gigabytes): customer segmentation, support
• Web (in terabytes): offer history, dynamic pricing, behavior, weblogs
• Big Data (in petabytes): sensor, RFID, user clicks, mobile web


Characteristics of Big Data

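1. Volume: the sheer amount of data generated and stored

2. Velocity: the speed at which data is generated and must be processed

3. Variety: the range of data types and formats (structured, unstructured, semi-structured)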
Five V’s of Big Data
Types of Big Data
What is Structured Data?

• Structured data usually resides in relational databases (RDBMS).

• Even variable-length text strings, such as names, are contained in records, making them simple to search.

• Data may be human- or machine-generated as long as it is created within an RDBMS structure.

• This format is easily searchable, both with human-generated queries and by algorithms that use data types and field names, such as alphabetical or numeric, currency or date.

• Common relational database applications with structured data include airline reservation systems, inventory control, sales transactions, and ATM activity.

• Structured Query Language (SQL) enables queries on this type of structured data within relational databases.
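A minimal sketch of querying structured data with SQL, using Python's built-in `sqlite3` module as a stand-in for a full RDBMS. The `atm_activity` table, its columns, and the sample rows are hypothetical, chosen to echo the ATM example above.

```python
import sqlite3

# An in-memory relational database with a fixed schema: every row has the
# same typed columns, which is what makes structured data easy to query.
conn = sqlite3.connect(":memory:")

# txn_date holds an ISO-8601 date string.
conn.execute(
    """CREATE TABLE atm_activity (
           txn_id   INTEGER PRIMARY KEY,
           customer TEXT NOT NULL,
           amount   REAL NOT NULL,
           txn_date TEXT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO atm_activity (customer, amount, txn_date) VALUES (?, ?, ?)",
    [
        ("Asha", 200.0, "2024-01-05"),
        ("Ravi", 50.0, "2024-01-05"),
        ("Asha", 120.0, "2024-01-06"),
    ],
)

# Query by field name: total withdrawals per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM atm_activity "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)
```

This prints `[('Asha', 320.0), ('Ravi', 50.0)]`. The same query works unchanged on any table that follows this schema, which is the practical benefit of a predefined structure.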
What is Unstructured Data?
• Unstructured data has internal structure but is not organized via a predefined data model or schema.
• It may be textual or non-textual, and human- or machine-generated.
• It may also be stored in a non-relational (NoSQL) database.

• Typical human-generated unstructured data includes:

1. Text files: Word processing, spreadsheets, presentations, email, logs.
2. Social media: Data from Facebook, Twitter, LinkedIn.
3. Websites: YouTube, Instagram, photo-sharing sites.
4. Mobile data: Text messages, locations.
5. Communications: Chat, IM, phone recordings, collaboration software.
6. Media: MP3, digital photos, audio and video files.
7. Business applications: MS Office documents, productivity applications.

• Typical machine-generated unstructured data includes:

1. Satellite imagery: Weather data, landforms, military movements.
2. Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data.
3. Digital surveillance: Surveillance photos and video.
4. Sensor data: Traffic, weather, oceanographic sensors.
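Because unstructured data has no predefined schema, some structure usually has to be extracted before it can be queried. The sketch below assumes hypothetical customer messages and an order-number format like `#4521`, and uses a regular expression to pull out (message, order ID) records; real pipelines would typically need far more robust text processing.

```python
import re

# Free-text messages: no columns or field names, just text.
messages = [
    "Hi, my order #4521 arrived damaged, please help.",
    "Loved the service at your Pune branch!",
    "Order #4533 still not delivered after 10 days.",
]

# Hypothetical pattern for order references such as "#4521".
ORDER_PATTERN = re.compile(r"#(\d+)")


def extract_orders(texts):
    """Impose a minimal structure: one record per order number found."""
    records = []
    for i, text in enumerate(texts):
        for match in ORDER_PATTERN.finditer(text):
            records.append({"message": i, "order_id": int(match.group(1))})
    return records


print(extract_orders(messages))
```

This prints `[{'message': 0, 'order_id': 4521}, {'message': 2, 'order_id': 4533}]`; the second message yields nothing because it contains no order reference.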
What is Semi-structured Data?

• Semi-structured data maintains internal tags and markings that identify separate data elements, which enables grouping of information and hierarchies.
• Both documents and databases can be semi-structured.
• Email is a very common example of a semi-structured data type.

• Examples of semi-structured data:

1. Markup language XML: XML is a set of document encoding rules that defines a human- and machine-readable format.
2. Open standard JSON (JavaScript Object Notation): Its structure consists of name/value pairs (also called an object, hash table, etc.) and ordered lists of values (also called an array, sequence, or list).
3. NoSQL: NoSQL databases differ from relational databases because they do not separate the organization (schema) from the data, which also makes it easier to exchange data between databases. Examples of newer NoSQL databases include MongoDB and Couchbase.
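The sketch below illustrates the semi-structured examples with Python's standard `json` and `xml.etree.ElementTree` modules. The customer records and the email markup are hypothetical; they show that keys and tags label each element, that records need not share the same fields, and that nested tags form a hierarchy.

```python
import json
import xml.etree.ElementTree as ET

# Two JSON records: both use name/value pairs, but their fields differ.
raw_json = """[
  {"name": "Asha", "email": "asha@example.com", "tags": ["gold", "mobile"]},
  {"name": "Ravi", "phone": "+91-00000-00000"}
]"""
customers = json.loads(raw_json)
for c in customers:
    # Keys identify each element; a missing field is simply absent.
    print(c["name"], sorted(c.keys()))

# The same idea in XML: nested tags give the data a hierarchy.
raw_xml = """<email>
  <header><from>asha@example.com</from><subject>Refund</subject></header>
  <body>My order arrived damaged.</body>
</email>"""
root = ET.fromstring(raw_xml)
print(root.find("header/subject").text)
```

The tags make each field addressable (for example `header/subject`) without a table schema declared up front.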
Traditional Data Management Approach

• Traditional data management stores structured data in data marts and data warehouses, which are distributed throughout the organization.

• Copying all the data from each of these systems to a centralized location and keeping it updated is not an easy task.

• Moreover, sampling the data does not serve the purpose of extracting the required information.

• This approach could handle a large volume of transactions, but only up to a point.
Big Data Approach

• Many IT tools are available for Big Data projects.

• Hadoop: storage requirements

• Apache Spark: stream processing

• These tools can dramatically reduce the time-to-value, in most cases from more than 2 years to less than 4 months.
Advantages of using Hadoop:

1. Scalability
2. No pre-processing of data required
3. Handles unstructured data
4. No limit on data and time
5. Protection against hardware failure
Beneficial Domains
• Insurance companies: to understand the likelihood of fraud by accessing internal and external data while processing claims.

• Manufacturers and distributors: benefit by recognizing supply chain issues earlier, so that they can decide on different logistical approaches and avoid the additional costs associated with material delays, overstock, or stock-out conditions.

• Hotels and telecommunications companies: to serve customers better by gaining clearer insight into customer needs.

• Public services: such as traffic, ambulance, and transportation services, can optimize their delivery mechanisms.

• Smart cities: to make cities more efficient and sustainable and to improve the lives of citizens.
Case Study
1. Clickstream Analytics
2. Feedback analysis using word count
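The slides name these two case studies without details. As a hedged illustration, the single-machine Python sketch below assumes a hypothetical click log of (user, page) pairs and a few hypothetical feedback sentences. The word count follows the map, shuffle, and reduce phases that Hadoop MapReduce jobs use, but it is plain Python, not the Hadoop API.

```python
import re
from collections import Counter, defaultdict
from itertools import chain

# --- 1. Clickstream analytics (hypothetical click log: user, page) ---
clicks = [
    ("u1", "/home"), ("u1", "/products"), ("u2", "/home"),
    ("u1", "/checkout"), ("u2", "/products"), ("u3", "/home"),
]

page_views = Counter(page for _, page in clicks)  # views per page
paths = defaultdict(list)
for user, page in clicks:
    paths[user].append(page)  # click order per user is preserved
converted = [u for u, p in paths.items() if "/checkout" in p]

# --- 2. Feedback analysis using word count, in map/shuffle/reduce style ---
feedback = [
    "Great service and great food",
    "Service was slow",
    "Food was cold but service was friendly",
]


def map_phase(line):
    """Emit (word, 1) for every lower-cased word in one feedback line."""
    return [(w, 1) for w in re.findall(r"[a-z]+", line.lower())]


def shuffle_phase(pairs):
    """Group the emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


mapped = chain.from_iterable(map(map_phase, feedback))
word_counts = reduce_phase(shuffle_phase(mapped))

print(page_views.most_common(2))
print(converted)
print(sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

This prints `[('/home', 3), ('/products', 2)]`, `['u1']`, and `[('service', 3), ('was', 3), ('food', 2)]`. On a cluster, the map and reduce functions would run in parallel on different blocks of data; here they run sequentially to keep the logic visible.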
Thank you
