
Bigdata PPT Slides (E)

Big data refers to large and complex datasets that are difficult to process using traditional database management tools. It requires new techniques and algorithms to extract value from the data. Big data generates value from storing and processing very large digital datasets that cannot be analyzed with traditional computing. Hadoop is a software platform that makes it easy to distribute big data across commodity servers and process it in parallel using MapReduce. Organizations are using big data analytics to gain customer insights, tap internal data sources, and build better information ecosystems. Big data represents a large commercial opportunity comparable to previous technology revolutions.

Uploaded by

sai project
Copyright
© All Rights Reserved

⦿ Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
⦿ “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it
⦿ ‘Big Data’ is similar to ‘small data’, but bigger in size
⦿ It aims to solve new problems, or old problems in a better way
⦿ Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
Handling big data
Parallel computing
• Imagine a 1 GB text file containing all the status updates on Facebook in a day
• Now suppose that simply counting the number of rows takes 10 minutes.
• SELECT COUNT(*) FROM fb_status
• What do you do if you have 6 months of data, a file of size 200 GB, and you still want the result in 10 minutes?
• Parallel computing?
• Put multiple CPUs in a machine (100?)
• Write code that computes 200 counts in parallel and then sums them up
• But you need a supercomputer
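The arithmetic behind the slide: at 10 minutes per 1 GB, a serial count over 200 GB takes about 2000 minutes (roughly 33 hours). Splitting the file into 200 pieces of 1 GB and counting them at the same time brings this back to about 10 minutes, ignoring coordination overhead. A minimal Python sketch of that split, count, sum pattern follows. The names count_rows, split, and parallel_count are hypothetical, the data is a small in-memory list rather than a real file, and threads are used only to show the structure; a real speedup would need separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def count_rows(chunk):
    """Count the rows in one chunk (the job given to each worker)."""
    return sum(1 for _ in chunk)

def split(rows, n_chunks):
    """Split rows into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def parallel_count(rows, n_workers=4):
    """Count each chunk concurrently, then sum the partial counts."""
    chunks = split(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_rows, chunks))
    return sum(partial_counts)

# Stand-in for the status table: 1000 made-up rows.
statuses = [f"status update {i}" for i in range(1000)]
total = parallel_count(statuses, n_workers=8)
print(total)
```

Each worker only ever sees its own chunk, so the final sum is the only step that needs all the partial results in one place.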
MapReduce Programming Model
• Processing data using special map() and reduce() functions
• The map() function is called on every item in the input and emits a series of intermediate key/value pairs (local calculation)
• All values associated with a given key are grouped together
• The reduce() function is called on every unique key and its value list, and emits a value that is added to the output (final organization)
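The three steps above (map, group by key, reduce) can be sketched in plain Python as a word count. This is a single-process illustration of the programming model only, not Hadoop's API; the names map_fn, shuffle, reduce_fn, and map_reduce are hypothetical.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input item."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce: called once per unique key with its value list."""
    return (key, sum(values))

def map_reduce(lines):
    """Run map on every input item, group by key, then reduce each group."""
    intermediate = [pair for line in lines for pair in map_fn(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())

counts = map_reduce(["big data is big", "data is value"])
print(counts)
```

Because each map_fn call depends only on its own line and each reduce_fn call only on its own key, both phases can be spread across machines, which is the property the model relies on.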
Hadoop
• Hadoop is a collection of tools with many components. HDFS and MapReduce are its two core components
• HDFS: Hadoop Distributed File System
• Makes it easy to store data on commodity hardware
• Built to expect hardware failures
• Intended for large files & batch inserts
• MapReduce
• For parallel processing
• So Hadoop is a software platform that lets one easily write and run applications that process big data
So what is Hadoop?
• Hadoop is not big data
• Hadoop is not a database
• Hadoop is a platform/framework
• Which allows the user to quickly write and test distributed systems
• Which is efficient in automatically distributing the data and work across machines
Hadoop ecosystem
Big Data ecosystem
Application of Big Data analytics
• Smarter Healthcare
• Multi-channel sales
• Homeland Security
• Telecom
• Trading Analytics
• Traffic Control
• Search Quality
• Manufacturing
• It is easy to become overwhelmed
• Need the right people solving the right problems
• Costs escalate too fast
• It isn't necessary to capture 100% of the data
• Privacy is a concern with many sources of big data
• Self-regulation
• Legal regulation
⦿ Our newest research finds that organizations are using big data to target customer-centric outcomes, tap into internal data and build a better information ecosystem.
⦿ Big Data is already an important part of the $64 billion database and data analytics market
⦿ It offers commercial opportunities of a comparable scale to enterprise software in the late 1980s, the Internet boom of the 1990s, and the social media explosion of today.
