BIG DATA
Presented by:
Divyanshu Bhardwaj
Department of Computer Science
VIII Semester
Targeted Information
To know what you need before you even know you need it, based on past purchasing habits.
To notify you of an expiring driver's license or credit card, or the last refill on a prescription (Rx).
To give you turn-by-turn directions to a shelter in case of an emergency.
Big Science
The Large Hadron Collider experiments involve about 150 million sensors delivering data 40 million times per second, and there are nearly 600 million collisions per second.
Government
In 2012, the Obama administration announced the Big
Data Research and Development Initiative, which
explored how big data could be used to address
important problems facing the government. The
initiative was composed of 84 different big data
programs spread across six departments.
Big data analysis played a large role in Barack Obama's
successful 2012 re-election campaign.
The NASA Center for Climate Simulation (NCCS) stores
32 petabytes of climate observations and simulations on
the Discover supercomputing cluster.
Private Sector
Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 the company had the world's three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Wal-Mart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2,560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Facebook handles 50 billion photos from its user base.
Apache Hadoop
Apache Hadoop is an open-source software
framework that supports data-intensive distributed
applications.
Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
It enables applications to work with thousands of computationally independent computers and petabytes of data.
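As a concrete illustration of this paradigm, below is a minimal sketch of the classic word-count job written against the standard Hadoop MapReduce Java API. The mapper emits a (word, 1) pair for each token in its input split, and the reducer sums the counts after the framework's shuffle phase; the input and output paths are assumed to arrive as command-line arguments.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: each input split is a small fragment of work that the
  // framework can schedule (or reschedule) on any node in the cluster.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);  // emit (word, 1) for each token
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation per node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that the reducer doubles as a combiner, so partial sums are computed locally on each node before any data crosses the network.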
Importance of Hadoop
Organizations are discovering that important
predictions can be made by sorting through and
analyzing Big Data.
However, since an estimated 80% of this data is "unstructured", it must be formatted (or structured) in a way that makes it suitable for data mining and subsequent analysis.
Hadoop is a core platform for structuring Big Data, and it solves the problem of making that data useful for analytics (see the sketch below).
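To make the idea of "structuring" concrete, here is a hypothetical mapper sketch (not from the original slides) that converts raw, unstructured web-server log lines into structured (URL, count) records that a summing reducer, such as the one shown earlier, can aggregate. The space-delimited log format and the position of the request path are assumptions for illustration.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: extracts the requested URL from raw access-log
// lines (assumed space-delimited, with the request path in the 7th
// field, as in the common Apache log format) and emits (url, 1) so a
// summing reducer can produce structured per-URL hit counts.
public class LogStructuringMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private final Text url = new Text();

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(" ");
    if (fields.length > 6) {  // skip malformed or truncated lines
      url.set(fields[6]);     // request path in the assumed log format
      context.write(url, one);
    }
  }
}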