The Age of Big Data: Kayvan Tirdad
Kayvan Tirdad
[email protected]
Contents
Introduction: Explosion in Quantity of Data
Some Challenges in Big Data
2012
LHC: 40 TB/s
Airbus A380: 640 TB per flight
- 1 billion lines of code
- each engine generates 10 TB every 30 min
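As a rough consistency check on the A380 figures, the sketch below works out what flight length the 640 TB figure implies. It assumes that the 640 TB covers all four engines and that each engine produces data at a steady rate; neither assumption is stated on the slide.

```python
# Assumption (not from the slide): 640 TB per flight is the total for all engines.
ENGINES = 4                      # the A380 is a four-engine aircraft
TB_PER_ENGINE_PER_30_MIN = 10    # figure from the slide
TB_PER_FLIGHT = 640              # figure from the slide

# Two 30-minute intervals per hour, summed over all engines.
tb_per_hour = ENGINES * TB_PER_ENGINE_PER_30_MIN * 2
implied_flight_hours = TB_PER_FLIGHT / tb_per_hour

print(f"{tb_per_hour} TB per hour, {implied_flight_hours:g} hours per flight")
```

Under these assumptions the two figures agree for a flight of about eight hours.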
Entertainment
Internet images, Hollywood movies, MP3 files
Medicine
MRI and CT scans, patient records
Ignore
- 1 PB = 1,000,000 GB; 1,000,000 / 500 = 2,000; 2,000 × 9 = 18,000 h; 18,000 / 24 = 750 days
- The cost for 1,000 cloud nodes, each processing 1 PB: 2,000 × $3,060 = $6,120,000
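The arithmetic above can be reproduced step by step. This is only a sketch: the slide does not label its numbers, so reading 500 as a unit size in GB, 9 as hours per unit, and $3,060 as cost per unit are assumptions, and the variable names are illustrative.

```python
GB_PER_PB = 1_000_000      # 1 PB expressed in GB
GB_PER_UNIT = 500          # assumed: size of one unit of work, in GB
HOURS_PER_UNIT = 9         # assumed: hours to process one unit
COST_PER_UNIT_USD = 3060   # assumed: cost attached to one unit

units = GB_PER_PB // GB_PER_UNIT        # 2,000 units in 1 PB
total_hours = units * HOURS_PER_UNIT    # sequential processing time
total_days = total_hours / 24
total_cost = units * COST_PER_UNIT_USD

print(f"{units} units, {total_hours} h, {total_days:g} days, ${total_cost:,}")
```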
Job
- The U.S. could face a shortage by 2018 of 140,000 to 190,000 people with "deep analytical talent" and of 1.5 million people capable of analyzing data in ways that enable business decisions. (McKinsey & Co)
- The Big Data industry is worth more than $100 billion and is growing at almost 10% a year (roughly twice as fast as the software business)
Microsoft
HDInsight Server
IBM
Netezza
- Moneyball had a huge impact on other teams in MLB
- And there is a Moneyball movie!
- predictive modeling
- mybarackobama.com
- drive traffic to other campaign sites: Facebook page (33 million "likes"), YouTube channel (240,000 subscribers and 246 million page views)
- a contest to dine with Sarah Jessica Parker
- every single night, the team ran 66,000 computer simulations
- Reddit
- Amazon Web Services
Nate Silver, FiveThirtyEight blog
- Predicted Obama had an 86% chance of winning
- Predicted all 50 states correctly
Sam Wang, the Princeton Election Consortium
- Put the probability of Obama's re-election at more than 98%
The Linked Open Data Ripper: Mapping, Ranking, Visualization, Key Matching, Snappiness
Demonstrate the Value of Semantics: let data integration drive DBMS technology
Large volumes of heterogeneous data, like Linked Data and RDF
3- How would your business change if you used big data for widespread, real-time customization?
4- How can big data augment or even replace management?
5- Could you create a new business model based on data?
MapReduce
- pioneered by Google
- popularized by Yahoo! (Hadoop)
- Data-parallel programming model
- An associated parallel and distributed implementation for commodity clusters
- Pioneered by Google: processes 20 PB of data per day
- Popularized by open-source Hadoop: used by Yahoo!, Facebook, Amazon, and the list is growing
<K1, V1> → MAP → <K2, V2> → REDUCE → <K3, V3>
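The MAP and REDUCE stages can be illustrated with a word count, the usual introductory example. This is a minimal single-process sketch, not Hadoop's or Google's API; the function names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """MAP: <K1, V1> = (document id, text) -> list of <K2, V2> = (word, 1)."""
    return [(word.lower(), 1) for word in text.split()]

def reduce_fn(word, counts):
    """REDUCE: (K2, all V2 for that key) -> <K3, V3> = (word, total count)."""
    return word, sum(counts)

def run_mapreduce(inputs):
    # Group intermediate values by key (the "shuffle" between MAP and REDUCE).
    groups = defaultdict(list)
    for doc_id, text in inputs:
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # Apply REDUCE once per distinct intermediate key.
    return dict(reduce_fn(key, values) for key, values in groups.items())

docs = [(1, "big data is big"), (2, "data about data")]
result = run_mapreduce(docs)
print(result)
```

Because each call to `map_fn` and `reduce_fn` depends only on its own arguments, a real framework is free to run them on different machines.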
Automatic Parallelization:
- Depending on the size of the raw input data, instantiate multiple MAP tasks
- Similarly, depending on the number of intermediate <key, value> partitions, instantiate multiple REDUCE tasks
Run-time:
- Data partitioning
- Task scheduling
- Handling machine failures
- Managing inter-machine communication
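The partitioning ideas listed here can be sketched as follows: the input is cut into fixed-size splits (one MAP task per split), and each intermediate key is routed to one of a fixed number of REDUCE tasks by hashing. Everything runs sequentially in one process, so scheduling, failure handling, and inter-machine communication are left out. The helper names and the CRC32-based partitioner are assumptions chosen for illustration, not the behavior of any particular framework.

```python
import zlib
from collections import defaultdict

def split_input(records, split_size):
    """Cut the raw input into splits; each split becomes one MAP task."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]

def partition_for(key, num_reducers):
    """Route a key to a REDUCE task with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_task(split):
    return [(word, 1) for line in split for word in line.split()]

def reduce_task(pairs):
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["a b a", "b c", "a c c", "d"]
NUM_REDUCERS = 2

splits = split_input(lines, split_size=2)      # more input -> more MAP tasks
partitions = [[] for _ in range(NUM_REDUCERS)]
for split in splits:
    for key, value in map_task(split):
        partitions[partition_for(key, NUM_REDUCERS)].append((key, value))

outputs = [reduce_task(pairs) for pairs in partitions]   # one per REDUCE task
merged = {key: total for out in outputs for key, total in out.items()}
print(len(splits), "map tasks;", merged)
```

Since every occurrence of a key hashes to the same partition, each REDUCE task sees all values for its keys and the per-task outputs never overlap.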
Zettabyte Horizon
As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes, which is half a zettabyte. The total amount of global data is expected to grow to 2.7 zettabytes during 2012, up 48% from 2011.
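The 2011 total is not given directly, but it follows from the two figures quoted. A small sketch, assuming the 48% growth is measured relative to the 2011 total:

```python
ZB_2012 = 2.7        # expected global data in 2012, in zettabytes
GROWTH = 0.48        # growth from 2011 to 2012
EB_PER_ZB = 1000     # 1 zettabyte = 1,000 exabytes

zb_2011 = ZB_2012 / (1 + GROWTH)     # implied 2011 total
web_2009_zb = 500 / EB_PER_ZB        # 500 exabytes expressed in zettabytes

print(f"Implied 2011 total: {zb_2011:.2f} ZB")
print(f"Web in 2009: {web_2009_zb} ZB")
```

This puts the implied 2011 total at roughly 1.8 zettabytes.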
Wrap Up
Book Review
References
1. B. Brown, M. Chui and J. Manyika, Are you ready for the era of Big Data? McKinsey Quarterly, Oct 2011, McKinsey Global Institute
2. C. Bizer, P. Boncz, M. L. Brodie and O. Erling, The Meaningful Use of Big Data: Four Perspectives, Four Challenges, SIGMOD Record, Vol. 40, No. 4, December 2011
3. D. Boyd and K. Crawford, Six Provocations for Big Data, A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 2011, Oxford Internet Institute
4. D. Agrawal, S. Das and A. El Abbadi, Big Data and Cloud Computing: Current State and Future Opportunities, EDBT 2011, Uppsala, Sweden
5. D. Agrawal, S. Das and A. El Abbadi, Big Data and Cloud Computing: New Wine or Just New Bottles? VLDB 2010, Vol. 3, No. 2
6. F. J. Alexander, A. Hoisie and A. Szalay, Big Data, IEEE Computing in Science and Engineering, 2011
7. O. Trelles, P. Prins, M. Snir and R. C. Jansen, Big Data, but are we ready? Nature Reviews, Feb 2011
8. K. Bakhshi, Considerations for Big Data: Architecture and Approach, IEEE Aerospace Conference, 2012
9. S. Lohr, The Age of Big Data, The New York Times, February 2012
10. M. Nielsen, A guide to the day of big data, Nature, vol. 462, December 2009