01 Introduction
Lecture #1
Introduction to Data
Current trend
Big data is data whose scale, diversity, and complexity require new
architecture, techniques, algorithms, and analytics to manage it and extract
value and hidden knowledge from it…
CS 40003: Data Analytics
Characteristics of Big Data: the three Vs (V3)
Exponential increase in
collected/generated data
Hadoop
MapReduce
Mahout
Apache HBase
Cassandra
MapReduce
Hadoop, Hive, Pig, Cascading, Cascalog, mrjob, Caffeine, S4, MapR, Acunu, Flume, Kafka,
Azkaban, Oozie, Greenplum
Storage
S3, HDFS, GDFS
Servers
EC2, Google App Engine, Elastic Beanstalk, Heroku
Processing
R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene, ElasticSearch, Datameer, BigSheets, Tinkerpop