Big Data Training1
Tools such as Splice Machine use these APIs to provide a SQL layer on top of HBase.
HDFS Federation: multiple NameNodes, each managing its own namespace
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/Federation.html
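With several NameNodes, each owning part of the namespace, a client still needs to know which NameNode serves a given path. One common answer is a client-side mount table (Hadoop's ViewFs). The sketch below only illustrates longest-prefix resolution; the hostnames, ports and mount points are hypothetical, and this is plain Python, not the Hadoop API.

```python
# Hypothetical mount table: path prefix -> NameNode URI (names are illustrative only).
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}


def resolve(path: str) -> str:
    """Return the full URI for `path`, using the longest matching mount prefix."""
    best = None
    for prefix in MOUNT_TABLE:
        # A prefix matches the path itself or anything below it ("/data" must not match "/datalake").
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise FileNotFoundError(f"no mount point for {path}")
    return MOUNT_TABLE[best] + path


if __name__ == "__main__":
    print(resolve("/data/logs/2015/app.log"))  # served by nn3
    print(resolve("/data/sales.csv"))          # served by nn2
```

Longest-prefix matching lets a busy subtree such as `/data/logs` be moved to its own NameNode without changing the paths clients use.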
ALTER TABLE salary ADD COLUMN `id` INT(10) UNSIGNED PRIMARY KEY AUTO_INCREMENT;
----------------------------
Day 2
YARN
Metadata held by the ResourceManager: which jobs are running under which ApplicationMasters
-----------------------------------------------------------------------------
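Flume agent
Multiplexing: one source can fan an event flow out to several channels, either replicating every event or selectively routing each event by a header value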
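Selective routing can be sketched as a lookup from one header value to a list of channels, with a default for unmapped values. In Flume itself this is configured on the source with `selector.type = multiplexing` plus `selector.header`, `selector.mapping.*` and `selector.default`; the Python below only mimics that behaviour, and the header name, values and channel names are hypothetical.

```python
from collections import defaultdict

# Hypothetical selector settings: route on the "state" header; unmapped values use DEFAULT.
SELECTOR_HEADER = "state"
MAPPING = {"CA": ["c1"], "NY": ["c2", "c3"]}
DEFAULT = ["c4"]


def select_channels(headers: dict) -> list:
    """Pick the channels an event is written to, based on a single header value."""
    return MAPPING.get(headers.get(SELECTOR_HEADER), DEFAULT)


def route(events):
    """Group event bodies by destination channel; one event may land in several channels."""
    channels = defaultdict(list)
    for headers, body in events:
        for ch in select_channels(headers):
            channels[ch].append(body)
    return dict(channels)


if __name__ == "__main__":
    events = [({"state": "CA"}, "e1"), ({"state": "NY"}, "e2"), ({}, "e3")]
    print(route(events))  # {'c1': ['e1'], 'c2': ['e2'], 'c3': ['e2'], 'c4': ['e3']}
```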
Can Storm scale horizontally? Yes, and it can scale without shutting down the running topology.
Topology: streaming ETL
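A Storm topology is a graph of spouts (sources) and bolts (processing steps). The sketch below, plain Python rather than the Storm API, chains a hypothetical spout and parse step as a streaming ETL flow, then spreads tuples over `num_tasks` parallel sink tasks using a hash of one field, in the spirit of Storm's fields grouping. Raising `num_tasks` is the horizontal-scaling knob; the field name `user` and the record format are assumptions.

```python
import json
import zlib
from typing import Iterable, Iterator

# Hypothetical stand-ins for a spout and bolts; this is not the Storm API.


def spout(lines: Iterable[str]) -> Iterator[str]:
    """Extract: emit raw records one at a time, as a spout would."""
    yield from lines


def parse_bolt(records: Iterable[str]) -> Iterator[dict]:
    """Transform: parse JSON records and silently drop malformed ones."""
    for rec in records:
        try:
            yield json.loads(rec)
        except json.JSONDecodeError:
            continue


def fields_grouping(key: str, num_tasks: int) -> int:
    """Choose a downstream task so that equal keys always reach the same task."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def run_topology(lines: Iterable[str], num_tasks: int) -> dict:
    """Load: route each parsed tuple to one of `num_tasks` parallel sink tasks."""
    tasks = {i: [] for i in range(num_tasks)}
    for event in parse_bolt(spout(lines)):
        tasks[fields_grouping(event["user"], num_tasks)].append(event)
    return tasks


if __name__ == "__main__":
    raw = ['{"user": "a", "amount": 3}', "not json", '{"user": "b", "amount": 5}']
    print(run_topology(raw, num_tasks=2))
```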
--------------------------------------------------------------------------------
Pig
ORDER BY is expensive (Pig adds a sampling job to choose key ranges before the actual sort)
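Why the extra work: a total order across several reducers needs non-overlapping key ranges, and choosing good range boundaries requires looking at the data first. The sketch below approximates that two-pass idea in plain Python (sample, pick boundaries, range-partition, sort each partition locally, concatenate); the reducer count and sample size are arbitrary assumptions, not Pig defaults.

```python
import bisect
import random


def pick_boundaries(keys, num_reducers, sample_size, seed=0):
    """Pass 1 (the extra job): sample the keys and pick num_reducers - 1 split points."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]


def total_order_sort(keys, num_reducers=3, sample_size=20):
    """Pass 2: range-partition every key, sort each partition, then concatenate."""
    keys = list(keys)
    if not keys:
        return []
    bounds = pick_boundaries(keys, num_reducers, sample_size)
    partitions = [[] for _ in range(num_reducers)]
    for k in keys:
        # Partition i holds keys in [bounds[i-1], bounds[i]), so ranges never overlap.
        partitions[bisect.bisect_right(bounds, k)].append(k)
    result = []
    for part in partitions:
        result.extend(sorted(part))  # each "reducer" sorts only its own key range
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.randint(0, 1000) for _ in range(50)]
    print(total_order_sort(data) == sorted(data))  # True
```

Concatenating the per-partition results is globally sorted only because the ranges are disjoint and ordered; a plain hash partitioner would not give that guarantee.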
- A UDF executes many times (once per record), so create a pool (e.g., of connections) once and have the UDF draw from it instead of opening a new resource on every call
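A minimal sketch of the pooling idea, assuming the expensive resource is something like a database connection. `FakeConnection`, `Pool` and `enrich_udf` are hypothetical names, and this is plain Python, not Pig's UDF interface: the pool is created lazily on the first call and every later call borrows and returns a connection.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for an expensive resource such as a DB connection."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def lookup(self, key):
        return key.upper()


class Pool:
    """A tiny fixed-size pool: connections are created once and then reused."""

    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(FakeConnection())

    def acquire(self):
        return self._q.get()  # blocks if every connection is in use

    def release(self, conn):
        self._q.put(conn)


_pool = None
_pool_lock = threading.Lock()


def get_pool(size=2):
    """Create the shared pool lazily, on the first UDF call only."""
    global _pool
    with _pool_lock:
        if _pool is None:
            _pool = Pool(size)
    return _pool


def enrich_udf(value):
    """Hypothetical UDF body: runs once per record but never opens a new connection."""
    pool = get_pool()
    conn = pool.acquire()
    try:
        return conn.lookup(value)
    finally:
        pool.release(conn)


if __name__ == "__main__":
    results = [enrich_udf(v) for v in ["a", "b", "c"] * 100]
    print(results[:3], FakeConnection.opened)  # ['A', 'B', 'C'] 2
```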
SAMPLE: keep a random subset of a relation, e.g. B = SAMPLE A 0.1; keeps roughly 10% of the records
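Pig's SAMPLE is probabilistic, so the same fraction does not always return the same number of records. The Python sketch below shows that behaviour with independent per-record (Bernoulli) sampling; it illustrates the semantics and is not Pig's implementation. The `seed` parameter is an assumption added to make runs repeatable.

```python
import random


def sample(records, fraction, seed=None):
    """Keep each record independently with probability `fraction`.

    The result size is random: about fraction * len(records), not exactly that.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


if __name__ == "__main__":
    kept = sample(range(10_000), 0.1, seed=42)
    print(len(kept))  # close to 1000; the exact count depends on the seed
```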
pig - compression
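HCatLoader: allows filtering to happen while loading (a FILTER on partition columns right after the LOAD is pushed down, so only matching partitions are read)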
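The benefit of filtering during the load is partition pruning: partitions whose key cannot match are never opened. The plain-Python sketch below compares pruning with load-then-filter over a hypothetical table partitioned by `state`; both return the same rows, but the pruned version touches one partition instead of all of them.

```python
# Hypothetical partitioned table: partition value -> rows stored under that partition.
PARTITIONS = {
    "CA": [(1, "ann"), (2, "bob")],
    "NY": [(3, "cy")],
    "TX": [(4, "dee"), (5, "eve")],
}


def load_with_filter(partitions, wanted_state):
    """Filter pushed into the load: read only matching partitions."""
    touched, rows = [], []
    for state, part_rows in partitions.items():
        if state != wanted_state:
            continue  # pruned: this partition's files are never read
        touched.append(state)
        rows.extend((i, n, state) for i, n in part_rows)
    return rows, touched


def load_then_filter(partitions, wanted_state):
    """No pushdown: read every partition, then discard non-matching rows."""
    touched = list(partitions)
    rows = [(i, n, s) for s, part_rows in partitions.items() for i, n in part_rows]
    return [r for r in rows if r[2] == wanted_state], touched


if __name__ == "__main__":
    pruned_rows, pruned_touched = load_with_filter(PARTITIONS, "CA")
    full_rows, full_touched = load_then_filter(PARTITIONS, "CA")
    print(pruned_rows == full_rows, pruned_touched, full_touched)
    # True ['CA'] ['CA', 'NY', 'TX']
```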
http://chimera.labs.oreilly.com/books/1234000001811/ch07.html#explain
---------------------------------------------------------------------------
Hive
Hive partitions
When you load data with LOAD DATA, the data is not parsed or distributed by the partition key; the files are moved as-is into the partition named in the statement.
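To see the difference, compare LOAD DATA into a named partition (for example `PARTITION (state='CA')` on the `names` table) with a dynamic-partition `INSERT ... PARTITION (state) SELECT ...` from a staging table, which in Hive requires `hive.exec.dynamic.partition` to be enabled. The plain-Python sketch below mimics only the routing: LOAD drops the whole file into one partition directory unchecked, while the dynamic insert parses each row and routes it by its `state` value. The staging rows are hypothetical.

```python
from collections import defaultdict


def load_data(file_lines, partition_value):
    """Like LOAD DATA: the whole file lands in one partition directory, unparsed."""
    return {f"state={partition_value}": list(file_lines)}


def dynamic_partition_insert(staging_lines):
    """Like a dynamic-partition INSERT ... SELECT: parse each row and route it by state."""
    dirs = defaultdict(list)
    for line in staging_lines:
        id_, name, state = line.split("\t")
        # The partition value becomes part of the directory name, not the stored row.
        dirs[f"state={state}"].append(f"{id_}\t{name}")
    return dict(dirs)


if __name__ == "__main__":
    staging = ["1\tann\tCA", "2\tbob\tNY", "3\tcy\tCA"]
    print(load_data(staging, "CA"))         # the NY row ends up under state=CA too
    print(dynamic_partition_insert(staging))
```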
Skewed tables
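A skewed table declares a few heavily repeated values up front (`SKEWED BY (col) ON (...)`) so Hive can treat them separately; with `STORED AS DIRECTORIES`, each declared value gets its own subdirectory. The plain-Python sketch below mimics that layout under hypothetical skewed values and directory names: rows for a declared heavy value go to a dedicated directory, everything else to a shared one, and a lookup scans only the relevant directory.

```python
from collections import defaultdict

# Hypothetical skew declaration, in the spirit of SKEWED BY (key) ON ('CA', 'NY').
SKEWED_VALUES = {"CA", "NY"}


def target_dir(key):
    """Declared heavy values get their own directory; all other values share one."""
    return f"key={key}" if key in SKEWED_VALUES else "default"


def write_skewed(rows):
    """Lay out (key, payload) rows into directories according to the skew declaration."""
    dirs = defaultdict(list)
    for row in rows:
        dirs[target_dir(row[0])].append(row)
    return dict(dirs)


def read_key(dirs, key):
    """A lookup only scans the one directory that can contain `key`."""
    return [row for row in dirs.get(target_dir(key), []) if row[0] == key]


if __name__ == "__main__":
    rows = [("CA", 1), ("CA", 2), ("NY", 3), ("TX", 4), ("WA", 5)]
    dirs = write_skewed(rows)
    print(sorted(dirs))          # ['default', 'key=CA', 'key=NY']
    print(read_key(dirs, "TX"))  # [('TX', 4)]
```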
You can read from and write to Hive tables from a Pig script (HCatLoader / HCatStorer).
-----
CREATE TABLE names (id INT, name STRING)
PARTITIONED BY (state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
-------------------------
------------------------------
Stinger
------