12_big_sql
Conclusion
MapReduce is difficult
– The MapReduce Java API is tedious and requires programming expertise
– Unfamiliar languages (e.g., Pig) also require special skills
Better Performance
– Efficient local execution of point queries
– Better optimizer and support for complex queries
BigSQL supports
– Insert (Upsert)
– Delete/Update (not transactional in case of server failure)
– Indexes
– Dense Columns
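The DML support listed above can be sketched as follows; the table and column names are hypothetical, and exact Big SQL DDL syntax may vary by version:

```sql
-- Hypothetical table; names are illustrative, not from the slides.
CREATE HADOOP TABLE users (id INT, name VARCHAR(64), age INT);

-- Insert new rows.
INSERT INTO users VALUES (1, 'Alice', 34), (2, 'Bob', 29);

-- Update and delete; per the slide, these are not
-- transactional if the server fails mid-operation.
UPDATE users SET age = 35 WHERE id = 1;
DELETE FROM users WHERE id = 2;
```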
The Big SQL server process can be started and stopped with
$BIGSQL_HOME/bin/bigsql start
$BIGSQL_HOME/bin/bigsql stop
$ $BIGSQL_HOME/bin/bigsql start
BigSQL running, pid 8479.
$ $BIGSQL_HOME/bin/bigsql stop
BigSQL pid 8479 stopped.
Client Tools
– JSqsh
• Command-line client installed with the Big SQL server
– BigInsights console access
• Execute queries via the console web UI
– Big SQL Eclipse plugin
• Graphical query builder with syntax highlighting
JsonTuple allows storing complex objects as JSON and parsing them on demand
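A hedged sketch of on-demand JSON parsing, assuming Hive's json_tuple UDTF is available to Big SQL and using a hypothetical events table whose payload column holds JSON strings:

```sql
-- events(payload) is hypothetical; payload holds JSON strings such as
-- {"user": "alice", "action": "login"}.
SELECT t.usr, t.act
FROM events
  LATERAL VIEW json_tuple(events.payload, 'user', 'action') t AS usr, act;
```

Because the JSON is parsed at query time, the table itself can stay a plain string column and new fields can be extracted later without reloading the data.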
Big SQL does not directly dictate the storage format of a given data type!
– The SerDe that is used determines the storage representation
– Big SQL uses the Hive SerDes by default (LazySimpleSerDe and LazyBinarySerDe)
• Thus, Big SQL shares the Hive data representations
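Since the SerDe determines the on-disk representation, it can be chosen per table. A sketch that spells out the default LazySimpleSerDe explicitly; the table name is hypothetical and the Hive-style ROW FORMAT clause is an assumption about the supported DDL:

```sql
-- Equivalent to the default text behavior; shown explicitly for illustration.
CREATE HADOOP TABLE logs (ts VARCHAR(32), msg VARCHAR(256))
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  STORED AS TEXTFILE;
```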
Loading Data
Two possibilities: LOAD and LOAD USING
– LOAD simply moves files into the Hive Warehouse folder
– LOAD USING runs a MapReduce job
LOAD is very fast, but the files must match the table storage definition
– LOAD doesn't validate the files; the first queries over bad data will fail
– At the moment only overwrite is supported
– The file must match the table definition: if the table is tab delimited, the file must be tab delimited
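The two loading paths above can be sketched as follows; the paths and table names are hypothetical, and the exact clause names of LOAD USING are an assumption that varies across Big SQL releases:

```sql
-- Fast path: the file is moved into the Hive warehouse folder as-is,
-- so it must already match the table's delimiter and column layout.
LOAD FROM '/user/demo/employees.tsv' OVERWRITE INTO TABLE employee2;

-- MapReduce path: a job parses and converts the data while loading,
-- so the source file need not match the table's storage format.
LOAD USING FILE URL '/user/demo/employees.csv'
  WITH SOURCE PROPERTIES ('field.delimiter' = ',')
  INTO TABLE employee2;
```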
Loading from Databases
Data Sources supported
– Data can be loaded from any database that supports JDBC.
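A sketch of a JDBC-based load; the connection URL, credentials, and exact clause names below are assumptions for illustration, not taken from the slides:

```sql
-- Illustrative only: URL, credentials, and clause spellings are assumed.
LOAD USING JDBC CONNECTION URL 'jdbc:db2://dbhost:50000/SAMPLE'
  WITH PARAMETERS (user = 'demo', password = '***')
  FROM TABLE employees
  INTO TABLE employee2;
```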
select *
from (select row_number() over (order by age asc) as rn,
             empno,
             name,
             age
      from employee2) as t
where rn <= 4;
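The query above keeps the four youngest employees by numbering rows in a subquery. A sketch of a more direct form, assuming Big SQL supports the DB2-style FETCH FIRST clause:

```sql
SELECT empno, name, age
FROM employee2
ORDER BY age ASC
FETCH FIRST 4 ROWS ONLY;
```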
– Or session setting: