MODULE 2 and 3
• Projections are the physical storage for table data. A single table can
have any number of projections.
• At query execution, Vertica's optimizer chooses the best projection for
the query.
Vertica Object Hierarchy
Query Execution
DEPARTMENT OF CSE
20NHOP01
WOS (Write Optimized Store)
• It is a memory-resident data store.
• Temporarily storing data in main memory speeds up the loading
process and reduces fragmentation on disk.
ROS (Read Optimized Store)
• It is a disk-resident data store.
• When the Tuple Mover's moveout task moves data out of the WOS into the
ROS, ROS containers are created and the data is organized into
projections on disk.
Tuple Mover:
Moveout
• The Tuple Mover task that moves data out of memory (WOS)
to disk (ROS).
Mergeout
• It combines ROS containers on disk into fewer, larger containers.
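The Tuple Mover normally runs both tasks automatically in the background. As a minimal sketch, assuming a Vertica release that still has a WOS and a hypothetical table named public.trades, the tasks can also be triggered by hand with the DO_TM_TASK function:

```sql
-- Move one table's data from the WOS (memory) into the ROS (disk).
-- public.trades is a hypothetical table name used only for illustration.
SELECT DO_TM_TASK('moveout', 'public.trades');

-- Consolidate that table's ROS containers.
SELECT DO_TM_TASK('mergeout', 'public.trades');
```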
Why this hybrid model?
To support different types of load.
• For small, frequent loads (trickle loads):
– Load into the WOS.
– Data in the WOS is still available to queries.
– Size of the WOS: min{25% of available memory, 2 GB}
Projection Design
Projection fundamentals:
1. Superprojection
2. Query-specific Projection
3. Buddy Projection
4. Live Aggregate Projection
5. Pre-join Projection
Superprojection
Query-specific Projection
Buddy Projection
Pre-join Projection
• Manually created
• Multiple tables are joined and stored in the form of
projection.
Live Aggregate Projection
• SUM
• MAX
• MIN
• COUNT
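A live aggregate projection stores pre-computed aggregates, so queries on them do not have to scan the detail rows. The following is a minimal sketch, assuming a hypothetical clicks table; the table, column, and projection names are illustrative only:

```sql
-- Hypothetical detail table.
CREATE TABLE clicks (
    user_id    INT,
    page_id    INT,
    click_time TIMESTAMP NOT NULL
);

-- Vertica maintains the per-page count as rows are loaded into clicks.
-- The GROUP BY column comes first in the select list.
CREATE PROJECTION clicks_per_page AS
    SELECT page_id, COUNT(*) AS num_clicks
    FROM clicks
    GROUP BY page_id;
```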
Projection properties
1. Encoding/compression: Each column is encoded, compressed, or
both. Vertica can operate directly on encoded data, but
compressed data must first be uncompressed.
2. Sorting (ORDER BY): Every projection is sorted on at least one
column, named in its ORDER BY clause.
ORDER BY A      ORDER BY A, B      ORDER BY B
A  B  C         A  B  C            B  A  C
1  2  b         1  1  a            1  1  a
1  1  a         1  2  b            1  2  d
1  3  c         1  3  c            2  1  b
2  1  d         2  1  d            2  2  e
2  3  f         2  2  e            3  2  f
2  2  e         2  3  f            3  1  c
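As a rough sketch of how such sort orders might be declared, assume a hypothetical table t with the three columns A, B, and C from the example above. Each projection stores the same rows in a different physical order:

```sql
-- t is a hypothetical table; the column types are assumptions.
CREATE TABLE t (A INT, B INT, C CHAR(1));

-- Rows are stored sorted by A, with ties broken by B.
CREATE PROJECTION t_sorted_a_b AS
    SELECT A, B, C FROM t
    ORDER BY A, B;

-- Rows are stored sorted by B only; B is listed first to match the third table.
CREATE PROJECTION t_sorted_b AS
    SELECT B, A, C FROM t
    ORDER BY B;
```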
Projection Properties
• Encoding/compression
• Sorting – order by
• K-safety
Replication and Segmentation
• Vertica distributes data across the nodes of a cluster using one of two
methods: replication (a full copy on every node) or segmentation (rows
spread evenly across the nodes).
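The two distribution methods, segmentation and replication, are chosen per projection. The sketch below assumes two hypothetical tables: a large orders table and a small regions lookup table.

```sql
-- Hypothetical tables used only for illustration.
CREATE TABLE orders  (order_id INT NOT NULL, region_id INT, amount NUMERIC(10,2));
CREATE TABLE regions (region_id INT NOT NULL, region_name VARCHAR(50));

-- Segmentation: each row is stored on the node chosen by hashing order_id.
CREATE PROJECTION orders_seg AS
    SELECT order_id, region_id, amount FROM orders
    ORDER BY order_id
    SEGMENTED BY HASH(order_id) ALL NODES;

-- Replication: a full copy of the small table is stored on every node.
CREATE PROJECTION regions_rep AS
    SELECT region_id, region_name FROM regions
    ORDER BY region_id
    UNSEGMENTED ALL NODES;
```

Large fact tables are usually segmented, while small lookup tables are often replicated so that joins can be resolved locally on each node.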
• RUNNING DBD
• Optimized superprojections are created when you run the
Database Designer (DBD).
• You can also create query-specific projections
in the DBD wizard.
• MANUALLY
MANUALLY
• CREATE PROJECTION statement:
CREATE PROJECTION [ IF NOT EXISTS ] projection-name
    [ (
        { projection-col | grouped-clause
          [ ENCODING encoding-type ]
        }[,...]
    ) ]
    AS SELECT select-list from-clause
    [ ORDER BY column-expr[,...] ]
    [ segmentation-spec ]
    [ KSAFE [ k-num ] ]
MANUALLY
• CREATE PROJECTION projection_name (projection_col,...)
  AS SELECT table_col,... FROM existing_table ...
• Example:
=> CREATE TABLE trades (stock CHAR(5), bid INT, ask INT);
=> CREATE PROJECTION tradeproj (stock ENCODING RLE,
   GROUPED(bid ENCODING DELTAVAL, ask))
   AS (SELECT * FROM trades) KSAFE 1;
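A projection created on a table that already holds data is not populated until it is refreshed. As a minimal sketch, reusing the trades table from the example above, the refresh and a follow-up check might look like this:

```sql
-- Populate any out-of-date projections of the trades table.
SELECT REFRESH('trades');

-- List the table's projections and check that they are up to date.
SELECT GET_PROJECTIONS('trades');
```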
Hands-on (Module 2)
1. Creation of schema, tables and execution of
SQL statements on Vertica Database
2. Hands-on projections
Connect to Remote Vertica Server
• Open PuTTY (available on the desktop)
• Host name: 10.10.26.11
• Login as: hp1
• Password: ROOT@123
(log in to the Vertica server with the above user name and password)
• Connect to the database as dbadmin by running the following command:
/opt/vertica/bin/vsql -U dbadmin -w vertica123
dbadmin=>
(now we can create tables and run queries)
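As a starting point for the first exercise, the statements below sketch a schema, a table, a few rows, and a query. All names and values are hypothetical and can be replaced with your own.

```sql
-- Hypothetical schema and table for the hands-on exercise.
CREATE SCHEMA lab;

CREATE TABLE lab.students (
    student_id INT NOT NULL,
    name       VARCHAR(50),
    dept       VARCHAR(10)
);

INSERT INTO lab.students VALUES (1, 'Asha', 'CSE');
INSERT INTO lab.students VALUES (2, 'Ravi', 'ECE');
COMMIT;

-- Count students per department.
SELECT dept, COUNT(*) AS num_students
FROM lab.students
GROUP BY dept
ORDER BY dept;
```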
Database Designer(DBD)
• Vertica's Database Designer is a tool that:
– Analyses your logical schema, sample data, and, optionally, your sample
queries.
– Creates a physical schema design (a set of projections) that can be deployed
automatically or manually.
– Can be used by anyone without specialized database knowledge. Even
business users can run Database Designer.
– Can be run and re-run at any time for additional optimization without stopping
the database.
• The projections that Database Designer creates provide excellent query
performance within physical constraints while using disk space efficiently.
Syntax :
COPY DIRECT
• Best for infrequent or bulk loads
• More efficient than AUTO for large loads
• Bypasses the WOS and writes directly to the ROS
• Can create ROS fragmentation if used for small, frequent loads
• COPY commits automatically by default
3 ways: COPY
• Depending on the data you are loading, the COPY
statement offers three load methods: AUTO, DIRECT, and TRICKLE.
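The load method is given as a keyword at the end of the COPY statement. The sketch below assumes the trades table from the projection example, placeholder file paths, and a Vertica release that still has a WOS:

```sql
-- File paths are placeholders for illustration.

-- AUTO (the default): load into the WOS, spilling to the ROS if the WOS fills.
COPY trades FROM '/home/dbadmin/trades_small.txt' DELIMITER '|' AUTO;

-- DIRECT: write straight to the ROS; suited to large, infrequent loads.
COPY trades FROM '/home/dbadmin/trades_bulk.txt' DELIMITER '|' DIRECT;

-- TRICKLE: load into the WOS only; suited to small, frequent loads.
COPY trades FROM '/home/dbadmin/trades_delta.txt' DELIMITER '|' TRICKLE;
```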
MERGE
• MERGE statements combine INSERT and UPDATE operations.
• The source table can include new and existing data.
• If the target table does not include any of the source table’s
records (new or existing), MERGE inserts all source data into
the target.
• The following MERGE example uses both MERGE options: update
existing rows (WHEN MATCHED THEN UPDATE…) and insert
new rows (WHEN NOT MATCHED THEN INSERT…).
• Updating one million records in two seconds.
Syntax:
MERGE INTO [[db-name.]schema.]target-table [ alias ]
    USING [[db-name.]schema.]source-table [ alias ]
    ON ( condition )
    [ WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ... ] ]
    [ WHEN NOT MATCHED THEN INSERT ( column1 [, column2 ...])
      VALUES ( value1 [, value2 ... ] ) ]
MERGE INTO target TGT
USING source SRC
ON SRC.A=TGT.A
WHEN MATCHED THEN
UPDATE SET A=SRC.A, B=SRC.B, C=SRC.C, D=SRC.D, E=SRC.E
WHEN NOT MATCHED THEN
INSERT VALUES (SRC.A, SRC.B, SRC.C, SRC.D, SRC.E);
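To try the MERGE statement above, the two tables need some data. The following setup is a sketch only; the column types and sample rows are assumptions chosen so that one source row matches an existing target row and one does not.

```sql
-- Column types and sample rows are assumptions for illustration.
CREATE TABLE target (A INT, B INT, C INT, D INT, E INT);
CREATE TABLE source (A INT, B INT, C INT, D INT, E INT);

INSERT INTO target VALUES (1, 10, 10, 10, 10);
INSERT INTO source VALUES (1, 11, 11, 11, 11);  -- A=1 exists in target: WHEN MATCHED branch
INSERT INTO source VALUES (2, 20, 20, 20, 20);  -- A=2 is new: WHEN NOT MATCHED branch
COMMIT;

-- After running the MERGE statement, inspect the result.
SELECT * FROM target ORDER BY A;
```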
Purge
• In HP Vertica, delete operations do not remove rows from
physical storage; the DELETE command only marks rows as deleted.
• Purge is the process that permanently removes deleted data from
physical storage so that the disk space can be reused.
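A delete followed by a manual purge might look like the sketch below. It assumes the trades table from the projection example and a made-up stock symbol.

```sql
-- Mark rows as deleted; they still occupy disk space afterwards.
-- 'ABC' is a made-up stock symbol.
DELETE FROM trades WHERE stock = 'ABC';
COMMIT;

-- Only data older than the Ancient History Mark (AHM) can be purged,
-- so advance the AHM to the current epoch first.
SELECT MAKE_AHM_NOW();

-- Physically remove the deleted rows from all projections of the table.
SELECT PURGE_TABLE('public.trades');
```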
Partitioning
• HP Vertica supports data partitioning at the table level, which
divides one large table into smaller pieces based on the values in one
or more columns.
• Partitioning is a table property that applies to all projections of the
table.
• A common use for partitions is to split data by time. For instance, a
table that contains decades of data can be partitioned by year, and a
table that contains a year of data can be partitioned by month.
• Partitions segregate data on each node, which makes it easy to drop
partitions.
• Partitions can make data lifecycle management easier and improve
query performance.
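Partitioning by year could be sketched as follows. The table and column names are hypothetical, and the function for dropping partitions depends on the Vertica release, so check the documentation for your version.

```sql
-- Hypothetical table, partitioned by the year of order_date.
CREATE TABLE orders_by_year (
    order_id   INT NOT NULL,
    order_date DATE NOT NULL,
    amount     NUMERIC(10,2)
)
PARTITION BY EXTRACT(year FROM order_date);

-- Drop all data for 2009 in one operation. DROP_PARTITIONS takes a range of
-- partition keys; older releases provide DROP_PARTITION('table', key) instead.
SELECT DROP_PARTITIONS('orders_by_year', 2009, 2009);
```

Dropping a partition removes a whole year of data without a row-by-row DELETE, which is what makes time-based partitioning useful for data lifecycle management.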
Differences Between Partitioning and Segmentation
• There is a distinction between partitioning at the table level
and segmenting a projection :
The following diagram illustrates the flow of segmentation
and partitioning on a four-node database cluster:
1. Example table data
2. Data segmented by HASH(order_id)
3. Data segmented by hash across four nodes
4. Data partitioned by year on a single node
Hands-on (Module 3)
1. Loading data files from different sources to
Vertica database.
2. Verifying the log files after loading the data
into Vertica database.
3. Hands-on partitions.
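For the load-verification exercise, one option is to capture rejected rows in a table rather than in log files. The sketch below assumes the trades table from Module 2, a placeholder file path, and a hypothetical rejection table named trades_rejects; COPY can also write rejections to files with REJECTED DATA 'path' and EXCEPTIONS 'path'.

```sql
-- Rows that cannot be parsed are stored in trades_rejects instead of
-- aborting the load. The file path is a placeholder.
COPY trades FROM '/home/dbadmin/trades.txt'
    DELIMITER '|'
    REJECTED DATA AS TABLE trades_rejects;

-- Check how many rows were loaded.
SELECT COUNT(*) FROM trades;

-- Inspect the rejected rows and the reason for each rejection.
SELECT rejected_data, rejected_reason FROM trades_rejects;
```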