PostgreSQL Course Material
Shankarnag
Software Engineer & Database Architect
© Copyright EnterpriseDB Corporation, 2015. All rights reserved. 1
Intro Course Agenda
Limit                        Value
---------------------------  --------------------------------------
Maximum Database Size        Unlimited
Maximum Table Size           32 TB
Maximum Row Size             1.6 TB
Maximum Field Size           1 GB
Maximum Rows per Table       Unlimited
Maximum Columns per Table    250-1600 (depending on column types)
Maximum Indexes per Table    Unlimited
PostgreSQL terminology: a row is called a tuple; a column is called an attribute.
[Diagram: background processes (BGWRITER, WAL WRITER, CHECKPOINTER, ARCHIVER, STATS COLLECTOR) writing data files, WAL segments, and archived WAL files]
• Background writer
− Writes updated data blocks to disk
• WAL writer
− Flushes write-ahead log to disk
• Checkpointer process
− Automatically performs a checkpoint based on config parameters
• Autovacuum launcher
− Starts Autovacuum workers as needed
• Autovacuum workers
− Recover free space for reuse
• Logging collector
− Routes log messages to syslog, eventlog, or log files
• Stats collector
− Collects usage statistics by relation and block
• Archiver
− Archives write-ahead log files
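On PostgreSQL 10 and later (not the 9.x releases this material was written against), these background processes can be observed directly from SQL; a hedged sketch:

```sql
-- Assumes PostgreSQL 10+; the backend_type column does not exist in older releases.
SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type <> 'client backend';
-- Typical rows include: checkpointer, background writer, walwriter,
-- autovacuum launcher
```

On older releases the same processes are visible at the operating-system level, e.g. with ps -ef | grep postgres.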
Shared Memory
[Diagram: shared (data) buffers in shared memory; CHECKPOINT and BGWRITER write dirty buffers to the stable database; WAL records go to the transaction log, which the ARCHIVE COMMAND copies to archive storage]
• The background writer attempts to ensure an adequate supply of clean buffers.
• bgwriter writes dirty blocks to storage as needed.
• Before Commit
− Uncommitted updates are in memory.
• After Commit
− Committed updates written from shared memory to WAL log file (on disk).
• After Checkpoint
− Modified data pages are written from shared memory to the data files.
Database Cluster Data Directory Layout
• File-per-table, file-per-index.
• A tablespace is a directory.
• Each database that uses that tablespace gets a subdirectory.
• Each relation using that tablespace/database combination gets one or
more files, in 1GB chunks.
• Additional files used to hold auxiliary information (free space map, visibility
map).
• Each file name is a number (see pg_class.relfilenode).
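The relfilenode-to-path mapping can be inspected from SQL; a sketch, assuming a table named orders exists:

```sql
-- pg_relation_filepath() returns the file path relative to the data directory
SELECT relname, relfilenode, pg_relation_filepath(oid)
FROM pg_class
WHERE relname = 'orders';
```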
Sample: Data Directory Layout
[Diagram: the default tablespace lives under /data/base/<database OID>/<relation data file>; user tablespaces appear under /data/pg_tblspc/<tablespace OID> as symlinks to directories such as /storage1/pgtab, which contain per-database subdirectories and relation data files]
• PostgreSQL has been tested and certified for the following locales:
− en_US United States English
− zh_HK Traditional Chinese with Hong Kong SCS
− zh_TW Traditional Chinese for Taiwan
− zh_CN Simplified Chinese
− ja_JP Japanese
− ko_KR Korean
• There are many configuration parameters that affect the behavior of the database
system.
• Server config file postgresql.conf stores all the server parameters.
• All parameter names are case-insensitive.
• Some parameters require restart.
• Query to list all parameters:
− SELECT name,setting FROM pg_settings;
• Query to list all parameters requiring server restart.
− SELECT name FROM pg_settings WHERE context = 'postmaster';
• One way to set these parameters is to edit the file postgresql.conf, which is
normally kept in the data directory.
• Some parameters can be changed per session using the SET command.
• Some parameters can be changed at the user level using ALTER USER.
• Some parameters can be changed at the database level using ALTER
DATABASE.
• The SHOW command can be used to see settings.
• pg_settings catalog table lists settings information.
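The parameter-setting mechanisms above can be sketched as follows; the user and database names are illustrative:

```sql
SET work_mem = '16MB';                          -- session level
SHOW work_mem;                                  -- view the current setting
ALTER USER edbstore SET work_mem = '16MB';      -- user level (applies at next login)
ALTER DATABASE edbstore SET work_mem = '16MB';  -- database level (new sessions)
```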
Memory Settings
• The following parameters in postgresql.conf control the memory settings:
− shared_buffers (integer)
− Sets the amount of memory the database server uses for shared memory buffers
− temp_buffers (integer)
− Sets the maximum number of temporary buffers used by each database session
− work_mem (default 4MB)
− Specifies the amount of memory to be used by internal sort operations and hash tables before
switching to temporary disk files
− autovacuum_work_mem
− Specifies the maximum amount of memory to be used by each autovacuum worker process
− maintenance_work_mem (default 64MB)
− Specifies the maximum amount of memory to be used in maintenance operations, such as
VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY
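A minimal postgresql.conf fragment illustrating these parameters; the values are examples only and should be tuned for the workload:

```
shared_buffers = 256MB          # shared memory buffers
temp_buffers = 8MB              # per-session temporary buffers
work_mem = 4MB                  # per sort / hash operation
maintenance_work_mem = 64MB     # VACUUM, CREATE INDEX, etc.
autovacuum_work_mem = -1        # -1 falls back to maintenance_work_mem
```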
Lab Exercise
6. Open the configuration file for your Postgres database cluster and make
the following changes
− Maximum allowed connections to 50
− Authentication time to 10 mins
− Shared buffers to 256 MB
− work_mem to 10 MB
− wal_buffers to 8MB
7. Restart the server and verify the changes made in previous step
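One possible set of postgresql.conf changes for this exercise (a sketch; the parameter names are the standard ones corresponding to the steps above):

```
max_connections = 50
authentication_timeout = 10min
shared_buffers = 256MB
work_mem = 10MB
wal_buffers = 8MB
```

Most of these require a server restart to take effect, which is why step 7 asks for one.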
• psql has its own set of commands, all of which start with a backslash (\).
These are in no way related to SQL commands, which are executed by the
server. psql commands only affect psql.
• Some commands accept a pattern. This pattern is a modified regex. Key
points:
− * and ? are wildcards
− Double-quotes are used to specify an exact name, ignoring all special characters and
preserving case
• \l[ist][+]
− Lists the names, owners, and character set encodings of all the databases in the
server
− If + is appended to the command name, database descriptions are also displayed
• \dn+ [pattern]
− Lists schemas (namespaces)
− + adds permissions and description to output
• \df[+] [pattern]
− Lists functions
− + adds owner, language, source code and description to output
• \conninfo
− Current connection information
• \q or ^d
− Quits the edb-psql program
• \cd [ directory ]
− Change current working directory
− Tip: To print your current working directory, use
\! pwd
• \! [ command ]
− Executes the specified command
− If no command is specified, escapes to a separate Unix shell (CMD.EXE in
Windows)
• \?
− Shows help information about edb-psql commands
• \h [command]
− Shows information about SQL commands
− If a command isn't specified, lists all SQL commands
• psql --help
− Lists command line options for edb-psql
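A short illustrative psql session using the meta-commands above (output elided):

```
db=# \l            -- list databases
db=# \dn+          -- list schemas with permissions and descriptions
db=# \df public.*  -- list functions in the public schema
db=# \conninfo
db=# \! pwd        -- run a shell command
db=# \q
```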
• There is a program that you can execute from the shell to create new
databases, createdb.
− createdb dbname
• Create Database command can be used to create a database in a cluster.
− Syntax:
CREATE DATABASE name
[ [ WITH ] [ OWNER [=] dbowner ]
[ TEMPLATE [=] template ]
[ ENCODING [=] encoding ]
[ TABLESPACE [=] tablespace ]
[ CONNECTION LIMIT [=] connlimit ] ]
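A sketch of this syntax in use; the database, owner, and encoding choices are illustrative:

```sql
-- Hypothetical example: a database owned by a pre-existing role sales_admin
CREATE DATABASE salesdb
  WITH OWNER = sales_admin
       TEMPLATE = template0
       ENCODING = 'UTF8'
       CONNECTION LIMIT = 25;
```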
• The psql tool allows you to interactively enter, edit, and execute SQL
commands.
• PGAdmin-III GUI tool can also be used to access a database.
• Let's use psql to access a database:
− Open a command prompt or terminal.
− If PATH is not set, run the next command from the bin directory of your
Postgres installation.
− $ psql -U postgres db
− db=#
• The first schema named in the search path is called the current schema if
that named schema exists.
• Aside from being the first schema searched, it is also the schema in which
new tables will be created if the CREATE TABLE command does not
specify a schema name.
• To show the current search path, use the following command:
− SHOW search_path;
• To put our new schema in the path, we use:
− SET search_path TO myschema, public;
1. Write a SQL query to view name and size of all the databases in your
data cluster
2. In a previous module you learned how to create a database user, now
create a database user named mgr1
3. Create a new database mgrdb with owner mgr1
4. Create a schema mgr1 in database mgrdb. This schema must be owned
by mgr1
6. Open psql and connect to mgrdb using mgr1 user and find the result of
the following query
select * from mgrtab;
The above statement should run successfully with 0 rows returned
7. In the lab exercise of a previous module you have added user Irena. Set
the proper search_path for user Irena so that the edbstore schema
objects can be accessed without use of fully qualified table names
8. Connect to the edbstore database using the Irena user and verify if you
can access orders table without a fully qualified table name
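For step 1, one way to list database names and sizes using standard catalog functions (a sketch):

```sql
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database;
```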
• Database level: user/password, CONNECT privilege
• Schema level: schema permissions
• Object level: table-level privileges, managed with GRANT/REVOKE
FATAL: no pg_hba.conf entry for host "192.168.10.23", user "edbstore", database "edbuser"
FATAL: password authentication failed for user "edbuser"
FATAL: user "edbuser" does not exist
FATAL: database "edbstore" does not exist
Lab Exercise
2. You have decided to log all the connections on your data cluster.
Configure your postgresql.conf settings so that all the successful as well
as unsuccessful connections are logged in the error log file
3. Connect to edbstore database
4. Verify if the connections are logged or not
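For step 2, a possible postgresql.conf fragment (successful connections are logged by log_connections; failed attempts are reported automatically once logging is enabled):

```
logging_collector = on
log_connections = on
log_disconnections = on
```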
[Diagram: tablespaces placed on fast vs. slow storage media]
• Example
edb=# explain select * from emp;
QUERY PLAN
------------------------------------------------------
Seq Scan on emp (cost=0.00..1.14 rows=14 width=135)
• The numbers that are quoted by EXPLAIN are:
- Estimated start-up cost
- Estimated total cost
- Estimated number of rows output by this plan node
- Estimated average width (in bytes) of rows output by this plan node
• The Postgres optimizer and planner use table statistics to generate query plans.
• Query plans are only as good as the table statistics they are based on.
• Table statistics:
- Table statistics store the total number of rows in each table and index, as
well as the number of disk blocks occupied by each table and index.
- The catalog table pg_class contains reltuples and relpages, which hold
important statistics for each table in a database.
• Table Statistics
- Are not updated in real time
- Can be updated using the ANALYZE command
- Stored in pg_class and pg_statistic
- You can run the ANALYZE command from edb-psql on specific tables and even
specific columns
- Autovacuum will run ANALYZE as configured
- Syntax for ANALYZE:
- ANALYZE [ table_name [ ( column_name [, ...] ) ] ]
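A short sketch tying these together, assuming a table named orders with a customerid column:

```sql
ANALYZE orders;                       -- refresh statistics for one table
ANALYZE orders (customerid);          -- or for just one column
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'orders';             -- inspect the stored statistics
```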
• VACUUM
- Removes dead rows and marks the space available for future reuse.
- Does not return the space to the operating system.
- Space is reclaimed if obsolete rows are at the end of the table.
• VACUUM FULL
- More aggressive algorithm compared to VACUUM
- Compacts tables by writing a complete new version of the table file with no
dead space.
- Takes more time.
- Requires extra disk space for the new copy of the table, until the operation
completes.
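Illustrative usage, again assuming a table named orders; note that VACUUM FULL takes an exclusive lock on the table while it runs:

```sql
VACUUM VERBOSE orders;   -- reclaim dead rows in place, report what was done
VACUUM FULL orders;      -- rewrite the table compactly (exclusive lock, extra disk)
```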
• Each heap relation has a visibility map which keeps track of which pages contain
only tuples that are visible to all active transactions.
• Stored at <relfilenode>_vm.
• Helps vacuum determine whether a page contains dead rows.
• Can also be used by index-only scans to answer queries.
• VACUUM command updates the visibility map.
• The visibility map is vastly smaller, so it can be cached easily.
• REINDEX rebuilds an index using the data stored in the index's table.
• There are several reasons to use REINDEX:
− An index is corrupted.
− An index has become "bloated", that is, it contains many empty or nearly-empty
pages.
− You have altered a storage parameter (such as fillfactor) for an index.
− An index build with the CONCURRENTLY option failed, leaving an "invalid"
index.
• Syntax:
− REINDEX { INDEX | TABLE | DATABASE | SYSTEM } name [ FORCE ]
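Illustrative usage; the index and table names are hypothetical:

```sql
REINDEX INDEX orders_pkey;   -- rebuild a single index
REINDEX TABLE orders;        -- rebuild all indexes on a table
```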
1. Configure a Postgres cluster to log all SQL statements which run for more
than 500ms
2. Write a statement to view the explain plan for any slow running queries
logged from the previous step
3. Write a statement to view total pages and tuples in the orders table
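A sketch of one possible solution; the threshold goes in postgresql.conf, the rest are ordinary SQL statements (the query in step 2 is a made-up stand-in for whatever gets logged):

```sql
-- Step 1 (postgresql.conf): log_min_duration_statement = 500
-- Step 2: prefix the logged query with EXPLAIN, e.g.
EXPLAIN SELECT * FROM orders WHERE orderdate > '2015-01-01';
-- Step 3: pages and tuples as recorded in the statistics
SELECT relpages, reltuples FROM pg_class WHERE relname = 'orders';
```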
• Open PGADMIN-III
• Connect with the database
cluster
• Right click on the database to
be backed up
• Click Backup
• The text files created by pg_dump are intended to be read in by the psql program.
The general command form to restore a dump is:
− psql dbname < infile
• infile is what you used as outfile for the pg_dump command. The database
dbname will not be created by this command, so you must create it yourself.
pg_restore Options
• -d <database name>: Connect to the specified database. Also restores to this
database if the -C option is omitted.
• -C: Create the database named in the dump file & restore directly into it.
• -a: Restore the data only, not the data definitions (schema).
• -s: Restore the data definitions (schema) only, not the data.
• -n <schema>: Restore only objects from specified schema.
• -t <table>: Restore only specified table.
• -v: Verbose option.
Syntax:
pg_dumpall [options…] > filename.backup
pg_dumpall Options
-a: Data only. Do not dump schema.
-s: Data definitions (schema) only.
-g: Dump global objects only - not databases.
-r: Dump only roles.
-c: Clean (drop) databases before recreating.
-O: Skip restoration of object ownership.
-x: Do not dump privileges (grant/revoke)
--disable-triggers: Disable triggers during data-only restore.
-v: Verbose option.
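Illustrative invocations (they assume superuser access to a running cluster; the output file names are arbitrary):

```
pg_dumpall -U postgres > cluster.backup     # globals plus all databases
pg_dumpall -U postgres -g > globals.sql     # global objects only
pg_dumpall -U postgres -r > roles.sql       # roles only
```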
• An alternative backup strategy is to directly copy the files that PPAS uses
to store the data in the database.
• You can use whatever method you prefer for doing usual file system
backups, for example:
- tar -cf backup.tar /usr/local/pgsql/data
• The database server must be shut down in order to get a usable backup.
• File system backups only work for complete backup and restoration of an
entire database cluster.
• File system snapshots work for live servers.
• Step 1: Edit the postgresql.conf file and set the archive parameters:
wal_level = archive
archive_mode = on
Unix:
archive_command = 'cp -i %p /mnt/server/archivedir/%f </dev/null'
Windows:
archive_command = 'copy "%p" c:\\mnt\\server\\archivedir\\"%f"'
%p is replaced by the full path of the WAL file to archive.
%f is replaced by the file name only; each WAL segment has a unique name.
− Use a standard file system backup utility to back up the /data subdirectory
− recovery_target_time (timestamp): recovery proceeds up to the specified point in time
• Concurrency: two or more sessions accessing the same data at the same
time.
• Each transaction sees a snapshot of the data (a database version) as it was some
time ago.
• Transaction isolation protects a transaction from viewing "inconsistent"
data (currently being updated by another transaction).
• No locking: readers don't block writers and writers don't block readers.
• Partitioning
• Partition Methods
• Partition Setup
• Partitioning Example
• Partition Table Explain Plan
• Partitioning refers to splitting what is logically one large table into smaller
physical pieces.
• Query performance can be improved dramatically for certain kinds of
queries.
• Improved Update performance. When an index no longer fits easily in
memory, both read and write operations on the index take progressively
more disk accesses.
• Range Partitioning
− Range partitions are defined via key column(s) with no overlap or gaps
• List Partitioning
− Each key value is explicitly listed in the partitioning scheme
• Performance
− As tables grow query performance slows
− Data access methods, you may only need to access portions of data frequently
• Manageability
− Allows data to be added and removed easier
− Maintenance is easier (vacuum, reindex, cluster), can focus on active data
• Scalability
− Manage larger amounts of data easier
− Remove any hardware constraints
Step 1: Create the master table:
create table master_table (id numeric, name varchar(50) NOT NULL,
state varchar(20));
Partition 1:
create table child1 (check (id between 1 and 100)) inherits (master_table);
Partition 2:
create table child2 (check (id between 101 and 200)) inherits (master_table);
NOTE: After changing configuration parameters, signal the server to reload the
configuration file using the pg_ctl utility.
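With inheritance-based partitioning as above, rows inserted into the master table must still be routed to the correct child. A common sketch is a trigger function like the following (the function and trigger names are illustrative; table names match the example above):

```sql
CREATE OR REPLACE FUNCTION master_table_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.id BETWEEN 1 AND 100 THEN
        INSERT INTO child1 VALUES (NEW.*);
    ELSIF NEW.id BETWEEN 101 AND 200 THEN
        INSERT INTO child2 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'id % out of range for any partition', NEW.id;
    END IF;
    RETURN NULL;  -- suppress the insert into the master table itself
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_master_table
    BEFORE INSERT ON master_table
    FOR EACH ROW EXECUTE PROCEDURE master_table_insert();
```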
• TG_WHEN
− Data type text; a string of either BEFORE or AFTER depending on the trigger's definition.
• TG_LEVEL
− Data type text; a string of either ROW or STATEMENT depending on the trigger's definition.
• TG_OP
− Data type text; a string of INSERT, UPDATE, or DELETE telling for which operation the trigger was fired.
• TG_RELNAME
− Data type name; the name of the table that caused the trigger invocation.
• TG_NARGS
− Data type integer; the number of arguments given to the trigger procedure in the CREATE TRIGGER statement.
• ROW LEVEL: fires once for each row affected; for example, an UPDATE that
touches 1000 rows fires the trigger 1000 times.
• STATEMENT LEVEL: fires only once per statement, regardless of how many
rows are affected.
You can use the pg_upgrade utility to migrate an old cluster's data
directory to a new version.
Syntax:
pg_upgrade [OPTIONS]...
Options:
-b, --old-bindir old cluster executable directory
-B, --new-bindir new cluster executable directory
-c, --check check clusters only, don't change any data
-d, --old-datadir old cluster data directory
-D, --new-datadir new cluster data directory
-l, --logfile log session activity to file
-p, --old-port old cluster port number (default 5432)
-P, --new-port new cluster port number (default 5432)
-u, --user clusters superuser (default "postgres")
-v, --verbose enable verbose output
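An illustrative invocation; the directory paths and version numbers are hypothetical:

```
pg_upgrade -b /usr/pgsql-9.4/bin -B /usr/pgsql-9.5/bin \
           -d /var/lib/pgsql/9.4/data -D /var/lib/pgsql/9.5/data \
           -p 5432 -P 5433 --check
```

Running with --check first validates the two clusters without changing any data.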
• CREATE TABLE the target table first, then load:
edb=# COPY empcsv (empno, ename, job, sal, comm, hiredate)
edb-# FROM '/tmp/emp.csv' CSV HEADER;
COPY
• Sample loaded rows:
-------+--------+-----------+-----+--------------------+---------+---------+--------
 7369 | SMITH  | CLERK     |     | 17-DEC-80 00:00:00 |  800.00 |         |
 7499 | ALLEN  | SALESMAN  |     | 20-FEB-81 00:00:00 | 1600.00 |  300.00 |
 7521 | WARD   | SALESMAN  |     | 22-FEB-81 00:00:00 | 1250.00 |  500.00 |
 7566 | JONES  | MANAGER   |     | 02-APR-81 00:00:00 | 2975.00 |         |
 7654 | MARTIN | SALESMAN  |     | 28-SEP-81 00:00:00 | 1250.00 | 1400.00 |
 7698 | BLAKE  | MANAGER   |     | 01-MAY-81 00:00:00 | 2850.00 |         |
 7782 | CLARK  | MANAGER   |     | 09-JUN-81 00:00:00 | 2450.00 |         |