and SQL).
<para>
This chapter will on occasion refer to examples found in <xref
linkend="tutorial-sql"> to change or improve them, so it will be
- of advantage if you have read that chapter. Some examples from
+ helpful if you have read that chapter. Some examples from
this chapter can also be found in
<filename>advanced.sql</filename> in the tutorial directory. This
- file also contains some example data to load, which is not
+ file also contains some sample data to load, which is not
repeated here. (Refer to <xref linkend="tutorial-sql-intro"> for
how to use the file.)
</para>
</para>
<para>
- The details of these commands are not important here; the important
+ The details of these commands are not important; the important
point is that there are several separate updates involved to accomplish
this rather simple operation. Our bank's officers will want to be
assured that either all these updates happen, or none of them happen.
<para>
This example is, of course, oversimplified, but there's a lot of control
- to be had over a transaction block through the use of savepoints.
+ possible in a transaction block through the use of savepoints.
Moreover, <command>ROLLBACK TO</> is the only way to regain control of a
transaction block that was put in aborted state by the
system due to an error, short of rolling it back completely and starting
One application of the rewrite system is in the realization of
<firstterm>views</firstterm>.
Whenever a query against a view
- (i.e. a <firstterm>virtual table</firstterm>) is made,
+ (i.e., a <firstterm>virtual table</firstterm>) is made,
the rewrite system rewrites the user's query to
a query that accesses the <firstterm>base tables</firstterm> given in
the <firstterm>view definition</firstterm> instead.
<para>
Once a connection is established the client process can send a query
to the <firstterm>backend</firstterm> (server). The query is transmitted using plain text,
- i.e. there is no parsing done in the <firstterm>frontend</firstterm> (client). The
+ i.e., there is no parsing done in the <firstterm>frontend</firstterm> (client). The
server parses the query, creates an <firstterm>execution plan</firstterm>,
executes the plan and returns the retrieved rows to the client
by transmitting them over the established connection.
relations, a near-exhaustive search is conducted to find the best
join sequence. The planner preferentially considers joins between any
two relations for which there exist a corresponding join clause in the
- <literal>WHERE</literal> qualification (i.e. for
+ <literal>WHERE</literal> qualification (i.e., for
which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
exists). Join pairs with no join clause are considered only when there
is no other choice, that is, a particular relation has no available
);
</programlisting>
- However, the current implementation does not enforce the array size
- limits — the behavior is the same as for arrays of unspecified
+ However, the current implementation ignores any supplied array size
+ limits, i.e., the behavior is the same as for arrays of unspecified
length.
</para>
<para>
- Actually, the current implementation does not enforce the declared
+ In addition, the current implementation does not enforce the declared
number of dimensions either. Arrays of a particular element type are
all considered to be of the same type, regardless of size or number
- of dimensions. So, declaring number of dimensions or sizes in
+ of dimensions. So, declaring the number of dimensions or sizes in
<command>CREATE TABLE</command> is simply documentation, it does not
affect run-time behavior.
</para>
<para>
- An alternative syntax, which conforms to the SQL standard, can
- be used for one-dimensional arrays.
+ An alternative syntax, which conforms to the SQL standard by using
+ the keyword <literal>ARRAY</>, can
+ be used for one-dimensional arrays;
<structfield>pay_by_quarter</structfield> could have been defined
as:
<programlisting>
where <replaceable>delim</replaceable> is the delimiter character
for the type, as recorded in its <literal>pg_type</literal> entry.
Among the standard data types provided in the
- <productname>PostgreSQL</productname> distribution, type
- <literal>box</> uses a semicolon (<literal>;</>) but all the others
- use comma (<literal>,</>). Each <replaceable>val</replaceable> is
+ <productname>PostgreSQL</productname> distribution, all use a comma
+ (<literal>,</>), except for the type <literal>box</>, which uses a semicolon
+ (<literal>;</>). Each <replaceable>val</replaceable> is
either a constant of the array element type, or a subarray. An example
of an array constant is:
<programlisting>
</para>
<para>
- To set an element of an array constant to NULL, write <literal>NULL</>
+ To set an element of an array to NULL, write <literal>NULL</>
for the element value. (Any upper- or lower-case variant of
<literal>NULL</> will do.) If you want an actual string value
<quote>NULL</>, you must put double quotes around it.
</programlisting>
</para>
+ <para>
+ Multidimensional arrays must have matching extents for each
+ dimension. A mismatch causes an error, for example:
+
+<programlisting>
+INSERT INTO sal_emp
+ VALUES ('Bill',
+ '{10000, 10000, 10000, 10000}',
+ '{{"meeting", "lunch"}, {"meeting"}}');
+ERROR: multidimensional arrays must have array expressions with matching dimensions
+</programlisting>
+ </para>
+
<para>
The <literal>ARRAY</> constructor syntax can also be used:
<programlisting>
constructor syntax is discussed in more detail in
<xref linkend="sql-syntax-array-constructors">.
</para>
-
- <para>
- Multidimensional arrays must have matching extents for each
- dimension. A mismatch causes an error report, for example:
-
-<programlisting>
-INSERT INTO sal_emp
- VALUES ('Bill',
- '{10000, 10000, 10000, 10000}',
- '{{"meeting", "lunch"}, {"meeting"}}');
-ERROR: multidimensional arrays must have array expressions with matching dimensions
-</programlisting>
- </para>
</sect2>
<sect2 id="arrays-accessing">
<para>
Now, we can run some queries on the table.
- First, we show how to access a single element of an array at a time.
+ First, we show how to access a single element of an array.
This query retrieves the names of the employees whose pay changed in
the second quarter:
</programlisting>
The array subscript numbers are written within square brackets.
- By default <productname>PostgreSQL</productname> uses the
+ By default <productname>PostgreSQL</productname> uses a
one-based numbering convention for arrays, that is,
an array of <replaceable>n</> elements starts with <literal>array[1]</literal> and
ends with <literal>array[<replaceable>n</>]</literal>.
(1 row)
</programlisting>
- If any dimension is written as a slice, i.e. contains a colon, then all
+ If any dimension is written as a slice, i.e., contains a colon, then all
dimensions are treated as slices. Any dimension that has only a single
number (no colon) is treated as being from <literal>1</>
to the number specified. For example, <literal>[2]</> is treated as
<para>
An array slice expression likewise yields null if the array itself or
- any of the subscript expressions are null. However, in other corner
+ any of the subscript expressions are null. However, in other
cases such as selecting an array slice that
is completely outside the current array bounds, a slice expression
yields an empty (zero-dimensional) array instead of null. (This
does not match non-slice behavior and is done for historical reasons.)
If the requested slice partially overlaps the array bounds, then it
- is silently reduced to just the overlapping region.
+ is silently reduced to just the overlapping region instead of
+ returning null.
</para>
<para>
</programlisting>
<function>array_dims</function> produces a <type>text</type> result,
- which is convenient for people to read but perhaps not so convenient
+ which is convenient for people to read but perhaps inconvenient
for programs. Dimensions can also be retrieved with
<function>array_upper</function> and <function>array_lower</function>,
which return the upper and lower bound of a
</para>
<para>
- A stored array value can be enlarged by assigning to element(s) not already
+ A stored array value can be enlarged by assigning to elements not already
present. Any positions between those previously present and the newly
- assigned element(s) will be filled with nulls. For example, if array
+ assigned elements will be filled with nulls. For example, if array
<literal>myarray</> currently has 4 elements, it will have six
- elements after an update that assigns to <literal>myarray[6]</>,
- and <literal>myarray[5]</> will contain a null.
+ elements after an update that assigns to <literal>myarray[6]</>;
+ <literal>myarray[5]</> will contain null.
Currently, enlargement in this fashion is only allowed for one-dimensional
arrays, not multidimensional arrays.
</para>
<para>
Subscripted assignment allows creation of arrays that do not use one-based
subscripts. For example one might assign to <literal>myarray[-2:7]</> to
- create an array with subscript values running from -2 to 7.
+ create an array with subscript values from -2 to 7.
</para>
<para>
- New array values can also be constructed by using the concatenation operator,
+ New array values can also be constructed using the concatenation operator,
<literal>||</literal>:
<programlisting>
SELECT ARRAY[1,2] || ARRAY[3,4];
</para>
<para>
- The concatenation operator allows a single element to be pushed on to the
+ The concatenation operator allows a single element to be pushed to the
beginning or end of a one-dimensional array. It also accepts two
<replaceable>N</>-dimensional arrays, or an <replaceable>N</>-dimensional
and an <replaceable>N+1</>-dimensional array.
</para>
<para>
- When a single element is pushed on to either the beginning or end of a
+ When a single element is pushed to either the beginning or end of a
one-dimensional array, the result is an array with the same lower bound
subscript as the array operand. For example:
<programlisting>
</para>
<para>
- When an <replaceable>N</>-dimensional array is pushed on to the beginning
+ When an <replaceable>N</>-dimensional array is pushed to the beginning
or end of an <replaceable>N+1</>-dimensional array, the result is
analogous to the element-array case above. Each <replaceable>N</>-dimensional
sub-array is essentially an element of the <replaceable>N+1</>-dimensional
arrays, but <function>array_cat</function> supports multidimensional arrays.
Note that the concatenation operator discussed above is preferred over
- direct use of these functions. In fact, the functions exist primarily for use
+ direct use of these functions. In fact, these functions primarily exist for use
in implementing the concatenation operator. However, they might be directly
useful in the creation of user-defined aggregates. Some examples:
</indexterm>
<para>
- To search for a value in an array, you must check each value of the
- array. This can be done by hand, if you know the size of the array.
+ To search for a value in an array, each value must be checked.
+ This can be done manually, if you know the size of the array.
For example:
<programlisting>
</programlisting>
However, this quickly becomes tedious for large arrays, and is not
- helpful if the size of the array is uncertain. An alternative method is
+ helpful if the size of the array is unknown. An alternative method is
described in <xref linkend="functions-comparisons">. The above
query could be replaced by:
SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
</programlisting>
- In addition, you could find rows where the array had all values
+ In addition, you can find rows where the array has all values
equal to 10000 with:
<programlisting>
can be a sign of database misdesign. Consider
using a separate table with a row for each item that would be an
array element. This will be easier to search, and is likely to
- scale up better to large numbers of elements.
+ scale better for a large number of elements.
</para>
</tip>
</sect2>
The delimiter character is usually a comma (<literal>,</>) but can be
something else: it is determined by the <literal>typdelim</> setting
for the array's element type. (Among the standard data types provided
- in the <productname>PostgreSQL</productname> distribution, type
- <literal>box</> uses a semicolon (<literal>;</>) but all the others
- use comma.) In a multidimensional array, each dimension (row, plane,
+ in the <productname>PostgreSQL</productname> distribution, all
+ use a comma, except for <literal>box</>, which uses a semicolon (<literal>;</>).)
+ In a multidimensional array, each dimension (row, plane,
cube, etc.) gets its own level of curly braces, and delimiters
must be written between adjacent curly-braced entities of the same level.
</para>
<literal>NULL</>. Double quotes and backslashes
embedded in element values will be backslash-escaped. For numeric
data types it is safe to assume that double quotes will never appear, but
- for textual data types one should be prepared to cope with either presence
+ for textual data types one should be prepared to cope with either the presence
or absence of quotes.
</para>
or backslashes disables this and allows the literal string value
<quote>NULL</> to be entered. Also, for backwards compatibility with
pre-8.2 versions of <productname>PostgreSQL</>, the <xref
- linkend="guc-array-nulls"> configuration parameter might be turned
+ linkend="guc-array-nulls"> configuration parameter can be turned
<literal>off</> to suppress recognition of <literal>NULL</> as a NULL.
</para>
<para>
- As shown previously, when writing an array value you can write double
+ As shown previously, when writing an array value you can use double
quotes around any individual array element. You <emphasis>must</> do so
if the element value would otherwise confuse the array-value parser.
- For example, elements containing curly braces, commas (or whatever the
- delimiter character is), double quotes, backslashes, or leading or trailing
+ For example, elements containing curly braces, commas (or the data type's
+ delimiter character), double quotes, backslashes, or leading or trailing
whitespace must be double-quoted. Empty strings and strings matching the
word <literal>NULL</> must be quoted, too. To put a double quote or
backslash in a quoted array element value, use escape string syntax
- and precede it with a backslash. Alternatively, you can use
+ and precede it with a backslash. Alternatively, you can avoid quotes and use
backslash-escaping to protect all data characters that would otherwise
be taken as array syntax.
</para>
<para>
- You can write whitespace before a left brace or after a right
- brace. You can also write whitespace before or after any individual item
+ You can use whitespace before a left brace or after a right
+ brace. You can also add whitespace before or after any individual item
string. In all of these cases the whitespace will be ignored. However,
whitespace within double-quoted elements, or surrounded on both sides by
non-whitespace characters of an element, is not ignored.
</para>
<para>
- It should be noted that the log shipping is asynchronous, i.e. the WAL
+ It should be noted that the log shipping is asynchronous, i.e., the WAL
records are shipped after transaction commit. As a result there is a
window for data loss should the primary server suffer a catastrophic
failure: transactions not yet shipped will be lost. The length of the
function, which some operating systems lack. If the function is not
present then setting this parameter to anything but zero will result
in an error. On some operating systems the function is present but
- does not actually do anything (e.g. Solaris).
+ does not actually do anything (e.g., Solaris).
</para>
</listitem>
</varlistentry>
If a dynamically loadable module needs to be opened and the
file name specified in the <command>CREATE FUNCTION</command> or
<command>LOAD</command> command
- does not have a directory component (i.e. the
+ does not have a directory component (i.e., the
name does not contain a slash), the system will search this
path for the required file.
</para>
The shared lock table is created to track locks on
<varname>max_locks_per_transaction</varname> * (<xref
linkend="guc-max-connections"> + <xref
- linkend="guc-max-prepared-transactions">) objects (e.g. tables);
+ linkend="guc-max-prepared-transactions">) objects (e.g., tables);
hence, no more than this many distinct objects can be locked at
any one time. This parameter controls the average number of object
locks allocated for each transaction; individual transactions
<para>
When building from the source distribution, these modules are not built
- automatically. You can build and install all of them by running
+ automatically. You can build and install all of them by running:
<screen>
<userinput>gmake</userinput>
<userinput>gmake install</userinput>
or to build and install
just one selected module, do the same in that module's subdirectory.
Many of the modules have regression tests, which can be executed by
- running
+ running:
<screen>
<userinput>gmake installcheck</userinput>
</screen>
<quote>Aliases</quote> column are the names used internally by
<productname>PostgreSQL</productname> for historical reasons. In
addition, some internally used or deprecated types are available,
- but they are not listed here.
+ but are not listed here.
</para>
<table id="datatype-table">
<row>
<entry><type>box</type></entry>
<entry></entry>
- <entry>rectangular box in the plane</entry>
+ <entry>rectangular box on a plane</entry>
</row>
<row>
<row>
<entry><type>circle</type></entry>
<entry></entry>
- <entry>circle in the plane</entry>
+ <entry>circle on a plane</entry>
</row>
<row>
<row>
<entry><type>double precision</type></entry>
<entry><type>float8</type></entry>
- <entry>double precision floating-point number</entry>
+ <entry>double precision floating-point number (8 bytes)</entry>
</row>
<row>
<row>
<entry><type>line</type></entry>
<entry></entry>
- <entry>infinite line in the plane</entry>
+ <entry>infinite line on a plane</entry>
</row>
<row>
<entry><type>lseg</type></entry>
<entry></entry>
- <entry>line segment in the plane</entry>
+ <entry>line segment on a plane</entry>
</row>
<row>
<entry><type>macaddr</type></entry>
<entry></entry>
- <entry>MAC address</entry>
+ <entry>MAC (Media Access Control) address</entry>
</row>
<row>
<row>
<entry><type>path</type></entry>
<entry></entry>
- <entry>geometric path in the plane</entry>
+ <entry>geometric path on a plane</entry>
</row>
<row>
<entry><type>point</type></entry>
<entry></entry>
- <entry>geometric point in the plane</entry>
+ <entry>geometric point on a plane</entry>
</row>
<row>
<entry><type>polygon</type></entry>
<entry></entry>
- <entry>closed geometric path in the plane</entry>
+ <entry>closed geometric path on a plane</entry>
</row>
<row>
<entry><type>real</type></entry>
<entry><type>float4</type></entry>
- <entry>single precision floating-point number</entry>
+ <entry>single precision floating-point number (4 bytes)</entry>
</row>
<row>
<row>
<entry><type>time [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry></entry>
- <entry>time of day</entry>
+ <entry>time of day (no time zone)</entry>
</row>
<row>
<row>
<entry><type>timestamp [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry></entry>
- <entry>date and time</entry>
+ <entry>date and time (no time zone)</entry>
</row>
<row>
and output functions. Many of the built-in types have
obvious external formats. However, several types are either unique
to <productname>PostgreSQL</productname>, such as geometric
- paths, or have several possibilities for formats, such as the date
+ paths, or have several possible formats, such as the date
and time types.
- Some of the input and output functions are not invertible. That is,
+ Some of the input and output functions are not invertible, i.e.,
the result of an output function might lose accuracy when compared to
the original input.
</para>
<row>
<entry><type>integer</></entry>
<entry>4 bytes</entry>
- <entry>usual choice for integer</entry>
+ <entry>typical choice for integer</entry>
<entry>-2147483648 to +2147483647</entry>
</row>
<row>
</para>
<para>
- The type <type>integer</type> is the usual choice, as it offers
+ The type <type>integer</type> is the common choice, as it offers
the best balance between range, storage size, and performance.
The <type>smallint</type> type is generally only used if disk
space is at a premium. The <type>bigint</type> type should only
- be used if the <type>integer</type> range is not sufficient,
+ be used if the <type>integer</type> range is insufficient,
because the latter is definitely faster.
</para>
<para>
- The <type>bigint</type> type might not function correctly on all
- platforms, since it relies on compiler support for eight-byte
- integers. On a machine without such support, <type>bigint</type>
+ On very minimal operating systems the <type>bigint</type> type
+ might not function correctly because it relies on compiler support
+ for eight-byte integers. On such machines, <type>bigint</type>
acts the same as <type>integer</type> (but still takes up eight
- bytes of storage). However, we are not aware of any reasonable
- platform where this is actually the case.
+ bytes of storage). (We are not aware of any
+ platform where this is true.)
</para>
<para>
<type>integer</type> (or <type>int</type>),
<type>smallint</type>, and <type>bigint</type>. The
type names <type>int2</type>, <type>int4</type>, and
- <type>int8</type> are extensions, which are shared with various
+ <type>int8</type> are extensions, which are also used by
other <acronym>SQL</acronym> database systems.
</para>
especially recommended for storing monetary amounts and other
quantities where exactness is required. However, arithmetic on
<type>numeric</type> values is very slow compared to the integer
- types, or to the floating-point types described in the next section.
+ types, or the floating-point types described in the next section.
</para>
<para>
- In what follows we use these terms: The
+ We use the following terms: The
<firstterm>scale</firstterm> of a <type>numeric</type> is the
count of decimal digits in the fractional part, to the right of
the decimal point. The <firstterm>precision</firstterm> of a
type allows the special value <literal>NaN</>, meaning
<quote>not-a-number</quote>. Any operation on <literal>NaN</>
yields another <literal>NaN</>. When writing this value
- as a constant in a SQL command, you must put quotes around it,
+ as a constant in an SQL command, you must put quotes around it,
for example <literal>UPDATE table SET x = 'NaN'</>. On input,
the string <literal>NaN</> is recognized in a case-insensitive manner.
</para>
<para>
Inexact means that some values cannot be converted exactly to the
internal format and are stored as approximations, so that storing
- and printing back out a value might show slight discrepancies.
+ and retrieving a value might show slight discrepancies.
Managing these errors and how they propagate through calculations
is the subject of an entire branch of mathematics and computer
- science and will not be discussed further here, except for the
+ science and will not be discussed here, except for the
following points:
<itemizedlist>
<listitem>
<listitem>
<para>
- Comparing two floating-point values for equality might or might
- not work as expected.
+ Comparing two floating-point values for equality might not
+ always work as expected.
</para>
</listitem>
</itemizedlist>
notations <type>float</type> and
<type>float(<replaceable>p</replaceable>)</type> for specifying
inexact numeric types. Here, <replaceable>p</replaceable> specifies
- the minimum acceptable precision in binary digits.
+ the minimum acceptable precision in <emphasis>binary</> digits.
<productname>PostgreSQL</productname> accepts
<type>float(1)</type> to <type>float(24)</type> as selecting the
<type>real</type> type, while
<para>
Prior to <productname>PostgreSQL</productname> 7.4, the precision in
<type>float(<replaceable>p</replaceable>)</type> was taken to mean
- so many decimal digits. This has been corrected to match the SQL
+ so many <emphasis>decimal</> digits. This has been corrected to match the SQL
standard, which specifies that the precision is measured in binary
digits. The assumption that <type>real</type> and
<type>double precision</type> have exactly 24 and 53 bits in the
<para>
The data types <type>serial</type> and <type>bigserial</type>
are not true types, but merely
- a notational convenience for setting up unique identifier columns
+ a notational convenience for creating unique identifier columns
(similar to the <literal>AUTO_INCREMENT</literal> property
supported by some other databases). In the current
implementation, specifying:
Thus, we have created an integer column and arranged for its default
values to be assigned from a sequence generator. A <literal>NOT NULL</>
constraint is applied to ensure that a null value cannot be explicitly
- inserted, either. (In most cases you would also want to attach a
+ inserted. (In most cases you would also want to attach a
<literal>UNIQUE</> or <literal>PRIMARY KEY</> constraint to prevent
duplicate values from being inserted by accident, but this is
not automatic.) Lastly, the sequence is marked as <quote>owned by</>
<para>
Prior to <productname>PostgreSQL</productname> 7.3, <type>serial</type>
implied <literal>UNIQUE</literal>. This is no longer automatic. If
- you wish a serial column to be in a unique constraint or a
- primary key, it must now be specified, same as with
+ you wish a serial column to have a unique constraint or be a
+ primary key, it must now be specified, just as for
any other data type.
</para>
</note>
<para>
The type names <type>serial</type> and <type>serial4</type> are
equivalent: both create <type>integer</type> columns. The type
- names <type>bigserial</type> and <type>serial8</type> work just
+ names <type>bigserial</type> and <type>serial8</type> work
the same way, except that they create a <type>bigint</type>
column. <type>bigserial</type> should be used if you anticipate
the use of more than 2<superscript>31</> identifiers over the
<para>
The <type>money</type> type stores a currency amount with a fixed
fractional precision; see <xref
- linkend="datatype-money-table">.
+ linkend="datatype-money-table">. The fractional precision
+ is controlled by the database locale.
Input is accepted in a variety of formats, including integer and
- floating-point literals, as well as <quote>typical</quote>
+ floating-point literals, as well as typical
currency formatting, such as <literal>'$1,000.00'</literal>.
Output is generally in the latter form but depends on the locale.
Non-quoted numeric values can be converted to <type>money</type> by
</para>
<para>
- Since the output of this data type is locale-sensitive, it may not
+ Since the output of this data type is locale-sensitive, it might not
work to load <type>money</> data into a database that has a different
setting of <varname>lc_monetary</>. To avoid problems, before
- restoring a dump make sure <varname>lc_monetary</> has the same or
+ restoring a dump into a new database, make sure <varname>lc_monetary</> has the same or
equivalent value as in the database that was dumped.
</para>
<type>character varying(<replaceable>n</>)</type> and
<type>character(<replaceable>n</>)</type>, where <replaceable>n</>
is a positive integer. Both of these types can store strings up to
- <replaceable>n</> characters in length. An attempt to store a
+ <replaceable>n</> characters in length (not bytes). An attempt to store a
longer string into a column of these types will result in an
error, unless the excess characters are all spaces, in which case
the string will be truncated to the maximum length. (This somewhat
<para>
The storage requirement for a short string (up to 126 bytes) is 1 byte
plus the actual string, which includes the space padding in the case of
- <type>character</type>. Longer strings have 4 bytes overhead instead
+ <type>character</type>. Longer strings have 4 bytes of overhead instead
of 1. Long strings are compressed by the system automatically, so
the physical requirement on disk might be less. Very long values are also
stored in background tables so that they do not interfere with rapid
access to shorter column values. In any case, the longest
possible character string that can be stored is about 1 GB. (The
maximum value that will be allowed for <replaceable>n</> in the data
- type declaration is less than that. It wouldn't be very useful to
+ type declaration is less than that. It wouldn't be useful to
change this because with multibyte character encodings the number of
- characters and bytes can be quite different anyway. If you desire to
+ characters and bytes can be quite different. If you desire to
store long strings with no specific upper limit, use
<type>text</type> or <type>character varying</type> without a length
specifier, rather than making up an arbitrary length limit.)
<tip>
<para>
- There are no performance differences between these three types,
- apart from increased storage size when using the blank-padded
- type, and a few extra cycles to check the length when storing into
+ There is no performance difference between these three types,
+ apart from increased storage space when using the blank-padded
+ type, and a few extra CPU cycles to check the length when storing into
a length-constrained column. While
<type>character(<replaceable>n</>)</type> has performance
- advantages in some other database systems, it has no such advantages in
+ advantages in some other database systems, there is no such advantage in
<productname>PostgreSQL</productname>. In most situations
<type>text</type> or <type>character varying</type> should be used
instead.
There are two other fixed-length character types in
<productname>PostgreSQL</productname>, shown in <xref
linkend="datatype-character-special-table">. The <type>name</type>
- type exists <emphasis>only</emphasis> for storage of identifiers
+ type exists <emphasis>only</emphasis> for the storage of identifiers
in the internal system catalogs and is not intended for use by the general user. Its
length is currently defined as 64 bytes (63 usable characters plus
terminator) but should be referenced using the constant
- <symbol>NAMEDATALEN</symbol>. The length is set at compile time (and
+ <symbol>NAMEDATALEN</symbol> in <literal>C</> source code.
+ The length is set at compile time (and
is therefore adjustable for special uses); the default maximum
length might change in a future release. The type <type>"char"</type>
(note the quotes) is different from <type>char(1)</type> in that it
only uses one byte of storage. It is internally used in the system
- catalogs as a poor-man's enumeration type.
+ catalogs as a simplistic enumeration type.
</para>
<table id="datatype-character-special-table">
<para>
A binary string is a sequence of octets (or bytes). Binary
- strings are distinguished from character strings by two
- characteristics: First, binary strings specifically allow storing
+ strings are distinguished from character strings in two
+ ways: First, binary strings specifically allow storing
octets of value zero and other <quote>non-printable</quote>
octets (usually, octets outside the range 32 to 126).
Character strings disallow zero octets, and also disallow any
values <emphasis>must</emphasis> be escaped (but all octet
values <emphasis>can</emphasis> be escaped) when used as part
of a string literal in an <acronym>SQL</acronym> statement. In
- general, to escape an octet, it is converted into the three-digit
- octal number equivalent of its decimal octet value, and preceded
+ general, to escape an octet, convert it into its three-digit
+ octal value and precede it
by two backslashes. <xref linkend="datatype-binary-sqlesc">
shows the characters that must be escaped, and gives the alternative
escape sequences where applicable.
</table>
<para>
- The requirement to escape <quote>non-printable</quote> octets actually
+ The requirement to escape <emphasis>non-printable</emphasis> octets
varies depending on locale settings. In some instances you can get away
with leaving them unescaped. Note that the result in each of the examples
in <xref linkend="datatype-binary-sqlesc"> was exactly one octet in
- length, even though the output representation of the zero octet and
- backslash are more than one character.
+ length, even though the output representation is sometimes
+ more than one character.
</para>
<para>
- The reason that you have to write so many backslashes, as shown
+ The reason multiple backslashes are required, as shown
in <xref linkend="datatype-binary-sqlesc">, is that an input
string written as a string literal must pass through two parse
phases in the <productname>PostgreSQL</productname> server.
</para>
<para>
- <type>Bytea</type> octets are also escaped in the output. In general, each
+ <type>Bytea</type> octets are sometimes escaped when output. In general, each
<quote>non-printable</quote> octet is converted into
its equivalent three-digit octal value and preceded by one backslash.
Most <quote>printable</quote> octets are represented by their standard
representation in the client character set. The octet with decimal
- value 92 (backslash) has a special alternative output representation.
+ value 92 (backslash) is doubled in the output.
Details are in <xref linkend="datatype-binary-resesc">.
</para>
<row>
<entry><type>timestamp [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry>8 bytes</entry>
- <entry>both date and time</entry>
+ <entry>both date and time (no time zone)</entry>
<entry>4713 BC</entry>
<entry>294276 AD</entry>
<entry>1 microsecond / 14 digits</entry>
<row>
<entry><type>date</type></entry>
<entry>4 bytes</entry>
- <entry>dates only</entry>
+ <entry>date (no time of day)</entry>
<entry>4713 BC</entry>
<entry>5874897 AD</entry>
<entry>1 day</entry>
<row>
<entry><type>time [ (<replaceable>p</replaceable>) ] [ without time zone ]</type></entry>
<entry>8 bytes</entry>
- <entry>times of day only</entry>
+ <entry>time of day (no date)</entry>
<entry>00:00:00</entry>
<entry>24:00:00</entry>
<entry>1 microsecond / 14 digits</entry>
<row>
<entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry>
<entry>12 bytes</entry>
- <entry>time intervals</entry>
+ <entry>time interval</entry>
<entry>-178000000 years</entry>
<entry>178000000 years</entry>
<entry>1 microsecond / 14 digits</entry>
<para>
The types <type>abstime</type>
and <type>reltime</type> are lower precision types which are used internally.
- You are discouraged from using these types in new
- applications and are encouraged to move any old
- ones over when appropriate. Any or all of these internal types
+ You are discouraged from using these types in
+ applications; these internal types
might disappear in a future release.
</para>
Date and time input is accepted in almost any reasonable format, including
ISO 8601, <acronym>SQL</acronym>-compatible,
traditional <productname>POSTGRES</productname>, and others.
- For some formats, ordering of month, day, and year in date input is
+ For some formats, ordering of day, month, and year in date input is
ambiguous and there is support for specifying the expected
ordering of these fields. Set the <xref linkend="guc-datestyle"> parameter
to <literal>MDY</> to select month-day-year interpretation,
<synopsis>
<replaceable>type</replaceable> [ (<replaceable>p</replaceable>) ] '<replaceable>value</replaceable>'
</synopsis>
- where <replaceable>p</replaceable> in the optional precision
- specification is an integer corresponding to the number of
+ where <replaceable>p</replaceable> is an optional precision corresponding to the number of
fractional digits in the seconds field. Precision can be
specified for <type>time</type>, <type>timestamp</type>, and
<type>interval</type> types. The allowed values are mentioned
</row>
</thead>
<tbody>
- <row>
- <entry>January 8, 1999</entry>
- <entry>unambiguous in any <varname>datestyle</varname> input mode</entry>
- </row>
<row>
<entry>1999-01-08</entry>
<entry>ISO 8601; January 8 in any mode
(recommended format)</entry>
</row>
+ <row>
+ <entry>January 8, 1999</entry>
+ <entry>unambiguous in any <varname>datestyle</varname> input mode</entry>
+ </row>
<row>
<entry>1/8/1999</entry>
<entry>January 8 in <literal>MDY</> mode;
</row>
<row>
<entry>January 8, 99 BC</entry>
- <entry>year 99 before the Common Era</entry>
+ <entry>year 99 BC</entry>
</row>
</tbody>
</tgroup>
The time-of-day types are <type>time [
(<replaceable>p</replaceable>) ] without time zone</type> and
<type>time [ (<replaceable>p</replaceable>) ] with time
- zone</type>. Writing just <type>time</type> is equivalent to
+ zone</type>; <type>time</type> is equivalent to
<type>time without time zone</type>.
</para>
</row>
<row>
<entry><literal>04:05 AM</literal></entry>
- <entry>same as 04:05; AM does not affect value</entry>
+ <entry>same as 04:05 (AM ignored)</entry>
</row>
<row>
<entry><literal>04:05 PM</literal></entry>
</indexterm>
<para>
- Valid input for the time stamp types consists of a concatenation
+ Valid input for the time stamp types consists of the concatenation
of a date and a time, followed by an optional time zone,
followed by an optional <literal>AD</literal> or <literal>BC</literal>.
(Alternatively, <literal>AD</literal>/<literal>BC</literal> can appear
</programlisting>
are valid values, which follow the <acronym>ISO</acronym> 8601
- standard. In addition, the wide-spread format:
+ standard. In addition, the common format:
<programlisting>
January 8 04:05:06 1999 PST
</programlisting>
<para>
The <acronym>SQL</acronym> standard differentiates <type>timestamp without time zone</type>
and <type>timestamp with time zone</type> literals by the presence of a
- <quote>+</quote> or <quote>-</quote>. Hence, according to the standard,
+ <quote>+</quote> or <quote>-</quote> symbol after the time
+ indicating the time zone offset. Hence, according to the standard:
+
<programlisting>TIMESTAMP '2004-10-19 10:23:54'</programlisting>
- is a <type>timestamp without time zone</type>, while
+
+ is a <type>timestamp without time zone</type>, while:
+
<programlisting>TIMESTAMP '2004-10-19 10:23:54+02'</programlisting>
+
is a <type>timestamp with time zone</type>.
<productname>PostgreSQL</productname> never examines the content of a
literal string before determining its type, and therefore will treat
both of the above as <type>timestamp without time zone</type>. To
ensure that a literal is treated as <type>timestamp with time
zone</type>, give it the correct explicit type:
+
<programlisting>TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'</programlisting>
- In a literal that has been decided to be <type>timestamp without time
+
+ In a literal that has been determined to be <type>timestamp without time
zone</type>, <productname>PostgreSQL</productname> will silently ignore
any time zone indication.
That is, the resulting value is derived from the date/time
Conversions between <type>timestamp without time zone</type> and
<type>timestamp with time zone</type> normally assume that the
<type>timestamp without time zone</type> value should be taken or given
- as <varname>timezone</> local time. A different zone reference can
+ as <varname>timezone</> local time. A different time zone can
be specified for the conversion using <literal>AT TIME ZONE</>.
</para>
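+ <para>
+ As an illustrative sketch (the literal and the zone name here are
+ arbitrary choices), the following treats a
+ <type>timestamp without time zone</type> value as local time in an
+ explicitly named zone instead of the zone given by
+ <varname>timezone</>:
+<programlisting>
+SELECT TIMESTAMP '2004-10-19 10:23:54' AT TIME ZONE 'UTC';
+</programlisting>
+ The result is of type <type>timestamp with time zone</type>.
+ </para>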
</sect3>
linkend="datatype-datetime-special-table">. The values
<literal>infinity</literal> and <literal>-infinity</literal>
are specially represented inside the system and will be displayed
- the same way; but the others are simply notational shorthands
+ unchanged; but the others are simply notational shorthands
that will be converted to ordinary date/time values when read.
(In particular, <literal>now</> and related strings are converted
to a specific time value as soon as they are read.)
- All of these values need to be written in single quotes when used
+ All of these values need to be enclosed in single quotes when used
as constants in SQL commands.
</para>
<literal>CURRENT_TIMESTAMP</literal>, <literal>LOCALTIME</literal>,
<literal>LOCALTIMESTAMP</literal>. The latter four accept an
optional subsecond precision specification. (See <xref
- linkend="functions-datetime-current">.) Note however that these are
- SQL functions and are <emphasis>not</> recognized as data input strings.
+ linkend="functions-datetime-current">.) Note that these are
+ SQL functions and are <emphasis>not</> recognized in data input strings.
</para>
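+ <para>
+ A minimal sketch of the distinction, assuming a default session
+ (the queries are illustrative only):
+<programlisting>
+SELECT CURRENT_TIMESTAMP;              -- SQL function, written without quotes
+SELECT TIMESTAMP 'now';                -- special input string, written in quotes
+SELECT TIMESTAMP 'CURRENT_TIMESTAMP';  -- expected to be rejected as invalid input
+</programlisting>
+ </para>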
</sect3>
</indexterm>
<para>
- The output format of the date/time types can be set to one of the four
- styles ISO 8601,
- <acronym>SQL</acronym> (Ingres), traditional POSTGRES, and
- German, using the command <literal>SET datestyle</literal>. The default
+ The output format of the date/time types can be one of the four
+ styles: ISO 8601,
+ <acronym>SQL</acronym> (Ingres), traditional <productname>POSTGRES</>
+ (Unix <application>date</> format), and
+ German. It can be set using the <literal>SET datestyle</literal> command. The default
is the <acronym>ISO</acronym> format. (The
<acronym>SQL</acronym> standard requires the use of the ISO 8601
- format. The name of the <quote>SQL</quote> output format is a
- historical accident.) <xref
+ format. The name of the <literal>SQL</> output format is poorly
+ chosen and a historical accident.) <xref
linkend="datatype-datetime-output-table"> shows examples of each
output style. The output of the <type>date</type> and
<type>time</type> types is of course only the date or time part
<listitem>
<para>
Although the <type>date</type> type
- does not have an associated time zone, the
+ cannot have an associated time zone, the
<type>time</type> type can.
Time zones in the real world have little meaning unless
associated with a date as well as a time,
<listitem>
<para>
The default time zone is specified as a constant numeric offset
- from <acronym>UTC</>. It is therefore not possible to adapt to
+ from <acronym>UTC</>. It is therefore impossible to adapt to
daylight-saving time when doing date/time arithmetic across
<acronym>DST</acronym> boundaries.
</para>
<para>
To address these difficulties, we recommend using date/time types
that contain both date and time when using time zones. We
- recommend <emphasis>not</emphasis> using the type <type>time with
+ do <emphasis>not</> recommend using the type <type>time with
time zone</type> (though it is supported by
<productname>PostgreSQL</productname> for legacy applications and
for compliance with the <acronym>SQL</acronym> standard).
<para>
A time zone abbreviation, for example <literal>PST</>. Such a
specification merely defines a particular offset from UTC, in
- contrast to full time zone names which might imply a set of daylight
+ contrast to full time zone names which can imply a set of daylight
savings transition-date rules as well. The recognized abbreviations
are listed in the <literal>pg_timezone_abbrevs</> view (see <xref
linkend="view-pg-timezone-abbrevs">). You cannot set the
configuration parameters <xref linkend="guc-timezone"> or
- <xref linkend="guc-log-timezone"> using a time
+ <xref linkend="guc-log-timezone"> to a time
zone abbreviation, but you can use abbreviations in
date/time input values and with the <literal>AT TIME ZONE</>
operator.
optional daylight-savings zone abbreviation, assumed to stand for one
hour ahead of the given offset. For example, if <literal>EST5EDT</>
were not already a recognized zone name, it would be accepted and would
- be functionally equivalent to USA East Coast time. When a
+ be functionally equivalent to United States East Coast time. When a
daylight-savings zone name is present, it is assumed to be used
according to the same daylight-savings transition rules used in the
<literal>zoneinfo</> time zone database's <filename>posixrules</> entry.
</listitem>
</itemizedlist>
- There is a conceptual and practical difference between the abbreviations
- and the full names: abbreviations always represent a fixed offset from
+ In summary, there is a difference between abbreviations
+ and full names: abbreviations always represent a fixed offset from
UTC, whereas most of the full names imply a local daylight-savings time
- rule and so have two possible UTC offsets.
+ rule, and so have two possible UTC offsets.
</para>
<para>
<para>
In all cases, timezone names are recognized case-insensitively.
(This is a change from <productname>PostgreSQL</productname> versions
- prior to 8.2, which were case-sensitive in some contexts and not others.)
+ prior to 8.2, which were case-sensitive in some contexts but not others.)
</para>
<para>
<listitem>
<para>
If <varname>timezone</> is not specified in
- <filename>postgresql.conf</> nor as a server command-line option,
+ <filename>postgresql.conf</> or as a server command-line option,
the server attempts to use the value of the <envar>TZ</envar>
environment variable as the default time zone. If <envar>TZ</envar>
is not defined or is not any of the time zone names known to
default time zone is selected as the closest match among
<productname>PostgreSQL</productname>'s known time zones.
(These rules are also used to choose the default value of
- <xref linkend="guc-log-timezone">, if it is not specified.)
+ <xref linkend="guc-log-timezone">, if not specified.)
</para>
</listitem>
<listitem>
<para>
- The <envar>PGTZ</envar> environment variable, if set at the
- client, is used by <application>libpq</application>
- applications to send a <command>SET TIME ZONE</command>
+ The <envar>PGTZ</envar> environment variable is used by
+ <application>libpq</application> clients
+ to send a <command>SET TIME ZONE</command>
command to the server upon connection.
</para>
</listitem>
</indexterm>
<para>
- <type>interval</type> values can be written with the following
+ <type>interval</type> values can be written using the following
verbose syntax:
<synopsis>
or abbreviations or plurals of these units;
<replaceable>direction</> can be <literal>ago</literal> or
empty. The at sign (<literal>@</>) is optional noise. The amounts
- of different units are implicitly added up with appropriate
+ of the different units are implicitly added with appropriate
sign accounting. <literal>ago</literal> negates all the fields.
This syntax is also used for interval output, if
<xref linkend="guc-intervalstyle"> is set to
<para>
<productname>PostgreSQL</productname> uses Julian dates
- for all date/time calculations. They have the nice property of correctly
- predicting/calculating any date more recent than 4713 BC
+ for all date/time calculations. This has the useful property of correctly
+ calculating dates from 4713 BC
to far into the future, using the assumption that the length of the
year is 365.2425 days.
</para>
<member><literal>'off'</literal></member>
<member><literal>'0'</literal></member>
</simplelist>
- Leading and trailing whitespace is ignored. Using the key words
- <literal>TRUE</literal> and <literal>FALSE</literal> is preferred
- (and <acronym>SQL</acronym>-compliant).
+ Leading and trailing whitespace is ignored, and case does not matter. The key words
+ <literal>TRUE</literal> and <literal>FALSE</literal> are the preferred
+ usage (and <acronym>SQL</acronym>-compliant).
</para>
<example id="datatype-boolean-example">
<para>
Enumerated (enum) types are data types that
- are comprised of a static, predefined set of values with a
- specific order. They are equivalent to the <type>enum</type>
- types in a number of programming languages. An example of an enum
+ comprise a static, ordered set of values.
+ They are equivalent to the <type>enum</type>
+ types supported in a number of programming languages. An example of an enum
type might be the days of the week, or a set of status values for
a piece of data.
</para>
<para>
The ordering of the values in an enum type is the
- order in which the values were listed when the type was declared.
+ order in which the values were listed when the type was created.
All standard comparison operators and related
aggregate functions are supported for enums. For example:
</para>
Moe | happy
(2 rows)
-SELECT name FROM person
- WHERE current_mood = (SELECT MIN(current_mood) FROM person);
+SELECT name
+FROM person
+WHERE current_mood = (SELECT MIN(current_mood) FROM person);
name
-------
Larry
<title>Type Safety</title>
<para>
- Enumerated types are completely separate data types and may not
- be compared with each other.
+ Each enumerated data type is separate and cannot
+ be compared with other enumerated types.
</para>
<example>
<title>Lack of Casting</title>
<programlisting>
CREATE TYPE happiness AS ENUM ('happy', 'very happy', 'ecstatic');
-CREATE TABLE holidays (
- num_weeks int,
+CREATE TABLE holidays (
+ num_weeks integer,
happiness happiness
);
INSERT INTO holidays(num_weeks,happiness) VALUES (4, 'happy');
<para>
Enum labels are case sensitive, so
<type>'happy'</type> is not the same as <type>'HAPPY'</type>.
- Spaces in the labels are significant, too.
+ White space in the labels is significant too.
</para>
<para>
<row>
<entry><type>point</type></entry>
<entry>16 bytes</entry>
- <entry>Point on the plane</entry>
+ <entry>Point on a plane</entry>
<entry>(x,y)</entry>
</row>
<row>
<entry><type>circle</type></entry>
<entry>24 bytes</entry>
<entry>Circle</entry>
- <entry><(x,y),r> (center and radius)</entry>
+ <entry><(x,y),r> (center point and radius)</entry>
</row>
</tbody>
</tgroup>
</synopsis>
where <replaceable>x</> and <replaceable>y</> are the respective
- coordinates as floating-point numbers.
+ coordinates, as floating-point numbers.
</para>
</sect2>
</para>
<para>
- Boxes are output using the first syntax.
- The corners are reordered on input to store
- the upper right corner, then the lower left corner.
- Other corners of the box can be entered, but the lower
- left and upper right corners are determined from the input and stored.
+ Boxes are output using the first syntax. Any two opposite corners
+ can be supplied; the corners are reordered on input to store the
+ upper right and lower left corners.
</para>
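+ <para>
+ As a sketch of this reordering (the coordinates are arbitrary), the
+ following three inputs name different pairs of opposite corners of
+ the same box, and all are expected to display identically, with the
+ upper right corner first:
+<programlisting>
+SELECT box '((0,0),(1,1))';
+SELECT box '((1,1),(0,0))';
+SELECT box '((0,1),(1,0))';
+</programlisting>
+ </para>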
</sect2>
<para>
Paths are represented by lists of connected points. Paths can be
<firstterm>open</firstterm>, where
- the first and last points in the list are not considered connected, or
+ the first and last points in the list are considered not connected, or
<firstterm>closed</firstterm>,
where the first and last points are considered connected.
</para>
</para>
<para>
- Paths are output using the first syntax.
+ Paths are output using the first appropriate syntax.
</para>
</sect2>
<para>
Polygons are represented by lists of points (the vertexes of the
- polygon). Polygons should probably be
- considered equivalent to closed paths, but are stored differently
+ polygon). Polygons are very similar to closed paths, but are
+ stored differently
and have their own set of support routines.
</para>
</indexterm>
<para>
- Circles are represented by a center point and a radius.
+ Circles are represented by a center point and radius.
Values of type <type>circle</type> are specified using the following syntax:
<synopsis>
where
<literal>(<replaceable>x</replaceable>,<replaceable>y</replaceable>)</literal>
- is the center and <replaceable>r</replaceable> is the radius of the circle.
+ is the center point and <replaceable>r</replaceable> is the radius of the circle.
</para>
<para>
<para>
<productname>PostgreSQL</> offers data types to store IPv4, IPv6, and MAC
addresses, as shown in <xref linkend="datatype-net-types-table">. It
- is preferable to use these types instead of plain text types to store
- network addresses, because
- these types offer input error checking and several specialized
+ is better to use these types instead of plain text types to store
+ network addresses because
+ these types offer input error checking and specialized
operators and functions (see <xref linkend="functions-net">).
</para>
<para>
When sorting <type>inet</type> or <type>cidr</type> data types,
IPv4 addresses will always sort before IPv6 addresses, including
- IPv4 addresses encapsulated or mapped into IPv6 addresses, such as
+ IPv4 addresses encapsulated or mapped to IPv6 addresses, such as
::10.2.3.4 or ::ffff:10.4.3.2.
</para>
<para>
The <type>inet</type> type holds an IPv4 or IPv6 host address, and
- optionally the identity of the subnet it is in, all in one field.
- The subnet identity is represented by stating how many bits of
- the host address represent the network address (the
+ optionally its subnet, all in one field.
+ The subnet is represented by the number of network address bits
+ present in the host address (the
<quote>netmask</quote>). If the netmask is 32 and the address is IPv4,
then the value does not indicate a subnet, only a single host.
In IPv6, the address length is 128 bits, so 128 bits specify a
unique host address. Note that if you
- want to accept networks only, you should use the
+ want to accept only networks, you should use the
<type>cidr</type> type rather than <type>inet</type>.
</para>
<replaceable class="parameter">y</replaceable>
is the number of bits in the netmask. If the
<replaceable class="parameter">/y</replaceable>
- part is left off, then the
+ is missing, the
netmask is 32 for IPv4 and 128 for IPv6, so the value represents
just a single host. On display, the
<replaceable class="parameter">/y</replaceable>
class="parameter">y</> is the number of bits in the netmask. If
<replaceable class="parameter">y</> is omitted, it is calculated
using assumptions from the older classful network numbering system, except
- that it will be at least large enough to include all of the octets
+ it will be at least large enough to include all of the octets
written in the input. It is an error to specify a network address
that has bits set to the right of the specified netmask.
</para>
are designed to support full text search, which is the activity of
searching through a collection of natural-language <firstterm>documents</>
to locate those that best match a <firstterm>query</>.
- The <type>tsvector</type> type represents a document in a form suited
- for text search, while the <type>tsquery</type> type similarly represents
- a query.
+ The <type>tsvector</type> type represents a document stored in a form optimized
+ for text search; the <type>tsquery</type> type similarly represents
+ a text query.
<xref linkend="textsearch"> provides a detailed explanation of this
facility, and <xref linkend="functions-textsearch"> summarizes the
related functions and operators.
<para>
A <type>tsvector</type> value is a sorted list of distinct
- <firstterm>lexemes</>, which are words that have been
- <firstterm>normalized</> to make different variants of the same word look
- alike (see <xref linkend="textsearch"> for details). Sorting and
+ <firstterm>lexemes</>, which are words that have been
+ <firstterm>normalized</> to merge different variants of the same word
+ (see <xref linkend="textsearch"> for details). Sorting and
duplicate-elimination are done automatically during input, as shown in
this example:
' ' 'contains' 'lexeme' 'spaces' 'the'
</programlisting>
- (We use dollar-quoted string literals in this example and the next one,
- to avoid confusing matters by having to double quote marks within the
+ (We use dollar-quoted string literals in this example and the next one
+ to avoid the confusion of having to double quote marks within the
literals.) Embedded quotes and backslashes must be doubled:
<programlisting>
'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the'
</programlisting>
- Optionally, integer <firstterm>position(s)</>
- can be attached to any or all of the lexemes:
+ Optionally, integer <firstterm>positions</>
+ can be attached to lexemes:
<programlisting>
SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
A position normally indicates the source word's location in the
document. Positional information can be used for
<firstterm>proximity ranking</firstterm>. Position values can
- range from 1 to 16383; larger numbers are silently clamped to 16383.
+ range from 1 to 16383; larger numbers are silently set to 16383.
Duplicate positions for the same lexeme are discarded.
</para>
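+ <para>
+ For illustration (the lexemes and positions are made up), a position
+ above the limit is accepted without error; here <literal>cat</> is
+ expected to be stored with position 16383:
+<programlisting>
+SELECT 'fat:2 cat:20000'::tsvector;
+</programlisting>
+ </para>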
<para>
It is important to understand that the
<type>tsvector</type> type itself does not perform any normalization;
- it assumes that the words it is given are normalized appropriately
+ it assumes the words it is given are normalized appropriately
for the application. For example,
<programlisting>
<para>
A <type>tsquery</type> value stores lexemes that are to be
- searched for, and combines them using the boolean operators
+ searched for, and combines them by honoring the boolean operators
<literal>&</literal> (AND), <literal>|</literal> (OR), and
<literal>!</> (NOT). Parentheses can be used to enforce grouping
of the operators:
<para>
Optionally, lexemes in a <type>tsquery</type> can be labeled with
one or more weight letters, which restricts them to match only
- <type>tsvector</> lexemes with one of those weights:
+ <type>tsvector</> lexemes with matching weights:
<programlisting>
SELECT 'fat:ab & cat'::tsquery;
</para>
<para>
- Quoting rules for lexemes are the same as described above for
+ Quoting rules for lexemes are the same as described previously for
lexemes in <type>tsvector</>; and, as with <type>tsvector</>,
- any required normalization of words must be done before putting
- them into the <type>tsquery</> type. The <function>to_tsquery</>
+ any required normalization of words must be done before converting
+ to the <type>tsquery</> type. The <function>to_tsquery</>
function is convenient for performing such normalization:
<programlisting>
<para>
The data type <type>uuid</type> stores Universally Unique Identifiers
(UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards.
- (Some systems refer to this data type as globally unique identifier, or
- GUID,<indexterm><primary>GUID</primary></indexterm> instead.) Such an
+ (Some systems refer to this data type as a globally unique identifier, or
+ GUID,<indexterm><primary>GUID</primary></indexterm> instead.) This
identifier is a 128-bit quantity that is generated by an algorithm chosen
to make it very unlikely that the same identifier will be generated by
anyone else in the known universe using the same algorithm. Therefore,
for distributed systems, these identifiers provide a better uniqueness
- guarantee than that which can be achieved using sequence generators, which
+ guarantee than sequence generators, which
are only unique within a single database.
</para>
</indexterm>
<para>
- The data type <type>xml</type> can be used to store XML data. Its
+ The <type>xml</type> data type can be used to store XML data. Its
advantage over storing XML data in a <type>text</type> field is that it
- checks the input values for well-formedness, and there are support
- functions to perform type-safe operations on it; see <xref
+ checks the input values for well-formedness, and support
+ functions can perform type-safe operations on it; see <xref
linkend="functions-xml">. Use of this data type requires the
installation to have been built with <command>configure
--with-libxml</>.
</para>
<para>
- The <type>xml</type> type does not validate its input values
- against a possibly included document type declaration
+ The <type>xml</type> type does not validate input values
+ against an optionally supplied document type declaration
(DTD).<indexterm><primary>DTD</primary></indexterm>
</para>
<para>
- The inverse operation, producing character string type values from
+ The inverse operation, producing a character string value from
<type>xml</type>, uses the function
<function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
<synopsis>
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
</synopsis>
- <replaceable>type</replaceable> can be one of
+ <replaceable>type</replaceable> can be
<type>character</type>, <type>character varying</type>, or
<type>text</type> (or an alias name for those). Again, according
to the SQL standard, this is the only way to convert between type
</para>
<para>
- When character string values are cast to or from type
+ When a character string value is cast to or from type
<type>xml</type> without going through <type>XMLPARSE</type> or
<type>XMLSERIALIZE</type>, respectively, the choice of
<literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
determined by the <quote>XML option</quote>
<indexterm><primary>XML option</primary></indexterm>
session configuration parameter, which can be set using the
- standard command
+ standard command:
<synopsis>
SET XML OPTION { DOCUMENT | CONTENT };
</synopsis>
end; see <xref linkend="multibyte">. This includes string
representations of XML values, such as in the above examples.
This would ordinarily mean that encoding declarations contained in
- XML data might become invalid as the character data is converted
- to other encodings while travelling between client and server,
- while the embedded encoding declaration is not changed. To cope
- with this behavior, an encoding declaration contained in a
- character string presented for input to the <type>xml</type> type
- is <emphasis>ignored</emphasis>, and the content is always assumed
+ XML data can become invalid as the character data is converted
+ to other encodings while travelling between client and server
+ because the embedded encoding declaration is not changed. To cope
+ with this behavior, encoding declarations contained in
+ character strings presented for input to the <type>xml</type> type
+ are <emphasis>ignored</emphasis>, and content is assumed
to be in the current server encoding. Consequently, for correct
- processing, such character strings of XML data must be sent off
+ processing, character strings of XML data must be sent
from the client in the current client encoding. It is the
- responsibility of the client to either convert the document to the
- current client encoding before sending it off to the server or to
+ responsibility of the client to either convert documents to the
+ current client encoding before sending them to the server or to
adjust the client encoding appropriately. On output, values of
type <type>xml</type> will not have an encoding declaration, and
- clients must assume that the data is in the current client
+ clients should assume all data is in the current client
encoding.
</para>
<para>
- When using the binary mode to pass query parameters to the server
+ When using binary mode to pass query parameters to the server
and query results back to the client, no character set conversion
is performed, so the situation is different. In this case, an
encoding declaration in the XML data will be observed, and if it
is absent, the data will be assumed to be in UTF-8 (as required by
- the XML standard; note that PostgreSQL does not support UTF-16 at
- all). On output, data will have an encoding declaration
+ the XML standard; note that PostgreSQL does not support UTF-16).
+ On output, data will have an encoding declaration
specifying the client encoding, unless the client encoding is
UTF-8, in which case it will be omitted.
</para>
<para>
Needless to say, processing XML data with PostgreSQL will be less
- error-prone and more efficient if data encoding, client encoding,
+ error-prone and more efficient if the XML data encoding, client encoding,
and server encoding are the same. Since XML data is internally
processed in UTF-8, computations will be most efficient if the
server encoding is also UTF-8.
Since there are no comparison operators for the <type>xml</type>
data type, it is not possible to create an index directly on a
column of this type. If speedy searches in XML data are desired,
- possible workarounds would be casting the expression to a
+ possible workarounds include casting the expression to a
character string type and indexing that, or indexing an XPath
- expression. The actual query would of course have to be adjusted
+ expression. Of course, the actual query would have to be adjusted
to search by the indexed expression.
</para>
<para>
- The text-search functionality in PostgreSQL could also be used to speed
- up full-document searches in XML data. The necessary
- preprocessing support is, however, not available in the PostgreSQL
- distribution in this release.
+ The text-search functionality in PostgreSQL can also be used to speed
+ up full-document searches of XML data. The necessary
+ preprocessing support is, however, not yet available in the PostgreSQL
+ distribution.
</para>
</sect2>
</sect1>
The <type>regproc</> and <type>regoper</> alias types will only
accept input names that are unique (not overloaded), so they are
of limited use; for most uses <type>regprocedure</> or
- <type>regoperator</> is more appropriate. For <type>regoperator</>,
+ <type>regoperator</> is more appropriate. For <type>regoperator</>,
unary operators are identified by writing <literal>NONE</> for the unused
operand.
</para>
<para>
- An additional property of the OID alias types is that if a
+ An additional property of the OID alias types is the creation of
+ dependencies. If a
constant of one of these types appears in a stored expression
(such as a column default expression or view), it creates a dependency
on the referenced object. For example, if a column has a default
<tbody>
<row>
<entry><type>any</></entry>
- <entry>Indicates that a function accepts any input data type whatever.</entry>
+ <entry>Indicates that a function accepts any input data type.</entry>
</row>
<row>
<para>
The <type>internal</> pseudo-type is used to declare functions
that are meant only to be called internally by the database
- system, and not by direct invocation in a <acronym>SQL</acronym>
+ system, and not by direct invocation in an <acronym>SQL</acronym>
query. If a function has at least one <type>internal</>-type
argument then it cannot be called from <acronym>SQL</acronym>. To
preserve the type safety of this restriction it is important to
</para>
<para>
- If you need to modify a table that already exists look into <xref
+ If you need to modify a table that already exists, see <xref
linkend="ddl-alter"> later in this chapter.
</para>
The default value can be an expression, which will be
evaluated whenever the default value is inserted
(<emphasis>not</emphasis> when the table is created). A common example
- is that a <type>timestamp</type> column can have a default of <literal>now()</>,
+ is for a <type>timestamp</type> column to have a default of <literal>CURRENT_TIMESTAMP</>,
so that it gets set to the time of row insertion. Another common
example is generating a <quote>serial number</> for each row.
In <productname>PostgreSQL</productname> this is typically done by
</para>
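+ <para>
+ As a minimal sketch of the timestamp default described above (the
+ table <literal>event_log</literal> and its columns are hypothetical and
+ not part of the tutorial files), the default expression is evaluated
+ when the row is inserted, not when the table is created:
+<programlisting>
+-- event_log is an illustrative table name
+CREATE TABLE event_log (
+    message   text,
+    logged_at timestamp DEFAULT CURRENT_TIMESTAMP
+);
+
+-- logged_at is omitted, so it is set to the time of this insertion
+INSERT INTO event_log (message) VALUES ('first entry');
+</programlisting>
+ </para>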
<para>
- Names can be assigned to table constraints in just the same way as
- for column constraints:
+ Names can be assigned to table constraints in the same way as
+ column constraints:
<programlisting>
CREATE TABLE products (
product_no integer,
</indexterm>
<para>
- In general, a unique constraint is violated when there are two or
- more rows in the table where the values of all of the
+ In general, a unique constraint is violated when there is more than
+ one row in the table where the values of all of the
columns included in the constraint are equal.
However, two null values are not considered equal in this
comparison. That means even in the presence of a
unique constraint it is possible to store duplicate
rows that contain a null value in at least one of the constrained
- columns. This behavior conforms to the SQL standard, but we have
- heard that other SQL databases might not follow this rule. So be
+ columns. This behavior conforms to the SQL standard, but there
+ might be other SQL databases that do not follow this rule. So be
careful when developing applications that are intended to be
portable.
</para>
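+ <para>
+ As a minimal sketch of this rule (the table <literal>unique_demo</literal>
+ is hypothetical and not part of the tutorial files), the second
+ insertion below is accepted because its null value is not considered
+ equal to the first one, while the last insertion is rejected:
+<programlisting>
+-- unique_demo is an illustrative table name
+CREATE TABLE unique_demo (
+    a integer,
+    b integer,
+    UNIQUE (a, b)
+);
+
+INSERT INTO unique_demo VALUES (1, NULL);
+INSERT INTO unique_demo VALUES (1, NULL);  -- accepted: nulls are not equal
+INSERT INTO unique_demo VALUES (1, 2);
+INSERT INTO unique_demo VALUES (1, 2);     -- fails: violates the unique constraint
+</programlisting>
+ </para>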
restrictions are separate from whether the name is a key word or
not; quoting a name will not allow you to escape these
restrictions.) You do not really need to be concerned about these
- columns, just know they exist.
+ columns; just know they exist.
</para>
<indexterm>
Command identifiers are also 32-bit quantities. This creates a hard limit
of 2<superscript>32</> (4 billion) <acronym>SQL</acronym> commands
within a single transaction. In practice this limit is not a
- problem — note that the limit is on number of
- <acronym>SQL</acronym> commands, not number of rows processed.
+ problem — note that the limit is on the number of
+ <acronym>SQL</acronym> commands, not the number of rows processed.
Also, as of <productname>PostgreSQL</productname> 8.3, only commands
that actually modify the database contents will consume a command
identifier.
<para>
When you create a table and you realize that you made a mistake, or
- the requirements of the application change, then you can drop the
+ the requirements of the application change, you can drop the
table and create it again. But this is not a convenient option if
the table is already filled with data, or if the table is
referenced by other database objects (for instance a foreign key
</para>
<para>
- You can
+ You can:
<itemizedlist spacing="compact">
<listitem>
- <para>Add columns,</para>
+ <para>Add columns</para>
</listitem>
<listitem>
- <para>Remove columns,</para>
+ <para>Remove columns</para>
</listitem>
<listitem>
- <para>Add constraints,</para>
+ <para>Add constraints</para>
</listitem>
<listitem>
- <para>Remove constraints,</para>
+ <para>Remove constraints</para>
</listitem>
<listitem>
- <para>Change default values,</para>
+ <para>Change default values</para>
</listitem>
<listitem>
- <para>Change column data types,</para>
+ <para>Change column data types</para>
</listitem>
<listitem>
- <para>Rename columns,</para>
+ <para>Rename columns</para>
</listitem>
<listitem>
- <para>Rename tables.</para>
+ <para>Rename tables</para>
</listitem>
</itemizedlist>
</indexterm>
<para>
- To add a column, use a command like this:
+ To add a column, use a command like:
<programlisting>
ALTER TABLE products ADD COLUMN description text;
</programlisting>
</indexterm>
<para>
- To remove a column, use a command like this:
+ To remove a column, use a command like:
<programlisting>
ALTER TABLE products DROP COLUMN description;
</programlisting>
</indexterm>
<para>
- To set a new default for a column, use a command like this:
+ To set a new default for a column, use a command like:
<programlisting>
ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;
</programlisting>
</indexterm>
<para>
- To convert a column to a different data type, use a command like this:
+ To convert a column to a different data type, use a command like:
<programlisting>
ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);
</programlisting>
<listitem>
<para>
Third-party applications can be put into separate schemas so
- they cannot collide with the names of other objects.
+ they do not collide with the names of other objects.
</para>
</listitem>
</itemizedlist>
<para>
In the previous sections we created tables without specifying any
- schema names. By default, such tables (and other objects) are
+ schema names. By default such tables (and other objects) are
automatically put into a schema named <quote>public</quote>. Every new
database contains such a schema. Thus, the following are equivalent:
<programlisting>
<para>
By default, users cannot access any objects in schemas they do not
- own. To allow that, the owner of the schema needs to grant the
+ own. To allow that, the owner of the schema must grant the
<literal>USAGE</literal> privilege on the schema. To allow users
to make use of the objects in the schema, additional privileges
might need to be granted, as appropriate for the object.
such names, to ensure that you won't suffer a conflict if some
future version defines a system table named the same as your
table. (With the default search path, an unqualified reference to
- your table name would be resolved as the system table instead.)
+ your table name would be resolved as a system table instead.)
System tables will continue to follow the convention of having
names beginning with <literal>pg_</>, so that they will not
conflict with unqualified user-table names so long as users avoid
<programlisting>
SELECT p.relname, c.name, c.altitude
FROM cities c, pg_class p
-WHERE c.altitude > 500 and c.tableoid = p.oid;
+WHERE c.altitude > 500 AND c.tableoid = p.oid;
</programlisting>
which returns:
<para>
Table access permissions are not automatically inherited. Therefore,
a user attempting to access a parent table must either have permissions
- to do the operation on all its child tables as well, or must use the
+ to do the same operation on all its child tables as well, or must use the
<literal>ONLY</literal> notation. When adding a new child table to
an existing inheritance hierarchy, be careful to grant all the needed
permissions on it.
These deficiencies will probably be fixed in some future release,
but in the meantime considerable care is needed in deciding whether
- inheritance is useful for your problem.
+ inheritance is useful for your application.
</para>
<note>
</programlisting>
Ensure that the constraints guarantee that there is no overlap
between the key values permitted in different partitions. A common
- mistake is to set up range constraints like this:
+ mistake is to set up range constraints like:
<programlisting>
CHECK ( outletID BETWEEN 100 AND 200 )
CHECK ( outletID BETWEEN 200 AND 300 )
For example, suppose we are constructing a database for a large
ice cream company. The company measures peak temperatures every
day as well as ice cream sales in each region. Conceptually,
- we want a table like this:
+ we want a table like:
<programlisting>
CREATE TABLE measurement (
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
- IF ( NEW.logdate >= DATE '2006-02-01' AND NEW.logdate < DATE '2006-03-01' ) THEN
+ IF ( NEW.logdate >= DATE '2006-02-01' AND
+ NEW.logdate < DATE '2006-03-01' ) THEN
INSERT INTO measurement_y2006m02 VALUES (NEW.*);
- ELSIF ( NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' ) THEN
+ ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
+ NEW.logdate < DATE '2006-04-01' ) THEN
INSERT INTO measurement_y2006m03 VALUES (NEW.*);
...
- ELSIF ( NEW.logdate >= DATE '2008-01-01' AND NEW.logdate < DATE '2008-02-01' ) THEN
+ ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
+ NEW.logdate < DATE '2008-02-01' ) THEN
INSERT INTO measurement_y2008m01 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
Without constraint exclusion, the above query would scan each of
the partitions of the <structname>measurement</> table. With constraint
exclusion enabled, the planner will examine the constraints of each
- partition and try to prove that the partition need not
- be scanned because it could not contain any rows meeting the query's
- <literal>WHERE</> clause. When the planner can prove this, it
+ partition and try to determine which partitions need not
+ be scanned because they cannot contain any rows meeting the query's
+ <literal>WHERE</> clause. When the planner can determine this, it
excludes the partition from the query plan.
</para>
<para>
If you are using manual <command>VACUUM</command> or
<command>ANALYZE</command> commands, don't forget that
- you need to run them on each partition individually. A command like
+ you need to run them on each partition individually. A command like:
<programlisting>
ANALYZE measurement;
</programlisting>
<listitem>
<para>
- Keep the partitioning constraints simple, else the planner may not be
+ Keep the partitioning constraints simple or else the planner may not be
able to prove that partitions don't need to be visited. Use simple
equality conditions for list partitioning, or simple
range tests for range partitioning, as illustrated in the preceding
that exist in a database. Many other kinds of objects can be
created to make the use and management of the data more efficient
or convenient. They are not discussed in this chapter, but we give
- you a list here so that you are aware of what is possible.
+ you a list here so that you are aware of what is possible:
</para>
<itemizedlist>
<para>
When you create complex database structures involving many tables
with foreign key constraints, views, triggers, functions, etc. you
- will implicitly create a net of dependencies between the objects.
+ implicitly create a net of dependencies between the objects.
For instance, a table with a foreign key constraint depends on the
table it references.
</para>
HINT: Use DROP ... CASCADE to drop the dependent objects too.
</screen>
The error message contains a useful hint: if you do not want to
- bother deleting all the dependent objects individually, you can run
+ bother deleting all the dependent objects individually, you can run:
<screen>
DROP TABLE products CASCADE;
</screen>
the possible dependencies varies with the type of the object. You
can also write <literal>RESTRICT</literal> instead of
<literal>CASCADE</literal> to get the default behavior, which is to
- prevent drops of objects that other objects depend on.
+ prevent the dropping of objects that other objects depend on.
</para>
<note>
table data. We also introduce ways to effect automatic data changes
when certain events occur: triggers and rewrite rules. The chapter
after this will finally explain how to extract your long-lost data
- back out of the database.
+ from the database.
</para>
<sect1 id="dml-insert">
do before a database can be of much use is to insert data. Data is
conceptually inserted one row at a time. Of course you can also
insert more than one row, but there is no way to insert less than
- one row at a time. Even if you know only some column values, a
+ one row. Even if you know only some column values, a
complete row must be created.
</para>
<para>
To create a new row, use the <xref linkend="sql-insert"
endterm="sql-insert-title"> command. The command requires the
- table name and a value for each of the columns of the table. For
+ table name and column values. For
example, consider the products table from <xref linkend="ddl">:
<programlisting>
CREATE TABLE products (
<para>
The above syntax has the drawback that you need to know the order
- of the columns in the table. To avoid that you can also list the
+ of the columns in the table. To avoid this you can also list the
columns explicitly. For example, both of the following commands
have the same effect as the one above:
<programlisting>
To perform an update, you need three pieces of information:
<orderedlist spacing="compact">
<listitem>
- <para>The name of the table and column to update,</para>
+ <para>The name of the table and column to update</para>
</listitem>
<listitem>
- <para>The new value of the column,</para>
+ <para>The new value of the column</para>
</listitem>
<listitem>
- <para>Which row(s) to update.</para>
+ <para>Which row(s) to update</para>
</listitem>
</orderedlist>
</para>
<para>
Recall from <xref linkend="ddl"> that SQL does not, in general,
provide a unique identifier for rows. Therefore it is not
- necessarily possible to directly specify which row to update.
+ always possible to directly specify which row to update.
Instead, you specify which conditions a row must meet in order to
- be updated. Only if you have a primary key in the table (no matter
- whether you declared it or not) can you reliably address individual rows,
+ be updated. Only if you have a primary key in the table (independent of
+ whether you declared it or not) can you reliably address individual rows
by choosing a condition that matches the primary key.
Graphical database access tools rely on this fact to allow you to
update rows individually.
<literal>UPDATE</literal> followed by the table name. As usual,
the table name can be schema-qualified, otherwise it is looked up
in the path. Next is the key word <literal>SET</literal> followed
- by the column name, an equals sign and the new column value. The
+ by the column name, an equal sign, and the new column value. The
new column value can be any scalar expression, not just a constant.
For example, if you want to raise the price of all products by 10%
you could use:
<programlisting>
DELETE FROM products;
</programlisting>
- then all rows in the table will be deleted! Caveat programmer.
+ then all rows in the table will be deleted! (<xref
+ linkend="sql-truncate" endterm="sql-truncate-title"> can also be used
+ to delete all rows.)
+ Caveat programmer.
</para>
</sect1>
</chapter>
Create the directory
<filename>/usr/local/share/sgml/docbook-4.2</filename> and change
to it. (The exact location is irrelevant, but this one is
- reasonable within the layout we are following here.)
+ reasonable within the layout we are following here.):
<screen>
<prompt>$ </prompt><userinput>mkdir /usr/local/share/sgml/docbook-4.2</userinput>
<prompt>$ </prompt><userinput>cd /usr/local/share/sgml/docbook-4.2</userinput>
<step>
<para>
- Unpack the archive.
+ Unpack the archive:
<screen>
<prompt>$ </prompt><userinput>unzip -a ...../docbook-4.2.zip</userinput>
</screen>
<para>
Download the <ulink url="http://www.oasis-open.org/cover/ISOEnts.zip">
ISO 8879 character entities archive</ulink>, unpack it, and put the
- files in the same directory you put the DocBook files in.
+ files in the same directory you put the DocBook files in:
<screen>
<prompt>$ </prompt><userinput>cd /usr/local/share/sgml/docbook-4.2</userinput>
<prompt>$ </prompt><userinput>unzip ...../ISOEnts.zip</userinput>
To install the style sheets, unzip and untar the distribution and
move it to a suitable place, for example
<filename>/usr/local/share/sgml</filename>. (The archive will
- automatically create a subdirectory.)
+ automatically create a subdirectory.):
<screen>
<prompt>$</prompt> <userinput>gunzip docbook-dsssl-1.<replaceable>xx</>.tar.gz</userinput>
<prompt>$</prompt> <userinput>tar -C /usr/local/share/sgml -xf docbook-dsssl-1.<replaceable>xx</>.tar</userinput>
<screen>
<prompt>doc/src/sgml$ </prompt><userinput>gmake postgres-A4.pdf</userinput>
</screen>
- or
+ or:
<screen>
<prompt>doc/src/sgml$ </prompt><userinput>gmake postgres-US.pdf</userinput>
</screen>
following one. A utility, <command>fixrtf</command>, is
available in <filename>doc/src/sgml</filename> to accomplish
these repairs:
-
<screen>
<prompt>doc/src/sgml$ </prompt><userinput>./fixrtf --refentry postgres.rtf</userinput>
</screen>
<para>
The pgtypes library maps <productname>PostgreSQL</productname> database
types to C equivalents that can be used in C programs. It also offers
- functions to do basic calculations with those types within C, i.e. without
+ functions to do basic calculations with those types within C, i.e., without
the help of the <productname>PostgreSQL</productname> server. See the
following example:
<programlisting><![CDATA[
char *PGTYPESdate_to_asc(date dDate);
</synopsis>
The function receives the date <literal>dDate</> as its only parameter.
- It will output the date in the form <literal>1999-01-18</>, i.e. in the
+ It will output the date in the form <literal>1999-01-18</>, i.e., in the
<literal>YYYY-MM-DD</> format.
</para>
</listitem>
define their own functions and operators, as described in
<xref linkend="server-programming">. The
<application>psql</application> commands <command>\df</command> and
- <command>\do</command> can be used to show the list of all actually
+ <command>\do</command> can be used to list all
available functions and operators, respectively.
</para>
<para>
- If you are concerned about portability then take note that most of
+ If you are concerned about portability then note that most of
the functions and operators described in this chapter, with the
exception of the most trivial arithmetic and comparison operators
and some explicitly marked functions, are not specified by the
- <acronym>SQL</acronym> standard. Some of the extended functionality
+ <acronym>SQL</acronym> standard. Some of this extended functionality
is present in other <acronym>SQL</acronym> database management
systems, and in many cases this functionality is compatible and
consistent between the various implementations. This chapter is also
</note>
<para>
- Comparison operators are available for all data types where this
- makes sense. All comparison operators are binary operators that
+ Comparison operators are available for all relevant data types.
+ All comparison operators are binary operators that
return values of type <type>boolean</type>; expressions like
<literal>1 < 2 < 3</literal> are not valid (because there is
no <literal><</literal> operator to compare a Boolean value with
<primary>BETWEEN</primary>
</indexterm>
In addition to the comparison operators, the special
- <token>BETWEEN</token> construct is available.
+ <token>BETWEEN</token> construct is available:
<synopsis>
<replaceable>a</replaceable> BETWEEN <replaceable>x</replaceable> AND <replaceable>y</replaceable>
</synopsis>
<synopsis>
<replaceable>a</replaceable> >= <replaceable>x</replaceable> AND <replaceable>a</replaceable> <= <replaceable>y</replaceable>
</synopsis>
- Similarly,
+ Notice that <token>BETWEEN</token> treats the endpoint values as included
+ in the range. <literal>NOT BETWEEN</literal> does the opposite comparison:
<synopsis>
<replaceable>a</replaceable> NOT BETWEEN <replaceable>x</replaceable> AND <replaceable>y</replaceable>
</synopsis>
<synopsis>
<replaceable>a</replaceable> < <replaceable>x</replaceable> OR <replaceable>a</replaceable> > <replaceable>y</replaceable>
</synopsis>
- There is no difference between the two respective forms apart from
- the <acronym>CPU</acronym> cycles required to rewrite the first one
- into the second one internally.
<indexterm>
<primary>BETWEEN SYMMETRIC</primary>
</indexterm>
<indexterm>
<primary>NOTNULL</primary>
</indexterm>
- To check whether a value is or is not null, use the constructs
+ To check whether a value is or is not null, use the constructs:
<synopsis>
<replaceable>expression</replaceable> IS NULL
<replaceable>expression</replaceable> IS NOT NULL
</synopsis>
- or the equivalent, but nonstandard, constructs
+ or the equivalent, but nonstandard, constructs:
<synopsis>
<replaceable>expression</replaceable> ISNULL
<replaceable>expression</replaceable> NOTNULL
<tip>
<para>
- Some applications might expect that
+ Some applications might expect
<literal><replaceable>expression</replaceable> = NULL</literal>
returns true if <replaceable>expression</replaceable> evaluates to
the null value. It is highly recommended that these applications
cannot be done the <xref linkend="guc-transform-null-equals">
configuration variable is available. If it is enabled,
<productname>PostgreSQL</productname> will convert <literal>x =
- NULL</literal> clauses to <literal>x IS NULL</literal>. This was
- the default behavior in <productname>PostgreSQL</productname>
- releases 6.5 through 7.1.
+ NULL</literal> clauses to <literal>x IS NULL</literal>.
</para>
</tip>
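+ <para>
+ The difference can be sketched with a few literal comparisons. This
+ assumes <varname>transform_null_equals</varname> is left at its default
+ setting (off):
+<programlisting>
+SELECT 7 = NULL;       -- yields null, not false
+SELECT NULL = NULL;    -- also yields null
+SELECT NULL IS NULL;   -- yields true
+SELECT 7 IS NOT NULL;  -- yields true
+</programlisting>
+ </para>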
<literal>IS NOT NULL</> is true when the row expression itself is non-null
and all the row's fields are non-null. Because of this behavior,
<literal>IS NULL</> and <literal>IS NOT NULL</> do not always return
- inverse results for row-valued expressions, i.e. a row-valued
+ inverse results for row-valued expressions, i.e., a row-valued
expression that contains both NULL and non-null values will return false
for both tests.
This definition conforms to the SQL standard, and is a change from the
<indexterm>
<primary>IS NOT DISTINCT FROM</primary>
</indexterm>
- The ordinary comparison operators yield null (signifying <quote>unknown</>)
- when either input is null. Another way to do comparisons is with the
+ Ordinary comparison operators yield null (signifying <quote>unknown</>),
+ not true or false, when either input is null. For example,
+ <literal>7 = NULL</> yields null.
+ Another way to do comparisons is with the
<literal>IS <optional> NOT </> DISTINCT FROM</literal> construct:
<synopsis>
<replaceable>expression</replaceable> IS DISTINCT FROM <replaceable>expression</replaceable>
<replaceable>expression</replaceable> IS NOT DISTINCT FROM <replaceable>expression</replaceable>
</synopsis>
For non-null inputs, <literal>IS DISTINCT FROM</literal> is
- the same as the <literal><></> operator. However, when both
- inputs are null it will return false, and when just one input is
- null it will return true. Similarly, <literal>IS NOT DISTINCT
+ the same as the <literal><></> operator. However, if both
+ inputs are null it returns false, and if only one input is
+ null it returns true. Similarly, <literal>IS NOT DISTINCT
FROM</literal> is identical to <literal>=</literal> for non-null
inputs, but it returns true when both inputs are null, and false when only
one input is null. Thus, these constructs effectively act as though null
<para>
Mathematical operators are provided for many
- <productname>PostgreSQL</productname> types. For types without
- common mathematical conventions for all possible permutations
+ <productname>PostgreSQL</productname> types. For types that support
+ only limited mathematical operations
(e.g., date/time types) we
describe the actual behavior in subsequent sections.
</para>
<row>
<entry> <literal>/</literal> </entry>
- <entry>division (integer division truncates results)</entry>
+ <entry>division (integer division truncates the result)</entry>
<entry><literal>4 / 2</literal></entry>
<entry><literal>2</literal></entry>
</row>
<tbody>
<row>
<entry><literal><function>abs</>(<replaceable>x</replaceable>)</literal></entry>
- <entry>(same as <replaceable>x</>)</entry>
+ <entry>(same as input)</entry>
<entry>absolute value</entry>
<entry><literal>abs(-17.4)</literal></entry>
<entry><literal>17.4</literal></entry>
<row>
<entry><literal><function>random</function>()</literal></entry>
<entry><type>dp</type></entry>
- <entry>random value between 0.0 and 1.0</entry>
+ <entry>random value from 0.0 (inclusive) up to, but not including, 1.0</entry>
<entry><literal>random()</literal></entry>
<entry></entry>
</row>
<row>
<entry><literal><function>setseed</function>(<type>dp</type>)</literal></entry>
<entry><type>void</type></entry>
- <entry>set seed for subsequent <literal>random()</literal> calls (value between -1.0 and 1.0)</entry>
+ <entry>set seed for subsequent <literal>random()</literal> calls (value between -1.0 and
+ 1.0, inclusive)</entry>
<entry><literal>setseed(0.54823)</literal></entry>
<entry></entry>
</row>
<entry>
<acronym>ASCII</acronym> code of the first character of the
argument. For <acronym>UTF8</acronym> returns the Unicode code
- point of the character. For other multibyte encodings. the
- argument must be a strictly <acronym>ASCII</acronym> character.
+ point of the character. For other multibyte encodings, the
+ argument must be an <acronym>ASCII</acronym> character.
</entry>
<entry><literal>ascii('x')</literal></entry>
<entry><literal>120</literal></entry>
<entry>
Character with the given code. For <acronym>UTF8</acronym> the
argument is treated as a Unicode code point. For other multibyte
- encodings the argument must designate a strictly
+ encodings the argument must designate an
<acronym>ASCII</acronym> character. The NULL (0) character is not
allowed because text data types cannot store such bytes.
</entry>
linkend="conversion-names"> for available conversions.
</entry>
<entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
- <entry><literal>text_in_utf8</literal> represented in ISO 8859-1 encoding</entry>
+ <entry><literal>text_in_utf8</literal> represented in Latin-1
+ encoding (ISO 8859-1)</entry>
</row>
<row>
The conversion names follow a standard naming scheme: The
official name of the source encoding with all
non-alphanumeric characters replaced by underscores followed
- by <literal>_to_</literal> followed by the equally processed
- destination encoding name. Therefore the names might deviate
+ by <literal>_to_</literal> followed by the similarly processed
+ destination encoding name. Therefore, the names might deviate
from the customary encoding names.
</para>
</footnote>
</para>
<para>
- <acronym>SQL</acronym> defines some string functions with a
- special syntax where
- certain key words rather than commas are used to separate the
+ <acronym>SQL</acronym> defines some string functions that use
+ key words, rather than commas, to separate
arguments. Details are in
<xref linkend="functions-binarystring-sql">.
- Some functions are also implemented using the regular syntax for
+ Such functions are also implemented using the regular syntax for
function invocation.
(See <xref linkend="functions-binarystring-other">.)
</para>
'1110'::bit(4)::integer <lineannotation>14</lineannotation>
</programlisting>
Note that casting to just <quote>bit</> means casting to
- <literal>bit(1)</>, and so it will deliver only the least significant
+ <literal>bit(1)</>, and so will deliver only the least significant
bit of the integer.
</para>
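+ <para>
+ A short sketch of this behavior, using arbitrary integer values chosen
+ only for illustration:
+<programlisting>
+SELECT 5::bit;     -- 5 is binary 101, so the result is 1
+SELECT 4::bit;     -- 4 is binary 100, so the result is 0
+SELECT 5::bit(3);  -- an explicit length keeps more bits: 101
+</programlisting>
+ </para>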
SQL:1999), and <acronym>POSIX</acronym>-style regular
expressions. Aside from the basic <quote>does this string match
this pattern?</> operators, functions are available to extract
- or replace matching substrings and to split a string at the matches.
+ or replace matching substrings and to split a string at matching
+ locations.
</para>
<tip>
</synopsis>
<para>
- Every <replaceable>pattern</replaceable> defines a set of strings.
- The <function>LIKE</function> expression returns true if the
- <replaceable>string</replaceable> is contained in the set of
- strings represented by <replaceable>pattern</replaceable>. (As
+ The <function>LIKE</function> expression returns true if
+ <replaceable>string</replaceable> matches the supplied
+ <replaceable>pattern</replaceable>. (As
expected, the <function>NOT LIKE</function> expression returns
false if <function>LIKE</function> returns true, and vice versa.
An equivalent expression is
</para>
<para>
- <function>LIKE</function> pattern matches always cover the entire
- string. To match a sequence anywhere within a string, the
- pattern must therefore start and end with a percent sign.
+ <function>LIKE</function> pattern matching always covers the entire
+ string. Therefore, to match a sequence anywhere within a string, the
+ pattern must start and end with a percent sign.
</para>
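+ <para>
+ For instance, with an arbitrary sample string:
+<programlisting>
+SELECT 'foobar' LIKE 'oba';     -- false: the pattern must cover the entire string
+SELECT 'foobar' LIKE '%oba%';   -- true: the percent signs match 'fo' and 'r'
+SELECT 'foobar' LIKE 'foo%';    -- true: matches only at the start of the string
+</programlisting>
+ </para>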
<para>
- To match a literal underscore or percent sign without matching
+ To match only a literal underscore or percent sign without matching
other characters, the respective character in
<replaceable>pattern</replaceable> must be
preceded by the escape character. The default escape
actually matches a literal backslash means writing four backslashes in the
statement. You can avoid this by selecting a different escape character
with <literal>ESCAPE</literal>; then a backslash is not special to
- <function>LIKE</function> anymore. (But it is still special to the string
+ <function>LIKE</function> anymore. (But backslash is still special to the string
literal parser, so you still need two of them.)
</para>
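+ <para>
+ As a sketch of selecting a different escape character,
+ <literal>#</literal> is chosen here arbitrarily so that no backslashes
+ are needed to match a literal percent sign:
+<programlisting>
+SELECT '50%' LIKE '50#%' ESCAPE '#';   -- true: #% matches a literal percent sign
+SELECT '500' LIKE '50#%' ESCAPE '#';   -- false: the escaped % is no longer a wildcard
+</programlisting>
+ </para>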
<para>
The <function>SIMILAR TO</function> operator returns true or
false depending on whether its pattern matches the given string.
- It is much like <function>LIKE</function>, except that it
+ It is similar to <function>LIKE</function>, except that it
interprets the pattern using the SQL standard's definition of a
regular expression. SQL regular expressions are a curious cross
between <function>LIKE</function> notation and common regular
</para>
<para>
- Like <function>LIKE</function>, the <function>SIMILAR TO</function>
+ Like <function>LIKE</function>, the <function>SIMILAR TO</function>
operator succeeds only if its pattern matches the entire string;
- this is unlike common regular expression practice, wherein the pattern
+ this is unlike common regular expression behavior, where the pattern
can match any part of the string.
Also like
<function>LIKE</function>, <function>SIMILAR TO</function> uses
</itemizedlist>
Notice that bounded repetition (<literal>?</> and <literal>{...}</>)
- are not provided, though they exist in POSIX. Also, the dot (<literal>.</>)
+ is not provided, though it exists in POSIX. Also, the period (<literal>.</>)
is not a metacharacter.
</para>
<replaceable>escape-character</replaceable>)</function>, provides
extraction of a substring that matches an SQL
regular expression pattern. As with <literal>SIMILAR TO</>, the
- specified pattern must match to the entire data string, else the
+ specified pattern must match the entire data string, or else the
function fails and returns null. To indicate the part of the
pattern that should be returned on success, the pattern must contain
two occurrences of the escape character followed by a double quote
</para>
<para>
- Some examples:
+ Some examples, with <literal>#"</> delimiting the return string:
<programlisting>
substring('foobar' from '%#"o_b#"%' for '#') <lineannotation>oob</lineannotation>
substring('foobar' from '#"o_b#"%' for '#') <lineannotation>NULL</lineannotation>
expression. As with <function>LIKE</function>, pattern characters
match string characters exactly unless they are special characters
in the regular expression language — but regular expressions use
- different special characters than <function>LIKE</function> does.
+ different special characters than <function>LIKE</function>.
Unlike <function>LIKE</function> patterns, a
regular expression is allowed to match anywhere within a string, unless
the regular expression is explicitly anchored to the beginning or
<para>
<productname>PostgreSQL</productname>'s regular expressions are implemented
- using a package written by Henry Spencer. Much of
+ using a software package written by Henry Spencer. Much of
the description of regular expressions below is copied verbatim from his
- manual entry.
+ manual.
</para>
<para>
(roughly those of <command>ed</command>).
<productname>PostgreSQL</productname> supports both forms, and
also implements some extensions
- that are not in the POSIX standard, but have become widely used anyway
+ that are not in the POSIX standard, but have become widely used
due to their availability in programming languages such as Perl and Tcl.
<acronym>RE</acronym>s using these non-POSIX extensions are called
<firstterm>advanced</> <acronym>RE</acronym>s or <acronym>ARE</>s
<productname>PostgreSQL</> can be chosen by setting the <xref
linkend="guc-regex-flavor"> run-time parameter. The usual
setting is <literal>advanced</>, but one might choose
- <literal>extended</> for maximum backwards compatibility with
+ <literal>extended</> for backwards compatibility with
pre-7.4 releases of <productname>PostgreSQL</>.
</para>
</note>
<para>
A branch is zero or more <firstterm>quantified atoms</> or
<firstterm>constraints</>, concatenated.
- It matches a match for the first, followed by a match for the second, etc;
+ It matches a match for the first, followed by a match for the second, etc.;
an empty branch matches the empty string.
</para>
<para>
A <firstterm>constraint</> matches an empty string, but matches only when
- specific conditions are met. A constraint can be used where an atom
- could be used, except it cannot be followed by a quantifier.
+ specific conditions are met. A constraint can be used where an atom
+ could be used, except that it cannot be followed by a quantifier.
The simple constraints are shown in