Pavan Deolasee [Thu, 10 Dec 2015 10:08:17 +0000 (15:38 +0530)]
Merge remote-tracking branch 'remotes/PGSQL/REL9_5_STABLE' into XL9_5_STABLE
Pavan Deolasee [Thu, 10 Dec 2015 05:44:23 +0000 (11:14 +0530)]
Add a developer-GUC 'global_snapshot_source' to allow informed users to
override the way snapshots are computed.
The default value of the GUC is 'gtm', which means that snapshots are always
generated on the GTM so that we get a full and correct view of the currently
running transactions. But with this developer-GUC we now allow users to
override that and work with coordinator-generated snapshots. This can be
especially useful for read-only queries, which then don't need to talk to the
GTM. If snapshots can also be taken locally on a coordinator, this will
reduce the round-trips to the GTM even further. Of course, this can lead to
consistency issues because a coordinator may not be aware of all the
transactions currently running on the XL cluster, especially in a
multi-coordinator setup where different coordinators could be running different
transactions without knowing about each other's activity. But even in a single
coordinator setup, some transactions may start on a datanode, and the
coordinator may not know about them, or may only learn about them quite late.
It is advised that this feature be used with caution and only after due
consideration of the effects
Pavan Deolasee [Wed, 9 Dec 2015 11:07:37 +0000 (16:37 +0530)]
Allow on-demand assignment of XID, even on a datanode
Till now, we used to aggressively assign transaction identifiers on the
coordinator, even if the transaction may not actually need one because
it does not do any write operation. PostgreSQL improved this case many
years back with the use of Virtual XIDs. But in Postgres-XL, if a
read-looking SELECT statement later does some write operation, it would be too
late to assign a transaction identifier while the query is running on the
datanode. Such aggressive XID assignment, however, causes severe performance
problems, especially for read-only queries.
We now solve this by assigning XID on-demand for SELECT queries. If a datanode
ever needs an XID, it will talk to the GTM with a global session identifier
which remains unique for a given global transaction on all nodes. GTM can then
correlate multiple BEGIN TXN requests for the same global transaction and
return the already assigned identifier, instead of opening a new transaction
every time.
For DML queries, we still acquire XID on the coordinator. This will be a loss
if the DML ends up not doing any write operation. But since that is not going
to be a very common scenario, it makes sense to pre-acquire an XID. More tests
are required, though, to verify this
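The GTM-side correlation described above can be sketched as follows; the names (gtm_xid_for_session, the fixed-size table) are illustrative assumptions for this sketch, not XL's actual GTM code:

```c
#include <string.h>

#define MAX_SESSIONS 64

typedef unsigned int TransactionId;

static struct {
    char          session_id[64];
    TransactionId xid;
} sessions[MAX_SESSIONS];
static int n_sessions = 0;
static TransactionId next_xid = 100;

/* Return the XID already assigned to this global session, or assign one.
 * Multiple BEGIN TXN requests carrying the same global session identifier
 * thus map to a single global transaction instead of opening a new one. */
TransactionId
gtm_xid_for_session(const char *session_id)
{
    for (int i = 0; i < n_sessions; i++)
        if (strcmp(sessions[i].session_id, session_id) == 0)
            return sessions[i].xid;        /* same global transaction */

    if (n_sessions >= MAX_SESSIONS)
        return 0;                          /* table full: invalid XID */
    strncpy(sessions[n_sessions].session_id, session_id, 63);
    sessions[n_sessions].xid = next_xid++; /* open a new transaction */
    return sessions[n_sessions++].xid;
}
```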
Pavan Deolasee [Wed, 9 Dec 2015 10:26:09 +0000 (15:56 +0530)]
Forget connection combiner before receiving pending messages during cleanup
Pavan Deolasee [Wed, 9 Dec 2015 10:01:36 +0000 (15:31 +0530)]
Do not use RemoteSubplan for simple Result plan which can be evaluated on any
node
This should reduce connections for single INSERT statements
Pavan Deolasee [Wed, 9 Dec 2015 08:04:57 +0000 (13:34 +0530)]
Reduce logging level for tuplestore_end message
Pavan Deolasee [Wed, 9 Dec 2015 07:59:48 +0000 (13:29 +0530)]
Avoid repeated invalidation on plans because of mismatch in search_path
Pavan Deolasee [Wed, 9 Dec 2015 07:23:40 +0000 (12:53 +0530)]
Introduce a cluster monitor lock to avoid a race condition between snapshot
fetch and xmin reporting
Pavan Deolasee [Wed, 9 Dec 2015 07:04:57 +0000 (12:34 +0530)]
Reduce logging level for a couple of messages
Pavan Deolasee [Wed, 9 Dec 2015 06:35:29 +0000 (12:05 +0530)]
Ouch. Forgot to take out this half-cooked and incorrect code in the previous commit
Pavan Deolasee [Tue, 8 Dec 2015 09:05:21 +0000 (14:35 +0530)]
Acquire the right lock for updating latestCompletedXid.
Also make sure that the latestCompletedXid is updated on the nodes irrespective
of whether report-xmin returns success or failure
Pavan Deolasee [Tue, 8 Dec 2015 08:21:57 +0000 (13:51 +0530)]
Rework handling of idle nodes for xmin reporting and calculation
The local value of latestCompletedXid caps the xmin computation on an idle
cluster. So GTM sends back an updated value of latestCompletedXid as part of
report-xmin response. Once latestCompletedXid is updated on the node, the next
iteration will ensure that the local xmin is advanced on an idle server
Pavan Deolasee [Mon, 7 Dec 2015 10:18:33 +0000 (15:48 +0530)]
The Cluster Monitor, being an auxiliary process, need not call InitPostgres and
thus should not start a transaction
Pavan Deolasee [Mon, 7 Dec 2015 06:58:52 +0000 (12:28 +0530)]
Reintroduce XC's Fast Query Shipping (FQS) mechanism to improve performance of
simple queries that can be fully shipped to the datanodes
This patch backports some of the FQS code from XC/master branch. We only use
this for simple queries that can be fully executed on the datanodes. But
otherwise queries go through the XL planner/executor. This is showing good
performance improvements for such simple queries. This will also allow us to
support certain missing features, such as GROUPING SETS on replicated tables or
queries that require no coordinator support for final aggregations. This hasn't
been tested yet, though.
This will change many regression test outputs since the EXPLAIN plans for
FQS-ed queries look very different.
Pavan Deolasee [Mon, 30 Nov 2015 07:27:36 +0000 (12:57 +0530)]
Avoid deadlock by ensuring that a rd-lock holder does not request another rd or
wr-lock
While it's generally safe for threads to acquire the same pthread_rwlock in READ
mode multiple times, if there is a writer blocked on the lock, this can cause a
deadlock. So avoid that coding practice.
Pavan Deolasee [Mon, 30 Nov 2015 06:39:31 +0000 (12:09 +0530)]
Add debug facility to GTM_RWLock
This can be turned on with #define GTM_LOCK_DEBUG and should be useful for
deadlock detection and other such cases
Pavan Deolasee [Thu, 26 Nov 2015 10:59:09 +0000 (16:29 +0530)]
Use -w option of pg_ctl to wait for the operation to complete
Pavan Deolasee [Thu, 26 Nov 2015 10:58:33 +0000 (16:28 +0530)]
Reduce logging level for a log message added in commit 257588554
Pavan Deolasee [Thu, 26 Nov 2015 08:53:23 +0000 (14:23 +0530)]
Add a user-friendly hint to the (in)famous "Failed to get pooled connection"
error message
At some point we should look at checking various related GUC parameters and
make sure that they have sane values. For example, a user may increase
max_connections at the datanodes, but if she forgets to change
max_pool_size to a more reasonable value, outgoing connections from a
coordinator or a datanode may not be enough, even though the target node can
accept incoming connections. It's generally hard to derive one value from
another because max_pool_size limits outgoing connections from one
node, while max_connections limits incoming connections. But we can
still provide some useful hints if there is a gross mismatch of values based on
the number of datanodes and coordinators.
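The kind of cross-check suggested above could look like this sketch; the formula (each local session may need one pooled connection per remote node) is an assumed heuristic for illustration, not XL's actual check:

```c
/* Flag a gross mismatch between outgoing pool capacity and what the
 * cluster topology could demand. Returns nonzero when max_pool_size is
 * clearly too small. All names and the heuristic are assumptions. */
int
pool_size_suspicious(int max_pool_size, int max_connections, int n_remote_nodes)
{
    /* worst case: every local session holds one connection per remote node */
    return max_pool_size < max_connections * n_remote_nodes;
}
```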
Pavan Deolasee [Thu, 26 Nov 2015 07:36:46 +0000 (13:06 +0530)]
Add a cluster monitor postmaster process
Right now the process is responsible for computing the local RecentGlobalXmin
and sending periodic updates to the GTM. The GTM then computes a cluster-wide
value of the RecentGlobalXmin and sends it back to all the reporting nodes
(coordinators as well as datanodes). This way GTM does not need to track all
open snapshots in the system, which previously required a transaction to remain
open, even for a read-only operation. While this patch itself may not show
major performance improvements, this will act as a foundation for other major
improvements for transaction handling.
If a node gets disconnected for a long time or stops sending updates to the
GTM, such a node is removed from computation of the RecentGlobalXmin. This is to
ensure that a failed node does not stop advancement of the RecentGlobalXmin
beyond a certain point. Such a node can safely rejoin the cluster as long as
it's not using a snapshot with a stale view of the cluster, i.e. a snapshot with
xmin less than the RecentGlobalXmin that the GTM is running with.
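The GTM-side computation described above might be sketched like this, with all names, the struct layout, and the timeout handling being illustrative assumptions:

```c
#include <time.h>

typedef unsigned int TransactionId;

typedef struct
{
    TransactionId reported_xmin;   /* last xmin this node reported */
    time_t        last_report;     /* when it reported */
} NodeXminInfo;

/* Minimum of the xmins reported by all nodes, ignoring nodes whose last
 * report is older than timeout_secs so that a failed or disconnected node
 * cannot pin the RecentGlobalXmin horizon indefinitely. */
TransactionId
compute_recent_global_xmin(const NodeXminInfo *nodes, int n_nodes,
                           time_t now, int timeout_secs,
                           TransactionId fallback)
{
    TransactionId result = fallback;

    for (int i = 0; i < n_nodes; i++)
    {
        if (now - nodes[i].last_report > timeout_secs)
            continue;                    /* stale node: excluded */
        if (nodes[i].reported_xmin < result)
            result = nodes[i].reported_xmin;
    }
    return result;
}
```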
Pavan Deolasee [Fri, 20 Nov 2015 11:30:03 +0000 (17:00 +0530)]
Free memory allocated for tracking prepared statements.
This fixes a long-standing memory leak in the TopMemoryContext. Surprisingly,
the reason why I started diagnosing the problem is because pgbench -S -c 1
would run significantly slower with every iteration. The reason turned out to be
that AllocSetCheck() runs slower and slower (yeah, I was running with
--enable-cassert like every other developer). That led me to investigate
the memory leaks. This patch now fixes that issue and pgbench -S runs at a
consistent speed, even for cassert enabled builds.
Pavan Deolasee [Fri, 20 Nov 2015 07:58:19 +0000 (13:28 +0530)]
Include pg_rusage.h wherever necessary
Pavan Deolasee [Thu, 19 Nov 2015 10:44:48 +0000 (16:14 +0530)]
Do not use READ ONLY transaction while dumping data using pg_dump
We use nextval(sequence) to get a consistent sequence value directly from the
GTM, since sequence values could be cached at different coordinators. But that
requires a RW transaction. It's not ideal for pg_dump to use a RW transaction,
but it's not terrible either, given that it's run in a very controlled manner.
So change it that way until we find a more elegant solution
Also fix some assorted issues with pg_dump. It now seems to pass on the
"regression" database after a full regression run
Pavan Deolasee [Thu, 19 Nov 2015 09:33:16 +0000 (15:03 +0530)]
Add support for all releases newer than 9.2
Pavan Deolasee [Thu, 19 Nov 2015 09:30:51 +0000 (15:00 +0530)]
Initialise root->recursiveOk correctly.
This was an oversight during the 9.5 merge process, thus breaking WITH
RECURSIVE for replicated and catalog tables. As a side-effect, this also caused
pg_dump to fail.
Pavan Deolasee [Thu, 19 Nov 2015 05:10:00 +0000 (10:40 +0530)]
Check if configuration parameters such as coord/datanode(Specific)ExtraPgHba
exist before trying to read their values
Pavan Deolasee [Thu, 19 Nov 2015 05:09:25 +0000 (10:39 +0530)]
Check for connection before trying to dereference a pointer
Pavan Deolasee [Wed, 18 Nov 2015 09:11:52 +0000 (14:41 +0530)]
Use correct index into GTM proxy array for coordinator failovers
Pavan Deolasee [Thu, 15 Oct 2015 07:37:26 +0000 (13:07 +0530)]
Do not use ereport(ERROR) while cleaning up connections because we could
already be in abort transaction handling, thus causing infinite recursion
Pavan Deolasee [Wed, 18 Nov 2015 08:16:10 +0000 (13:46 +0530)]
Run slaves in hot_standby mode so that we can ping them for monitoring purposes
This actually restores old behaviour of the utility. XL doesn't currently support
read-only queries on the standbys. But without that, PQping fails to connect to
the standby. So restore that for now so that pgxc_ctl monitor works for the
slaves
Pavan Deolasee [Wed, 18 Nov 2015 08:15:03 +0000 (13:45 +0530)]
Use whichever coordinator is available for running cluster commands
The utility had assumed that the first coordinator at index 0 will always be
available. But that may not be the case when we have actually removed the first
coordinator using "pgxc_ctl remove coordinator master <name>"
Pavan Deolasee [Wed, 18 Nov 2015 08:07:54 +0000 (13:37 +0530)]
Do not try to initialise multi-executor if we are running as a wal sender
pg_basebackup had stopped working because of this issue. Should be fixed now
Pavan Deolasee [Wed, 18 Nov 2015 05:08:42 +0000 (10:38 +0530)]
Ensure that the array is extended only when adding new entries and not while
replacing existing ones
This should handle the problem of coord and datanode arrays going haywire when
first or in-between coordinators/datanodes are removed using pgxc_ctl
Pavan Deolasee [Tue, 17 Nov 2015 12:28:49 +0000 (17:58 +0530)]
Wait for the socket to become ready to receive more data before attempting to
write again
We'd seen that coordinator can become CPU bound while loading large chunks of
data. Upon investigation, it was found that the coordinator process keeps
trying to send more data, even though the underlying networking layer is not
yet ready to receive more data, most likely because the kernel send-buffer is
full. Instead of retrying in a tight loop, we should check for socket readiness
and then write more
This should also fix the problem of COPY process running out of memory on the
coordinator (exhibited by "invalid memory alloc request size" error seen during
pg_restore as well as COPY)
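The fix amounts to waiting for writability instead of spinning; a generic sketch with poll() follows. The function name send_all and the loop structure are illustrative, not XL's actual code:

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Send all of buf. When the kernel send buffer is full (EAGAIN), block in
 * poll() until the socket is writable again rather than retrying in a
 * tight loop and burning CPU. */
ssize_t
send_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len)
    {
        ssize_t n = send(sock, buf + sent, len - sent, 0);

        if (n > 0)
            sent += n;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        {
            /* not ready: wait for POLLOUT instead of spinning */
            struct pollfd pfd = { .fd = sock, .events = POLLOUT };

            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
        }
        else if (n < 0 && errno != EINTR)
            return -1;          /* real error; EINTR just retries */
    }
    return (ssize_t) sent;
}
```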
Pavan Deolasee [Tue, 17 Nov 2015 11:07:11 +0000 (16:37 +0530)]
Cancel queries on remote connections upon transaction abort
Pavan Deolasee [Mon, 16 Nov 2015 12:42:08 +0000 (18:12 +0530)]
Add more instrumentation options to code
This patch adds two new GUCs, log_gtm_stats and log_remotesubplan_stats to
collect more information about GTM communication stats and remote subplan
stats
Pavan Deolasee [Mon, 16 Nov 2015 11:07:45 +0000 (16:37 +0530)]
Use poll() instead of select() at a few places
Patch by Krzysztof Nienartowicz, with some bug fixes and rework by me
Pavan Deolasee [Fri, 13 Nov 2015 13:21:09 +0000 (18:51 +0530)]
Remove a lot of XC-specific code from the repo.
Per discussion on the developer list, this patch removes a bulk of XC-specific
code which is not relevant in XL. This code was mostly left-over in #ifdef
blocks, thus complicating code-reading, bug fixes and merges. One can always do
a "git diff" with the XC code base to see the exact differences.
We still continue to use #ifdef PGXC and #ifdef XCP interchangeably because of
the way code was written. Something we should change. Also, there is probably
still some more dead code (because files were copied to different place or
because the code is not referenced in XL). This requires another cleanup patch,
but not something I plan to do immediately
Robert Haas [Thu, 8 Oct 2015 17:21:03 +0000 (13:21 -0400)]
Fix typo in docs.
Pallavi Sontakke
Andrew Dunstan [Wed, 7 Oct 2015 21:41:45 +0000 (17:41 -0400)]
Factor out encoding specific tests for json
This lets us remove the large alternative results files for the main
json and jsonb tests, which makes modifying those tests simpler for
committers and patch submitters.
Backpatch to 9.4 for jsonb and 9.3 for json.
Tom Lane [Wed, 7 Oct 2015 20:12:05 +0000 (16:12 -0400)]
Improve documentation of the role-dropping process.
In general one may have to run both REASSIGN OWNED and DROP OWNED to get
rid of all the dependencies of a role to be dropped. This was alluded to
in the REASSIGN OWNED man page, but not really spelled out in full; and in
any case the procedure ought to be documented in a more prominent place
than that. Add a section to the "Database Roles" chapter explaining this,
and do a bit of wordsmithing in the relevant commands' man pages.
Bruce Momjian [Wed, 7 Oct 2015 14:30:54 +0000 (10:30 -0400)]
docs: add JSONB containment example of a key and empty object
Backpatch through 9.5
Bruce Momjian [Wed, 7 Oct 2015 13:42:26 +0000 (09:42 -0400)]
docs: Map operator @> to the proper SGML escape for '>'
Backpatch through 9.5
Bruce Momjian [Wed, 7 Oct 2015 13:06:49 +0000 (09:06 -0400)]
docs: clarify JSONB operator descriptions
No catalog bump as the catalog changes are for SQL operator comments.
Backpatch through 9.5
Tom Lane [Tue, 6 Oct 2015 21:15:27 +0000 (17:15 -0400)]
Perform an immediate shutdown if the postmaster.pid file is removed.
The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received
SIGQUIT.
The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence. When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit. Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket
address.
However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.
Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.
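The check itself is simple: read the pid file back and compare it against our own PID. A sketch (function name and error handling are illustrative, not the postmaster's actual code):

```c
#include <stdio.h>
#include <unistd.h>

/* Return 1 iff the pid file still exists and names our own process.
 * A 0 result corresponds to the immediate-shutdown case above. */
int
pid_file_is_ours(const char *path)
{
    FILE *f = fopen(path, "r");
    long  pid = 0;

    if (f == NULL)
        return 0;               /* file gone: data dir likely removed */
    if (fscanf(f, "%ld", &pid) != 1)
        pid = 0;                /* unreadable: treat as not ours */
    fclose(f);
    return pid == (long) getpid();
}
```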
Tom Lane [Mon, 5 Oct 2015 19:09:44 +0000 (15:09 -0400)]
Stamp 9.5beta1.
Bruce Momjian [Mon, 5 Oct 2015 17:38:36 +0000 (13:38 -0400)]
docs: update guidelines on when to use GIN and GiST indexes
Report by Tomas Vondra
Backpatch through 9.5
Tom Lane [Mon, 5 Oct 2015 16:44:12 +0000 (12:44 -0400)]
Docs: explain contrib/pg_stat_statements' handling of GC failure.
Failure to perform garbage collection now has a user-visible effect, so
explain that and explain that reducing pgss_max is the way to prevent it.
Per gripe from Andrew Dunstan.
Tom Lane [Mon, 5 Oct 2015 16:19:14 +0000 (12:19 -0400)]
Fix insufficiently-portable regression test case.
Some of the buildfarm members are evidently miserly enough of stack space
to pass the originally-committed form of this test. Increase the
requirement 10X to hopefully ensure that it fails as-expected everywhere.
Security: CVE-2015-5289
Peter Eisentraut [Mon, 5 Oct 2015 14:59:53 +0000 (10:59 -0400)]
Translation updates
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 23a52bc86edcd39c3c6b80ee1f7374759c8711f8
Tom Lane [Mon, 5 Oct 2015 14:57:15 +0000 (10:57 -0400)]
Last-minute updates for release notes.
Add entries for security and not-quite-security issues.
Security: CVE-2015-5288, CVE-2015-5289
Andres Freund [Mon, 5 Oct 2015 14:09:13 +0000 (16:09 +0200)]
Remove outdated comment about relation level autovacuum freeze limits.
The documentation for the autovacuum_multixact_freeze_max_age and
autovacuum_freeze_max_age relation level parameters contained:
"Note that while you can set autovacuum_multixact_freeze_max_age very
small, or even zero, this is usually unwise since it will force frequent
vacuuming."
which hasn't been true since these options were made relation options,
instead of residing in the pg_autovacuum table (834a6da4f7).
Remove the outdated sentence. Even the lowered limits from 2596d70 are
high enough that this doesn't warrant calling out the risk in the CREATE
TABLE docs.
Per discussion with Tom Lane and Alvaro Herrera
Discussion: 26377.1443105453@sss.pgh.pa.us
Backpatch: 9.0- (in parts)
Stephen Frost [Mon, 5 Oct 2015 14:14:51 +0000 (10:14 -0400)]
Add regression tests for INSERT/UPDATE+RETURNING
This adds regression tests which are specific to INSERT+RETURNING and
UPDATE+RETURNING to ensure that the SELECT policies are added as
WithCheckOptions (and should therefore throw an error when the policy is
violated).
Per suggestion from Andres.
Back-patch to 9.5 as the prior commit was.
Noah Misch [Mon, 5 Oct 2015 14:06:30 +0000 (10:06 -0400)]
Prevent stack overflow in query-type functions.
The tsquery, ltxtquery and query_int data types have a common ancestor.
Having acquired check_stack_depth() calls independently, each was
missing at least one call. Back-patch to 9.0 (all supported versions).
Noah Misch [Mon, 5 Oct 2015 14:06:29 +0000 (10:06 -0400)]
Prevent stack overflow in container-type functions.
A range type can name another range type as its subtype, and a record
type can bear a column of another record type. Consequently, functions
like range_cmp() and record_recv() are recursive. Functions at risk
include operator family members and referents of pg_type regproc
columns. Treat as recursive any such function that looks up and calls
the same-purpose function for a record column type or the range subtype.
Back-patch to 9.0 (all supported versions).
An array type's element type is never itself an array type, so array
functions are unaffected. Recursion depth proportional to array
dimensionality, found in array_dim_to_jsonb(), is fine thanks to MAXDIM.
Noah Misch [Mon, 5 Oct 2015 14:06:29 +0000 (10:06 -0400)]
Prevent stack overflow in json-related functions.
Sufficiently-deep recursion heretofore elicited a SIGSEGV. If an
application constructs PostgreSQL json or jsonb values from arbitrary
user input, application users could have exploited this to terminate all
active database connections. That applies to 9.3, where the json parser
adopted recursive descent, and later versions. Only row_to_json() and
array_to_json() were at risk in 9.2, both in a non-security capacity.
Back-patch to 9.2, where the json type was introduced.
Oskari Saarenmaa, reviewed by Michael Paquier.
Security: CVE-2015-5289
Noah Misch [Mon, 5 Oct 2015 14:06:29 +0000 (10:06 -0400)]
pgcrypto: Detect and report too-short crypt() salts.
Certain short salts crashed the backend or disclosed a few bytes of
backend memory. For existing salt-induced error conditions, emit a
message saying as much. Back-patch to 9.0 (all supported versions).
Josh Kupershmidt
Security: CVE-2015-5288
Stephen Frost [Mon, 5 Oct 2015 11:55:11 +0000 (07:55 -0400)]
Apply SELECT policies in INSERT/UPDATE+RETURNING
Similar to 7d8db3e, given that INSERT+RETURNING requires SELECT rights
on the table, apply the SELECT policies as WCOs to the tuples being
inserted. Apply the same logic to UPDATE+RETURNING.
Back-patch to 9.5 where RLS was added.
Stephen Frost [Mon, 5 Oct 2015 11:38:56 +0000 (07:38 -0400)]
Do not write out WCOs in Query
The WithCheckOptions list in Query are only populated during rewrite and
do not need to be written out or read in as part of a Query structure.
Further, move WithCheckOptions to the bottom and add comments to clarify
that it is only populated during rewrite.
Back-patch to 9.5 with a catversion bump, as we are still in alpha.
Andres Freund [Mon, 5 Oct 2015 09:53:43 +0000 (11:53 +0200)]
Re-Align *_freeze_max_age reloption limits with corresponding GUC limits.
In 020235a5754 I lowered the autovacuum_*freeze_max_age minimums to
allow for easier testing of wraparounds. I did not touch the
corresponding per-table limits. While those don't matter for the purpose
of wraparound, it seems more consistent to lower them as well.
It's noteworthy that the previous reloption lower limit for
autovacuum_multixact_freeze_max_age was too high by one magnitude, even
before 020235a5754.
Discussion: 26377.1443105453@sss.pgh.pa.us
Backpatch: back to 9.0 (in parts), like the prior patch
Stephen Frost [Mon, 5 Oct 2015 01:05:18 +0000 (21:05 -0400)]
ALTER TABLE .. FORCE ROW LEVEL SECURITY
To allow users to force RLS to always be applied, even for table owners,
add ALTER TABLE .. FORCE ROW LEVEL SECURITY.
row_security=off overrides FORCE ROW LEVEL SECURITY, to ensure pg_dump
output is complete (by default).
Also add SECURITY_NOFORCE_RLS context to avoid data corruption when
ALTER TABLE .. FORCE ROW SECURITY is being used. The
SECURITY_NOFORCE_RLS security context is used only during referential
integrity checks and is only considered in check_enable_rls() after we
have already checked that the current user is the owner of the relation
(which should always be the case during referential integrity checks).
Back-patch to 9.5 where RLS was added.
Tom Lane [Sun, 4 Oct 2015 23:38:00 +0000 (19:38 -0400)]
Release notes for 9.5beta1, 9.4.5, 9.3.10, 9.2.14, 9.1.19, 9.0.23.
Tom Lane [Sun, 4 Oct 2015 21:58:30 +0000 (17:58 -0400)]
Improve contrib/pg_stat_statements' handling of garbage collection failure.
If we can't read the query texts file (whether because out-of-memory, or
for some other reason), give up and reset the file to empty, discarding all
stored query texts, though not the statistics per se. We used to leave
things alone and hope for better luck next time, but the problem is that
the file is only going to get bigger and even harder to slurp into memory.
Better to do something that will get us out of trouble.
Likewise reset the file to empty for any other failure within gc_qtexts().
The previous behavior after a write error was to discard query texts but
not do anything to truncate the file, which is just weird.
Also, increase the maximum supported file size from MaxAllocSize to
MaxAllocHugeSize; this makes it more likely we'll be able to do a garbage
collection successfully.
Also, fix recalculation of mean_query_len within entry_dealloc() to match
the calculation in gc_qtexts(). The previous coding overlooked the
possibility of dropped texts (query_len == -1) and would underestimate the
mean of the remaining entries in such cases, thus possibly causing excess
garbage collection cycles.
In passing, add some errdetail to the log entry that complains about
insufficient memory to read the query texts file, which after all was
Jim Nasby's original complaint.
Back-patch to 9.4 where the current handling of query texts was
introduced.
Peter Geoghegan, rather editorialized upon by me
Tom Lane [Sun, 4 Oct 2015 19:55:07 +0000 (15:55 -0400)]
Further twiddling of nodeHash.c hashtable sizing calculation.
On reflection, the submitted patch didn't really work to prevent the
request size from exceeding MaxAllocSize, because of the fact that we'd
happily round nbuckets up to the next power of 2 after we'd limited it to
max_pointers. The simplest way to enforce the limit correctly is to
round max_pointers down to a power of 2 when it isn't one already.
(Note that the constraint to INT_MAX / 2, if it were doing anything useful
at all, is properly applied after that.)
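Rounding max_pointers down to a power of 2 (rather than up) guarantees the later power-of-2 round-up of nbuckets is a no-op and so cannot exceed the limit. A minimal sketch of such a rounding helper, not the actual nodeHash.c code:

```c
#include <stddef.h>

/* Largest power of 2 that is <= n (returns 1 for n <= 1).
 * The "p * 2 > p" test guards against size_t overflow. */
static size_t
round_down_pow2(size_t n)
{
    size_t p = 1;

    while (p * 2 <= n && p * 2 > p)
        p *= 2;
    return p;
}
```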
Tom Lane [Sun, 4 Oct 2015 18:06:40 +0000 (14:06 -0400)]
Fix some issues in new hashtable size calculations in nodeHash.c.
Limit the size of the hashtable pointer array to not more than
MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory
alloc request size" failures. There was discussion of allowing the array
to get larger than that by using the "huge" palloc API, but so far no proof
that that is actually a good idea, and at this point in the 9.5 cycle major
changes from old behavior don't seem like the way to go.
Fix a rather serious secondary bug in the new code, which was that it
didn't ensure nbuckets remained a power of 2 when recomputing it for the
multiple-batch case.
Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and
its sole call site.
Andrew Dunstan [Sun, 4 Oct 2015 17:28:16 +0000 (13:28 -0400)]
Disallow invalid path elements in jsonb_set
Null path elements and, where the object is an array, invalid integer
elements now cause an error.
Incorrect behaviour noted by Thom Brown, patch from Dmitry Dolgov.
Backpatch to 9.5 where jsonb_set was introduced
Peter Eisentraut [Sun, 4 Oct 2015 15:14:28 +0000 (11:14 -0400)]
Group cluster_name and update_process_title settings together
Noah Misch [Sun, 4 Oct 2015 00:20:22 +0000 (20:20 -0400)]
Document that row_security is a boolean GUC.
Oversight in commit 537bd178c73b1d25938347b17e9e3e62898fc231.
Back-patch to 9.5, like that commit.
Noah Misch [Sun, 4 Oct 2015 00:19:57 +0000 (20:19 -0400)]
Make BYPASSRLS behave like superuser RLS bypass.
Specifically, make its effect independent from the row_security GUC, and
make it affect permission checks pertinent to views the BYPASSRLS role
owns. The row_security GUC thereby ceases to change successful-query
behavior; it can only make a query fail with an error. Back-patch to
9.5, where BYPASSRLS was introduced.
Andres Freund [Sat, 3 Oct 2015 13:29:08 +0000 (15:29 +0200)]
Improve errhint() about replication slot naming restrictions.
The existing hint talked about "may only contain letters", but the
actual requirement is more strict: only lower case letters are allowed.
Reported-By: Rushabh Lathia
Author: Rushabh Lathia
Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com
Backpatch: 9.4-, where replication slots were added
Andres Freund [Sat, 3 Oct 2015 13:12:10 +0000 (15:12 +0200)]
Fix several bugs related to ON CONFLICT's EXCLUDED pseudo relation.
Four related issues:
1) attnos/varnos/resnos for EXCLUDED were out of sync when a column
after one dropped in the underlying relation was referenced.
2) References to whole-row variables (i.e. EXCLUDED.*) lead to errors.
3) It was possible to reference system columns in the EXCLUDED pseudo
relations, even though they would not have valid contents.
4) References to EXCLUDED were rewritten by the RLS machinery, as
EXCLUDED was treated as if it were the underlying relation.
To fix the first two issues, generate the excluded targetlist with
dropped columns in mind and add an entry for whole row
variables. Instead of unconditionally adding a wholerow entry we could
pull up the expression if needed, but doing it unconditionally seems
simpler. The wholerow entry is only really needed for ruleutils/EXPLAIN
support anyway.
The remaining two issues are addressed by changing the EXCLUDED RTE to
have relkind = composite. That fits with EXCLUDED not actually being a
real relation, and allows treating it differently in the relevant
places. scanRTEForColumn now skips looking up system columns when the
RTE has a composite relkind; fireRIRrules() already had a corresponding
check, thereby preventing RLS expansion on EXCLUDED.
Also add tests for these issues, and improve a few comments around
excluded handling in setrefs.c.
Reported-By: Peter Geoghegan, Geoff Winkless
Author: Andres Freund, Amit Langote, Peter Geoghegan
Discussion: CAEzk6fdzJ3xYQZGbcuYM2rBd2BuDkUksmK=mY9UYYDugg_GgZg@mail.gmail.com,
CAM3SWZS+CauzbiCEcg-GdE6K6ycHE_Bz6Ksszy8AoixcMHOmsA@mail.gmail.com
Backpatch: 9.5, where ON CONFLICT was introduced
Peter Eisentraut [Sat, 3 Oct 2015 01:50:59 +0000 (21:50 -0400)]
doc: Update URLs of external projects
Peter Eisentraut [Sat, 3 Oct 2015 01:22:44 +0000 (21:22 -0400)]
doc: Make some index terms and terminology more consistent
Tom Lane [Fri, 2 Oct 2015 23:15:39 +0000 (19:15 -0400)]
Update time zone data files to tzdata release 2015g.
DST law changes in Cayman Islands, Fiji, Moldova, Morocco, Norfolk Island,
North Korea, Turkey, Uruguay. New zone America/Fort_Nelson for Canadian
Northern Rockies.
Robert Haas [Fri, 2 Oct 2015 20:55:47 +0000 (16:55 -0400)]
Clarify FDW documentation about ON CONFLICT.
Etsuro Fujita, reviewed by Peter Geoghegan
Tom Lane [Fri, 2 Oct 2015 19:00:52 +0000 (15:00 -0400)]
Add recursion depth protection to LIKE matching.
Since MatchText() recurses, it could in principle be driven to stack
overflow, although quite a long pattern would be needed.
Tom Lane [Fri, 2 Oct 2015 18:51:58 +0000 (14:51 -0400)]
Add recursion depth protections to regular expression matching.
Some of the functions in regex compilation and execution recurse, and
therefore could in principle be driven to stack overflow. The Tcl crew
has seen this happen in practice in duptraverse(), though their fix was
to put in a hard-wired limit on the number of recursive levels, which is
not too appetizing --- fortunately, we have enough infrastructure to check
the actually available stack. Greg Stark has also seen it in other places
while fuzz testing on a machine with limited stack space. Let's put guards
in to prevent crashes in all these places.
Since the regex code would leak memory if we simply threw elog(ERROR),
we have to introduce an API that checks for stack depth without throwing
such an error. Fortunately that's not difficult.
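The shape of such a non-throwing check: the recursive routine consults a depth guard and returns a status code that callers can use to free regex-private memory, instead of throwing. The names, the simple depth counter (the real code checks the actually available stack), and the limit are all illustrative:

```c
#define REG_OKAY            0
#define REG_ETOOBIG         1
#define MAX_RECURSION_DEPTH 1000

/* Toy recursive matcher: one frame per pattern character. On excessive
 * recursion it reports REG_ETOOBIG up the call chain rather than raising
 * an error, so the caller can release its resources cleanly. */
static int
match_pattern(const char *p, int depth)
{
    if (depth > MAX_RECURSION_DEPTH)
        return REG_ETOOBIG;
    if (*p == '\0')
        return REG_OKAY;
    return match_pattern(p + 1, depth + 1);
}
```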
Tom Lane [Fri, 2 Oct 2015 18:26:36 +0000 (14:26 -0400)]
Fix potential infinite loop in regular expression execution.
In cfindloop(), if the initial call to shortest() reports that a
zero-length match is possible at the current search start point, but then
it is unable to construct any actual match to that, it'll just loop around
with the same start point, and thus make no progress. We need to force the
start point to be advanced. This is safe because the loop over "begin"
points has already tried and failed to match starting at "close", so there
is surely no need to try that again.
This bug was introduced in commit e2bd904955e2221eddf01110b1f25002de2aaa83,
wherein we allowed continued searching after we'd run out of match
possibilities, but evidently failed to think hard enough about exactly
where we needed to search next.
Because of the way this code works, such a match failure is only possible
in the presence of backrefs --- otherwise, shortest()'s judgment that a
match is possible should always be correct. That probably explains how
come the bug has escaped detection for several years.
The actual fix is a one-liner, but I took the trouble to add/improve some
comments related to the loop logic.
After fixing that, the submitted test case "()*\1" didn't loop anymore.
But it reported failure, though it seems like it ought to match a
zero-length string; both Tcl and Perl think it does. That seems to be from
overenthusiastic optimization on my part when I rewrote the iteration match
logic in commit 173e29aa5deefd9e71c183583ba37805c8102a72: we can't just
"declare victory" for a zero-length match without bothering to set match
data for capturing parens inside the iterator node.
Per fuzz testing by Greg Stark. The first part of this is a bug in all
supported branches, and the second part is a bug since 9.2 where the
iteration rewrite happened.
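The forced-advance rule in the first part of the fix can be illustrated with a standalone search loop. Here probe() plays the role of an over-optimistic shortest() that reports zero-length "matches" which can never be confirmed (as can happen with backrefs); all names are invented for the sketch:

```c
#include <assert.h>
#include <string.h>

/* Toy oracle standing in for shortest(): real match where `word` occurs,
 * otherwise a phantom zero-length possibility that dissection will reject. */
int
probe(const char *text, int begin, const char *word)
{
    if (strncmp(text + begin, word, strlen(word)) == 0)
        return (int) strlen(word);
    return 0;                   /* "maybe a zero-length match" */
}

int
dissect_ok(int len)
{
    return len > 0;             /* zero-length candidates never confirm here */
}

/* Search loop with the fix applied: an unconfirmable candidate forces the
 * start point forward instead of retrying the same position forever.
 * Returns the match offset, or -1. */
int
find_match(const char *text, const char *word)
{
    int n = (int) strlen(text);

    for (int begin = 0; begin <= n;)
    {
        int len = probe(text, begin, word);

        if (dissect_ok(len))
            return begin;
        begin++;                /* the one-liner: always make progress */
    }
    return -1;
}
```

Without the `begin++` on the failure path, a phantom zero-length candidate at a fixed start point would loop forever, which is the bug described above.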
Tom Lane [Fri, 2 Oct 2015 17:45:39 +0000 (13:45 -0400)]
Add some more query-cancel checks to regular expression matching.
Commit 9662143f0c35d64d7042fbeaf879df8f0b54be32 added infrastructure to
allow regular-expression operations to be terminated early in the event
of SIGINT etc. However, fuzz testing by Greg Stark disclosed that there
are still cases where regex compilation could run for a long time without
noticing a cancel request. Specifically, the fixempties() phase never
adds new states, only new arcs, so it doesn't hit the cancel check I'd put
in newstate(). Add one to newarc() as well to cover that.
Some experimentation of my own found that regex execution could also run
for a long time despite a pending cancel. We'd put a high-level cancel
check into cdissect(), but there was none inside the core text-matching
routines longest() and shortest(). Ordinarily those inner loops are very
very fast ... but in the presence of lookahead constraints, not so much.
As a compromise, stick a cancel check into the stateset cache-miss
function, which is enough to guarantee a cancel check at least once per
lookahead constraint test.
Making this work required more attention to error handling throughout the
regex executor. Henry Spencer had apparently originally intended longest()
and shortest() to be incapable of incurring errors while running, so
neither they nor their subroutines had well-defined error reporting
behaviors. However, that was already broken by the lookahead constraint
feature, since lacon() can surely suffer an out-of-memory failure ---
which, in the code as it stood, might never be reported to the user at all,
but just silently be treated as a non-match of the lookahead constraint.
Normalize all that by inserting explicit error tests as needed. I took the
opportunity to add some more comments to the code, too.
Back-patch to all supported branches, like the previous patch.
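The pattern of returning an error code from the cache-miss path, rather than throwing, might be sketched like this; REG_CANCEL and the miss()/run_states() names are invented for illustration and do not mirror regexec.c's actual API:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the backend's pending-interrupt flag, normally set by a
 * signal handler. */
volatile bool cancel_pending = false;

#define REG_OKAY   0
#define REG_CANCEL 1

/* Sketch of a stateset cache-miss function: the expensive path computes a
 * new entry, and since it runs at least once per lookahead-constraint test,
 * a cancel check here bounds how long a pending cancel goes unnoticed. */
int
miss(int key, int *entry_out)
{
    if (cancel_pending)
        return REG_CANCEL;      /* caller frees regex memory, then reports */
    *entry_out = key * 2;       /* placeholder for successor-state work */
    return REG_OKAY;
}

/* Inner matching loop: very fast when everything hits the cache, but every
 * miss re-checks for cancellation and propagates the error upward. */
int
run_states(const int *keys, int nkeys, int *final_state)
{
    int state = 0;

    for (int i = 0; i < nkeys; i++)
    {
        int err = miss(keys[i], &state);

        if (err != REG_OKAY)
            return err;
    }
    *final_state = state;
    return REG_OKAY;
}
```

Propagating a code instead of longjmp'ing out is what lets the regex executor free its non-palloc'd memory first, per the error-handling normalization described above.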
Tom Lane [Fri, 2 Oct 2015 17:30:42 +0000 (13:30 -0400)]
Docs: add disclaimer about hazards of using regexps from untrusted sources.
It's not terribly hard to devise regular expressions that take large
amounts of time and/or memory to process. Recent testing by Greg Stark has
also shown that machines with small stack limits can be driven to stack
overflow by suitably crafted regexps. While we intend to fix these things
as much as possible, it's probably impossible to eliminate slow-execution
cases altogether. In any case we don't want to treat such things as
security issues. The history of that code should already discourage
prudent DBAs from allowing execution of regexp patterns coming from
possibly-hostile sources, but it seems like a good idea to warn about the
hazard explicitly.
Currently, similar_escape() allows access to enough of the underlying
regexp behavior that the warning has to apply to SIMILAR TO as well.
We might be able to make it safer if we tightened things up to allow only
SQL-mandated capabilities in SIMILAR TO; but that would be a subtly
non-backwards-compatible change, so it requires discussion and probably
could not be back-patched.
Per discussion among pgsql-security list.
Tom Lane [Fri, 2 Oct 2015 16:20:01 +0000 (12:20 -0400)]
Docs: add another example of creating a range type.
The "floatrange" example is a bit too simple because float8mi can be
used without any additional type conversion. Add an example that does
have to account for that, and do some minor other wordsmithing.
Alvaro Herrera [Fri, 2 Oct 2015 15:49:01 +0000 (12:49 -0300)]
Don't disable commit_ts in standby if enabled locally
Bug noticed by Fujii Masao
Peter Eisentraut [Fri, 2 Oct 2015 01:42:00 +0000 (21:42 -0400)]
pg_rewind: Improve some messages
The output of a typical pg_rewind run contained a mix of capitalized and
not-capitalized and punctuated and not-punctuated phrases for no
apparent reason. Make that consistent. Also fix some problems in other
messages.
Peter Eisentraut [Fri, 2 Oct 2015 01:39:56 +0000 (21:39 -0400)]
Fix message punctuation according to style guide
Tom Lane [Thu, 1 Oct 2015 20:19:49 +0000 (16:19 -0400)]
Fix pg_dump to handle inherited NOT VALID check constraints correctly.
This case seems to have been overlooked when unvalidated check constraints
were introduced, in 9.2. The code would attempt to dump such constraints
over again for each child table, even though adding them to the parent
table is sufficient.
In 9.2 and 9.3, also fix contrib/pg_upgrade/Makefile so that the "make
clean" target fully cleans up after a failed test. This evidently got
dealt with at some point in 9.4, but it wasn't back-patched. I ran into
it while testing this fix ...
Per bug #13656 from Ingmar Brouns.
Alvaro Herrera [Thu, 1 Oct 2015 18:06:55 +0000 (15:06 -0300)]
Fix commit_ts for standby
Module initialization was still not completely correct after commit
6b61955135e9, per crash report from Takashi Ohnishi. To fix, instead of
trying to monkey around with the value of the GUC setting directly, add
a separate boolean flag that enables the feature on a standby, but only
for the startup (recovery) process, when it sees that its master server
has the feature enabled.
Discussion: http://www.postgresql.org/message-id/
ca44c6c7f9314868bdc521aea4f77cbf@MP-MSGSS-MBX004.msg.nttdata.co.jp
Also change the deactivation routine to delete all segment files rather
than leaving the last one around. (This doesn't need separate
WAL-logging, because on recovery we execute the same deactivation
routine anyway.)
In passing, clean up the code structure somewhat, particularly so that
xlog.c doesn't know so much about when to activate/deactivate the
feature.
Thanks to Fujii Masao for testing and Petr Jelínek for off-list discussion.
Back-patch to 9.5, where commit_ts was introduced.
Tom Lane [Thu, 1 Oct 2015 14:31:22 +0000 (10:31 -0400)]
Fix documentation error in commit 8703059c6b55c427100e00a09f66534b6ccbfaa1.
Etsuro Fujita spotted a thinko in the README commentary.
Fujii Masao [Thu, 1 Oct 2015 14:00:52 +0000 (23:00 +0900)]
Fix mention of htup.h in storage.sgml
Previously it was documented that the details on HeapTupleHeaderData
struct could be found in htup.h. This is not correct because it's now
defined in htup_details.h.
Back-patch to 9.3 where the definition of HeapTupleHeaderData struct
was moved from htup.h to htup_details.h.
Michael Paquier
Pavan Deolasee [Thu, 1 Oct 2015 10:36:45 +0000 (16:06 +0530)]
Install missing pgxc header files properly.
Looks like we messed this up during the merge process. Per report from David Rowley.
Tom Lane [Thu, 1 Oct 2015 03:32:23 +0000 (23:32 -0400)]
Improve LISTEN startup time when there are many unread notifications.
If some existing listener is far behind, incoming new listener sessions
would start from that session's read pointer and then need to advance over
many already-committed notification messages, which they have no interest
in. This was expensive in itself and also thrashed the pg_notify SLRU
buffers a lot more than necessary. We can improve matters considerably
in typical scenarios, without much added cost, by starting from the
furthest-ahead read pointer, not the furthest-behind one. We do have to
consider only sessions in our own database when doing this, which requires
an extra field in the data structure, but that's a pretty small cost.
Back-patch to 9.0 where the current LISTEN/NOTIFY logic was introduced.
Matt Newell, slightly adjusted by me
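The startup rule can be sketched as follows; the Listener struct, field names, and integer positions are illustrative, not async.c's actual queue-pointer representation:

```c
#include <assert.h>

/* One entry per existing listening backend. */
typedef struct
{
    int dboid;                  /* database the listener belongs to */
    int pos;                    /* its read pointer into the notify queue */
} Listener;

/* New-listener starting position: the furthest-ahead read pointer among
 * existing listeners of the same database (notifications before that point
 * are ones the new session can never see anyway), falling back to the
 * queue head if no same-db listener exists. */
int
initial_read_pos(const Listener *ls, int n, int my_db, int queue_head)
{
    int best = -1;

    for (int i = 0; i < n; i++)
        if (ls[i].dboid == my_db && ls[i].pos > best)
            best = ls[i].pos;   /* furthest-ahead same-db listener */
    return best >= 0 ? best : queue_head;
}
```

Starting from the furthest-ahead pointer is what avoids scanning over already-committed notifications and thrashing the pg_notify SLRU, at the cost of the extra database field noted above.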
Robert Haas [Wed, 30 Sep 2015 22:36:31 +0000 (18:36 -0400)]
Don't dump core when destroying an unused ParallelContext.
If a transaction or subtransaction creates a ParallelContext but ends
without calling InitializeParallelDSM, the previous code would
seg fault. Fix that.
Stephen Frost [Wed, 30 Sep 2015 11:39:24 +0000 (07:39 -0400)]
Include policies based on ACLs needed
When considering which policies should be included, rather than look at
individual bits of the query (eg: if a RETURNING clause exists, or if a
WHERE clause exists which is referencing the table, or if it's a
FOR SHARE/UPDATE query), consider any case where we've determined
the user needs SELECT rights on the relation while doing an UPDATE or
DELETE to be a case where we apply SELECT policies, and any case where
we've determined that the user needs UPDATE rights on the relation while
doing a SELECT to be a case where we apply UPDATE policies.
This simplifies the logic and addresses concerns that a user could use
UPDATE or DELETE with a WHERE clause to determine if rows exist, or
they could use SELECT .. FOR UPDATE to lock rows which they are not
actually allowed to modify through UPDATE policies.
Use list_append_unique() to avoid adding the same quals multiple times,
as, on balance, the cost of checking when adding the quals will almost
always be cheaper than keeping them and doing busywork for each tuple
during execution.
Back-patch to 9.5 where RLS was added.
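The build-time deduplication trade-off behind list_append_unique() can be sketched with a toy list; a fixed-size int array stands in for PostgreSQL's List of qual nodes:

```c
#include <assert.h>

/* Toy stand-in for a List of policy quals. */
typedef struct
{
    int items[16];
    int len;
} IntList;

/* Pay an O(n) membership check once, at plan-build time, so a duplicate
 * qual is never added; otherwise the same qual would be re-evaluated for
 * every tuple during execution. */
void
append_unique(IntList *l, int x)
{
    for (int i = 0; i < l->len; i++)
        if (l->items[i] == x)
            return;             /* already present: skip the duplicate */
    l->items[l->len++] = x;
}
```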
Tatsuo Ishii [Wed, 30 Sep 2015 01:36:23 +0000 (10:36 +0900)]
Fix incorrect tps number calculation in "excluding connections establishing".
The tolerance (larger than actual tps number) increases as the number
of threads decreases. The bug has been there since the thread support
was introduced in 9.0. Because back patching introduces incompatible
behavior changes regarding the tps number, the fix is committed to
master and 9.5 stable branches only.
Problem spotted by me and fix proposed by Fabien COELHO. Note that his
original patch also included a code refactoring unrelated to the problem,
which I omitted.
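The arithmetic in question has roughly this shape; the function name and the choice of divisor (threads establish connections in parallel, so the summed per-client connection time is averaged over nthreads) are a hedged sketch of pgbench's calculation, not its exact code:

```c
#include <assert.h>

/* Sketch of "tps excluding connections establishing": subtract the elapsed
 * wall-clock time spent connecting from the total run time before dividing.
 * conn_total_secs is the sum over all clients; dividing by nthreads
 * approximates elapsed time, since the threads connect concurrently.
 * Using the wrong divisor overstates the exclusion, inflating tps more as
 * the thread count shrinks, which matches the symptom described above. */
double
tps_excluding_connections(long xacts, double total_secs,
                          double conn_total_secs, int nthreads)
{
    double conn_elapsed = conn_total_secs / nthreads;

    return xacts / (total_secs - conn_elapsed);
}
```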
Alvaro Herrera [Tue, 29 Sep 2015 17:40:56 +0000 (14:40 -0300)]
Code review for transaction commit timestamps
There are three main changes here:
1. No longer cause a start failure in a standby if the feature is
disabled in postgresql.conf but enabled in the master. This reverts one
part of commit 4f3924d9cd43; what we keep is the ability of the standby
to activate/deactivate the module (which includes creating and removing
segments as appropriate) during replay of such actions in the master.
2. Replay WAL records affecting commitTS even if the feature is
disabled. This means the standby will always have the same state as the
master after replay.
3. Have COMMIT PREPARE record the transaction commit time as well. We
were previously only applying it in the normal transaction commit path.
Author: Petr Jelínek
Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com
Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com
Additionally, I cleaned up nearby code related to replication origins,
which I found a bit hard to follow, and fixed a couple of typos.
Backpatch to 9.5, where this code was introduced.
Per bug reports from Fujii Masao and subsequent discussion.
Tom Lane [Tue, 29 Sep 2015 14:52:22 +0000 (10:52 -0400)]
Fix plperl to handle non-ASCII error message texts correctly.
We were passing error message texts to croak() verbatim, which turns out
not to work if the text contains non-ASCII characters; Perl mangles their
encoding, as reported in bug #13638 from Michal Leinweber. To fix, convert
the text into a UTF8-encoded SV first.
It's hard to test this without risking failures in different database
encodings; but we can follow the lead of plpython, which is already
assuming that no-break space (U+00A0) has an equivalent in all encodings
we care about running the regression tests in (cf commit 2dfa15de5).
Back-patch to 9.1. The code is quite different in 9.0, and anyway it seems
too risky to put something like this into 9.0's final minor release.
Alex Hunsaker, with suggestions from Tim Bunce and Tom Lane
Robert Haas [Tue, 29 Sep 2015 11:42:30 +0000 (07:42 -0400)]
Comment update for join pushdown.
Etsuro Fujita
Andrew Dunstan [Mon, 28 Sep 2015 22:42:30 +0000 (18:42 -0400)]
Fix compiler warning for non-TIOCGWINSZ case
Backpatch to 9.5 where the error was introduced.
Andrew Dunstan [Mon, 28 Sep 2015 22:29:20 +0000 (18:29 -0400)]
Fix compiler warning about unused function in non-readline case.
Backpatch to all live branches to keep the code in sync.
Alvaro Herrera [Mon, 28 Sep 2015 22:13:42 +0000 (19:13 -0300)]
Fix "sesssion" typo
It was introduced alongside replication origins, by commit 5aa2350426c, so backpatch to 9.5.
Pointed out by Fujii Masao
Tom Lane [Mon, 28 Sep 2015 22:02:38 +0000 (18:02 -0400)]
Fix poor errno handling in libpq's version of our custom OpenSSL BIO.
Thom Brown reported that SSL connections didn't seem to work on Windows in
9.5. Asif Naeem figured out that the cause was my_sock_read() looking at
"errno" when it needs to look at "SOCK_ERRNO". This mistake was introduced
in commit 680513ab79c7e12e402a2aad7921b95a25a4bcc8, which cloned the
backend's custom SSL BIO code into libpq, and didn't translate the errno
handling properly. Moreover, it introduced unnecessary errno save/restore
logic, which was particularly confusing because it was incomplete; and it
failed to check for all three of EINTR, EAGAIN, and EWOULDBLOCK in
my_sock_write. (That might not be necessary; but since we're copying
well-tested backend code that does do that, it seems prudent to copy it
faithfully.)