git.postgresql.org Git - postgres-xl.git/log

Pallavi Sontakke [Tue, 2 Feb 2016 09:17:30 +0000 (14:47 +0530)]

Test output and sql changes

Change comments on FQS of query as per behavior #59
Separate out an issue #67 from xc_copy

Pavan Deolasee [Mon, 1 Feb 2016 10:55:50 +0000 (11:55 +0100)]

Fix collate regression case by removing an ORDER BY (added in XL) which seems
to produce different results on different platforms.

Expected output adjusted accordingly

Pavan Deolasee [Mon, 1 Feb 2016 09:29:33 +0000 (10:29 +0100)]

Change expected output for 'transactions' test case which seems to be working
fine after recent bug fixes.

Pavan Deolasee [Mon, 1 Feb 2016 09:19:37 +0000 (10:19 +0100)]

Exponentially increase the sleep before retrying commit on the GTM.

When a transaction waits for another transaction to complete, we enforce the
same ordering on the GTM by making the former transaction wait on the latter.
We do this with a simple retry loop. The latter transaction should normally
finish soon, since it has already finished on the datanode, but the retry loop
now backs off exponentially, starting at 1000 usec and capped at 1 s.

Patch by Mason Sharp
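A minimal sketch of the capped exponential backoff described above, not the actual patch. Only the 1000 usec starting delay and the 1 s cap come from the commit message; the doubling factor and all names (`next_retry_delay`, `commit_with_backoff`, the `try_commit` callback that reports whether the awaited transaction has finished on the GTM) are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Values from the commit message: start at 1000 usec, never exceed 1 s. */
#define GTM_RETRY_START_USEC 1000L
#define GTM_RETRY_MAX_USEC   1000000L

/* Doubling is an assumption; the commit only says the wait grows exponentially. */
long next_retry_delay(long delay_usec)
{
    long next = delay_usec * 2;
    return next > GTM_RETRY_MAX_USEC ? GTM_RETRY_MAX_USEC : next;
}

static void sleep_usec(long usec)
{
    struct timespec ts;

    ts.tv_sec = usec / 1000000L;
    ts.tv_nsec = (usec % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Hypothetical retry loop: call try_commit() until it reports success,
 * sleeping between attempts with a capped, exponentially growing delay.
 * Returns the number of attempts made.
 */
int commit_with_backoff(bool (*try_commit)(void *arg), void *arg)
{
    long delay = GTM_RETRY_START_USEC;
    int attempts = 1;

    while (!try_commit(arg))
    {
        sleep_usec(delay);
        delay = next_retry_delay(delay);
        attempts++;
    }
    return attempts;
}
```

With doubling, the waits run 1, 2, 4, ... 512 ms and then stay at 1 s, so a transaction that finishes quickly costs little while a slow one does not make the waiter spin.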

Pavan Deolasee [Sun, 31 Jan 2016 08:25:01 +0000 (13:55 +0530)]

Do not throw a FATAL error when SharedQ producer times out while waiting for
one or more consumers to finish.

We have seen a bunch of cases where a consumer may never bind to the SharedQ,
and rightfully so. For example, in a join between 3 tables which requires
redistribution of tuples, a consumer may never bind to the SharedQ because the
top-level outer side did not produce any tuples to join against the
redistributed inner node.

This patch avoids the unnecessary FATAL errors, but it still does not avoid the
10s timeout currently set for the producer. So while queries such as the ones
in the test case will eventually succeed, they incur an unnecessary 10s delay
in response time. This is a TODO.

Pavan Deolasee [Sun, 31 Jan 2016 07:49:53 +0000 (13:19 +0530)]

Remove a WARNING about coordinator provided snapshot not available.

There are a few expected cases, such as catalog scans, where a
coordinator-supplied snapshot may not be available. So remove this misleading
warning.

Pavan Deolasee [Fri, 29 Jan 2016 09:16:10 +0000 (14:46 +0530)]

Fix a bug where queries would incorrectly get executed on the coordinator.

Report by Krzysztof Nienartowicz, patch by me.

Pavan Deolasee [Fri, 29 Jan 2016 09:13:52 +0000 (14:43 +0530)]

Do not hold the XidGenLock while obtaining an XID from the GTM

This was an oversight when the on-demand GXID work was committed. Mason
reported that this patch significantly improves performance in his tests, and
I can confirm that with my own tests.

Report and patch by Mason Sharp, with some changes from me.

Pallavi Sontakke [Mon, 1 Feb 2016 08:18:19 +0000 (13:48 +0530)]

Test output and sql changes

Fix 2 tests. Partial fix for xc_copy diffs.

Pavan Deolasee [Thu, 28 Jan 2016 12:50:32 +0000 (18:20 +0530)]

Avoid redefinition of a signal handler signature

Pavan Deolasee [Thu, 28 Jan 2016 10:27:54 +0000 (15:57 +0530)]

Do not override the sequence_range setting in COPY

The default value of this parameter has now been raised to 1000, so there is
no good reason to override it in COPY if the user has explicitly set it back
to 1. Honor the user-defined value in all cases.

We could possibly emit a warning if sequences are being incremented too fast
and the current value of sequence_range is set too low, but there is no
compelling need to do that just now.

Pallavi Sontakke [Fri, 29 Jan 2016 05:41:32 +0000 (11:11 +0530)]

Test output and sql changes

Accept outputs, with some known bugs added.

Pallavi Sontakke [Thu, 28 Jan 2016 06:02:42 +0000 (11:32 +0530)]

Test sql and output changes

Comment out known issues and add them to known bugs.

Pallavi Sontakke [Wed, 27 Jan 2016 13:45:24 +0000 (19:15 +0530)]

Test output changes

Fix 2 tests.

Pavan Deolasee [Wed, 27 Jan 2016 12:03:07 +0000 (17:33 +0530)]

Change expected output - can't do FQS for a GROUP BY query on a roundrobin
table

Pavan Deolasee [Wed, 27 Jan 2016 11:53:25 +0000 (17:23 +0530)]

Fix a bug where a query was getting incorrectly FQSed when the GROUP BY clause
contains only non-distribution keys.

Also rerun all the xc_groupby tests with enable_fast_query_shipping ON so that
similar issues can be caught more easily
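The rule behind this fix, and behind the round-robin expected-output change listed just before it, can be sketched as a single check: a GROUP BY can be evaluated entirely on the datanodes only if all rows of any group are guaranteed to sit on one node. For a hash-distributed table that holds when the distribution column is among the grouping columns, because equal key values hash to the same node; a round-robin table has no distribution column, so it never qualifies. The types and function are hypothetical and ignore every other FQS check the planner performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a table's distribution. */
typedef enum { DIST_HASH, DIST_ROUNDROBIN } DistStrategy;

typedef struct
{
    DistStrategy strategy;
    int          dist_col;   /* distribution key column; unused for round-robin */
} TableDist;

/*
 * Returns true if a GROUP BY over group_cols can be computed on the
 * datanodes alone, i.e. no group can be spread across several nodes.
 */
bool groupby_is_shippable(const TableDist *dist, const int *group_cols,
                          size_t ngroup_cols)
{
    if (dist->strategy != DIST_HASH)
        return false;        /* round-robin: a group's rows may be on any node */

    for (size_t i = 0; i < ngroup_cols; i++)
    {
        if (group_cols[i] == dist->dist_col)
            return true;     /* equal keys hash to the same node */
    }
    return false;            /* only non-distribution keys: groups span nodes */
}
```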

Pavan Deolasee [Wed, 27 Jan 2016 04:17:20 +0000 (09:47 +0530)]

Fix a protocol message to register the GTM proxy

Pavan Deolasee [Wed, 27 Jan 2016 02:41:07 +0000 (08:11 +0530)]

Fix compiler warnings in stormstats, some of which may also cause failures on
other platforms.

Pavan Deolasee [Mon, 25 Jan 2016 16:46:41 +0000 (22:16 +0530)]

Add missing DISTRIBUTE RANDOMLY to the specs in the documentation

Pavan Deolasee [Mon, 25 Jan 2016 14:09:41 +0000 (19:39 +0530)]

Change expected output for select_views test case.

Pavan Deolasee [Mon, 25 Jan 2016 12:11:55 +0000 (17:41 +0530)]

Change part of the expected output file for test case rowsecurity, related to
XL plan changes.

Pavan Deolasee [Mon, 25 Jan 2016 11:41:59 +0000 (17:11 +0530)]

Change expected output for prepared_xacts test case to accept newer changes
from PG

Pallavi Sontakke [Mon, 25 Jan 2016 11:14:30 +0000 (16:44 +0530)]

Test output changes

Accept some XL outputs for tests.

Pavan Deolasee [Mon, 25 Jan 2016 10:55:16 +0000 (16:25 +0530)]

Avoid using EXPLAIN VERBOSE when temp tables are involved in a test case.

In XL, the temporary schema may change in different regression runs. Hence we
must not print the schema in expected output. Change the sql as well as
expected output file for the 'join' test case.

Pavan Deolasee [Mon, 25 Jan 2016 10:34:33 +0000 (16:04 +0530)]

Change expected output for test case xc_alter_table which matches current
behaviour

Pavan Deolasee [Mon, 25 Jan 2016 10:22:51 +0000 (15:52 +0530)]

Change expected output for test case xc_FQS_join now that queries are properly
FQSed to the remote (via 1083af3fa)

Pavan Deolasee [Mon, 25 Jan 2016 10:18:08 +0000 (15:48 +0530)]

Change expected output for test case xc_having now that we support
enable_fast_query_shipping GUC

Pavan Deolasee [Mon, 25 Jan 2016 10:09:59 +0000 (15:39 +0530)]

Change expected output for xc_groupby test case now that
enable_fast_query_shipping GUC is supported and the entire test case runs with
the GUC turned off

Pavan Deolasee [Mon, 25 Jan 2016 09:32:34 +0000 (15:02 +0530)]

Fix a bug where we would pick a random node independently for each replicated
table, reducing the chances of the query getting fully shipped to a remote node.

We now remember all nodes that can satisfy a READ request for a replicated table
and then finally choose a node randomly if no preferred datanode is specified.
This will avoid non-deterministic selection of FQS query plans as well as allow
us to send some more queries to the remote node.
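A sketch of that selection, assuming each replicated table's readable nodes are kept as a bitmask (a hypothetical representation, as are the function names): intersect the masks of all replicated tables in the query, return the preferred datanode if it is in the intersection, and otherwise pick uniformly at random among the remaining nodes. Choosing once for the whole query, rather than once per table, is what lets every table resolve to the same node so the query can be shipped in one piece. `rand()` stands in for whatever randomness source the real code uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: bit i set means datanode i holds a copy of the table. */
typedef uint32_t NodeMask;

/* Nodes that can serve a READ for every replicated table in the query. */
NodeMask common_read_nodes(const NodeMask *tables, int ntables)
{
    NodeMask common = ~(NodeMask) 0;

    for (int i = 0; i < ntables; i++)
        common &= tables[i];
    return common;
}

/*
 * Pick one node: the preferred node if it qualifies (preferred < 0 means
 * none is configured), otherwise a random member of candidates.
 * Returns -1 if candidates is empty.
 */
int choose_read_node(NodeMask candidates, int preferred)
{
    int count = 0;
    int pick;

    if (preferred >= 0 && preferred < 32 &&
        (candidates & ((NodeMask) 1 << preferred)))
        return preferred;

    for (int i = 0; i < 32; i++)
        if (candidates & ((NodeMask) 1 << i))
            count++;
    if (count == 0)
        return -1;

    pick = rand() % count;   /* index among the set bits */
    for (int i = 0; i < 32; i++)
    {
        if (candidates & ((NodeMask) 1 << i))
        {
            if (pick == 0)
                return i;
            pick--;
        }
    }
    return -1;               /* not reached */
}
```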

Pavan Deolasee [Mon, 25 Jan 2016 09:31:20 +0000 (15:01 +0530)]

Add enable_fast_query_shipping GUC to control whether to attempt Fast Query
Shipping to the remote node or not

Its primary use is for debugging purposes

Pallavi Sontakke [Mon, 25 Jan 2016 07:34:38 +0000 (13:04 +0530)]

Test output and sql changes

Fix rowtypes with nodes off in query plan
Simplify diffs seen in some xc_* tests.

Pallavi Sontakke [Sat, 23 Jan 2016 13:10:58 +0000 (18:40 +0530)]

Test output and sql changes

Fixes 6 tests.

Mainly accepts known limitations and uses 'nodes off'
for query plan.

Pavan Deolasee [Fri, 22 Jan 2016 14:06:38 +0000 (19:36 +0530)]

Further improvements to release notes

Pavan Deolasee [Fri, 22 Jan 2016 11:54:27 +0000 (17:24 +0530)]

The XLOG dirs in the pgxc_ctl conf file are optional

Pavan Deolasee [Fri, 22 Jan 2016 09:14:37 +0000 (14:44 +0530)]

Add changes missed in the previous commit.

Pavan Deolasee [Fri, 22 Jan 2016 08:35:19 +0000 (14:05 +0530)]

Add support to specify separate XLOG dirs for datanode masters and datanode
slaves in pgxc_ctl.conf file as well as corresponding "add" commands.

Recent releases of Postgres allow users to specify a separate XLOG dir at
initdb time, and we extend the same facility to pgxc_ctl.

Pavan Deolasee [Fri, 22 Jan 2016 04:21:37 +0000 (09:51 +0530)]

Change expected output for updatable_views test case by not printing node
information

This makes test output more deterministic. The test case still does not pass
because of an additional diff that requires further analysis.

Pavan Deolasee [Fri, 22 Jan 2016 03:36:47 +0000 (09:06 +0530)]

Set sequence_range to 1 while initialising nodes for "make check" so that
deterministic output is obtained.

This avoids explicit setting of the GUC in every test case that uses serials or
sequences. Those running "make installcheck" must set it correctly in their
postgresql.conf files.

We could revert some of the changes done to the test cases (by adding
sequence_range = 1), but this patch does not do that

Pavan Deolasee [Fri, 22 Jan 2016 03:17:32 +0000 (08:47 +0530)]

Change expected output for create_view test case.

We expect to see a Remote Subquery Scan node in the explain plan.

Pavan Deolasee [Fri, 22 Jan 2016 03:12:04 +0000 (08:42 +0530)]

Change expected output for timestamp/timestamptz tests

Looks like they were mistakenly changed while a bug existed. Now that the bug
is fixed, they are giving the correct output, matching PG's results.

Pavan Deolasee [Fri, 22 Jan 2016 02:49:51 +0000 (08:19 +0530)]

Recheck health of a node before changing its status.

send/recv() errors just give us a hint about something going wrong with a node.
But a mere send/recv failure does not mean that the node is down or
unreachable. So before changing the health status, ping the node once and
confirm its health status.

Pavan Deolasee [Thu, 21 Jan 2016 13:58:27 +0000 (19:28 +0530)]

Assign coordinator local timestamp when transaction timestamp is requested
before or without assigning XID to the transaction.

A GTM-supplied timestamp is available only when a transaction has started on
the GTM. But for read-only transactions, or transactions which are yet to do any
database write activity, we avoid going to the GTM for performance reasons. For
such cases, use the coordinator's local timestamp and keep using it for the rest of
the transaction.
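A minimal sketch of that rule, with a hypothetical per-transaction state struct and plain integer timestamps (the real code uses PostgreSQL's transaction state and timestamp types). Whichever timestamp is handed out first is cached, so a transaction that was given the local timestamp keeps reporting it even if it later starts on the GTM.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t Timestamp;   /* hypothetical: microseconds since some epoch */

/* Hypothetical per-transaction state. */
typedef struct
{
    bool      have_gtm_ts;   /* true once the transaction has started on the GTM */
    Timestamp gtm_ts;
    bool      ts_assigned;   /* a timestamp has already been handed out */
    Timestamp ts;
} TxnState;

/*
 * Return the transaction timestamp. The first call fixes it: the GTM
 * timestamp if the transaction is already known to the GTM, otherwise the
 * coordinator's local clock (passed in as local_now). Later calls return
 * the same value.
 */
Timestamp get_txn_timestamp(TxnState *txn, Timestamp local_now)
{
    if (!txn->ts_assigned)
    {
        txn->ts = txn->have_gtm_ts ? txn->gtm_ts : local_now;
        txn->ts_assigned = true;
    }
    return txn->ts;
}
```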

Pavan Deolasee [Thu, 21 Jan 2016 13:56:56 +0000 (19:26 +0530)]

First draft of release notes.

Simon Riggs [Thu, 21 Jan 2016 02:40:44 +0000 (18:40 -0800)]

Speedup 2PC by skipping two phase state files in normal path

2PC state info is written only to WAL at PREPARE, then read back from WAL at
COMMIT PREPARED/ABORT PREPARED. Prepared transactions that live past one bufmgr
checkpoint cycle will be written to disk in the same form as previously. Crash
recovery path is not altered. Measured performance gains of 50-100% for short
2PC transactions by completely avoiding writing files and fsyncing. Other
optimizations still available, further patches in related areas expected.

Stas Kelvich and heavily edited by Simon Riggs

Based upon earlier ideas and patches by Michael Paquier and Heikki Linnakangas,
a concrete example of how Postgres-XC has fed back ideas into PostgreSQL.

Reviewed by Michael Paquier, Jeff Janes and Andres Freund
Performance testing by Jesper Pedersen

Simon Riggs [Thu, 21 Jan 2016 01:18:58 +0000 (17:18 -0800)]

Refactor to create generic WAL page read callback

Previously we didn’t have a generic WAL page read callback function,
surprisingly. Logical decoding has logical_read_local_xlog_page(), which was
actually generic, so move that to xlogfunc.c and rename to
read_local_xlog_page().
Maintain logical_read_local_xlog_page() so existing callers still work.

As requested by Michael Paquier, Alvaro Herrera and Andres Freund

Pavan Deolasee [Wed, 20 Jan 2016 12:41:56 +0000 (18:11 +0530)]

Merge up to commit 'cdd4ed5449bf317cc71b45a8deee0173822e7592' which corresponds
to the 9.5.0 release of PostgreSQL

Pallavi Sontakke [Tue, 19 Jan 2016 11:33:52 +0000 (17:03 +0530)]

Test output and sql changes.

Fixes 9 tests.

Reasons:

set sequence_range to 1 for the sequences and datatype tests to work,
run advisory_lock by itself in the parallel schedule, as XL also uses
advisory locks internally,
no support in XL for refreshing materialized views concurrently.

Pavan Deolasee [Tue, 19 Jan 2016 10:50:40 +0000 (16:20 +0530)]

We don't yet support persistent connections between datanodes

Pavan Deolasee [Tue, 19 Jan 2016 08:20:02 +0000 (13:50 +0530)]

Fix an oversight in the previous commit

Pavan Deolasee [Tue, 19 Jan 2016 03:36:04 +0000 (09:06 +0530)]

Fix various potential buffer overflows which got exposed after we recently
increased GIDSIZE

Per report by Tobias Oberstein

Pavan Deolasee [Tue, 19 Jan 2016 03:29:02 +0000 (08:59 +0530)]

Use XID passed down by the remote datanode, if it's available, when
GetCurrentTransactionIdIfAny() is called.

The regular GetCurrentTransactionId() does the same, without actually storing
the XID in the transaction state. We don't store the XID when the connection is
from a datanode, to ensure that a duplicate XID is not stored. Other techniques
are employed to ensure multiple backends can participate in the same global
transaction.

This bug was causing wrong results when a query needs datanode-datanode
connections and updates were made earlier in the same transaction via a
different connection.

Per report by Arun Shaji

Pallavi Sontakke [Mon, 18 Jan 2016 08:56:18 +0000 (14:26 +0530)]

Accept XL test output.

Fixes 17 tests.

Reasons:

timestamp issue, some ERROR string differences,
remote subquery plan differences.

No support in XL for: TRIGGERS, correlated UPDATE,
SAVEPOINT, WHERE CURRENT OF, internal subtransactions,
complicated SELECT queries in plpgsql functions,
a distribution column in a child table referring to a
non-distribution column in the referenced table,
ORDER BY in subqueries, FOREIGN DATA WRAPPER, SERVER,
USER MAPPING.

Pavan Deolasee [Sun, 17 Jan 2016 14:01:01 +0000 (19:31 +0530)]

WAL log only the actual GID instead of the entire GIDSIZE data

While GIDSIZE is set quite high (in XL it is even more than the 200 bytes
currently defined in PG), in practice the GID is much shorter. So instead of
WAL logging the entire GIDSIZE buffer, we only log the
actual GID string. This shows considerable improvement for XL
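A sketch of the idea, assuming a hypothetical record layout of a 2-byte length prefix followed by only the GID bytes; the old approach instead copied the full GIDSIZE buffer into every record. The GIDSIZE value and function names are made up for illustration, since the commit says only that XL's value exceeds PostgreSQL's 200 bytes.

```c
#include <stdint.h>
#include <string.h>

#define GIDSIZE 1024   /* hypothetical value, for illustration only */

/*
 * Append a GID to a WAL record buffer as a 2-byte length followed by just
 * the GID bytes, instead of a fixed GIDSIZE-byte copy. The caller must
 * ensure gid is NUL-terminated and shorter than GIDSIZE, and that rec has
 * room. Returns the number of bytes written.
 */
size_t gid_write_compact(char *rec, const char *gid)
{
    uint16_t len = (uint16_t) strlen(gid);

    memcpy(rec, &len, sizeof(len));
    memcpy(rec + sizeof(len), gid, len);
    return sizeof(len) + len;
}

/* Decode a record written by gid_write_compact() into a GIDSIZE buffer. */
size_t gid_read_compact(const char *rec, char gid[GIDSIZE])
{
    uint16_t len;

    memcpy(&len, rec, sizeof(len));
    memcpy(gid, rec + sizeof(len), len);
    gid[len] = '\0';
    return sizeof(len) + len;
}
```

For a 6-byte GID such as "gid-42", the record carries 8 bytes rather than GIDSIZE bytes, which is where the savings come from when GIDs are short.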

Pavan Deolasee [Fri, 15 Jan 2016 17:58:58 +0000 (23:28 +0530)]

Remove an unnecessary newline which may cause conflicts with upstream patches

Pavan Deolasee [Fri, 15 Jan 2016 17:44:54 +0000 (23:14 +0530)]

Remove an unintentional stderr message mistakenly committed
in 8a519fbc16bedd

Pavan Deolasee [Fri, 15 Jan 2016 10:34:10 +0000 (16:04 +0530)]

Support yet another syntax for specifying distribution strategy for
a table

DISTSTYLE KEY DISTKEY (col) maps to DISTRIBUTE BY HASH (col)
DISTSTYLE EVEN maps to DISTRIBUTE BY ROUNDROBIN
DISTSTYLE ALL maps to DISTRIBUTE BY REPLICATION

Pavan Deolasee [Fri, 15 Jan 2016 07:58:12 +0000 (13:28 +0530)]

Support additional syntax for choosing table distribution strategy

DISTRIBUTED BY (col) maps to DISTRIBUTE BY HASH (col)
DISTRIBUTED RANDOMLY maps to DISTRIBUTE BY ROUNDROBIN
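To make the mappings in this commit and the DISTSTYLE commit just above it concrete, here is a lookup-table sketch that rewrites each alternative clause into the equivalent Postgres-XL clause. The real change lives in the SQL grammar, not in string rewriting; the table, the function name and its textual approach are purely illustrative.

```c
#include <stdio.h>
#include <string.h>

/* One row per alternative syntax listed in the two commit messages. */
typedef struct
{
    const char *alt_syntax;   /* clause as written by the user, minus any column */
    const char *xl_format;    /* equivalent XL clause; %s is the column, if any */
} DistSyntaxMap;

static const DistSyntaxMap dist_syntax_map[] = {
    {"DISTSTYLE KEY DISTKEY", "DISTRIBUTE BY HASH (%s)"},
    {"DISTSTYLE EVEN",        "DISTRIBUTE BY ROUNDROBIN"},
    {"DISTSTYLE ALL",         "DISTRIBUTE BY REPLICATION"},
    {"DISTRIBUTED BY",        "DISTRIBUTE BY HASH (%s)"},
    {"DISTRIBUTED RANDOMLY",  "DISTRIBUTE BY ROUNDROBIN"},
};

/*
 * Write the XL equivalent of alt_syntax into out. col is the distribution
 * column for the keyed forms and is ignored otherwise. Returns 0 on
 * success, -1 if the syntax is unknown or out is too small.
 */
int to_xl_distribution(const char *alt_syntax, const char *col,
                       char *out, size_t outlen)
{
    size_t n = sizeof(dist_syntax_map) / sizeof(dist_syntax_map[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(alt_syntax, dist_syntax_map[i].alt_syntax) == 0)
        {
            int len = snprintf(out, outlen, dist_syntax_map[i].xl_format,
                               col ? col : "");
            return (len >= 0 && (size_t) len < outlen) ? 0 : -1;
        }
    }
    return -1;
}
```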

Pavan Deolasee [Fri, 15 Jan 2016 07:15:54 +0000 (12:45 +0530)]

Check if target directory is empty (if it already exists) on the remote node.

We were mistakenly doing this check on the local node.

Pavan Deolasee [Thu, 14 Jan 2016 18:26:06 +0000 (23:56 +0530)]

Introduce a healthmap for tracking health status of all other nodes in the
cluster.

Each node now maintains a healthmap in shared memory to keep track of
availability of all other nodes in the cluster. Whenever a send() or a
receive() call on a socket fails, we try to ping the node once and if that
fails, we mark the node as UNHEALTHY. On the other hand, if later a new
connection is established successfully to a node, we mark the node as HEALTHY.
We also periodically try to ping UNHEALTHY nodes to see if they have come back
to life and update the healthmap accordingly.

The only consumer of the healthmap right now is SELECT queries on replicated
tables. When a table is replicated to more than one node, we now consult the
healthmap and discard nodes that are known to be UNHEALTHY. If all nodes are
found to be UNHEALTHY, one attempt is made to see if any of them have come back
online.
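A sketch of the healthmap transitions described in this commit (and kept by the later "Recheck health of a node" commit), under stated assumptions: nodes are indexed 0..N-1, the map is a plain struct instead of shared memory, and `PingFn` is a hypothetical probe that returns true when a node answers. None of these names come from the Postgres-XL source.

```c
#include <stdbool.h>

#define MAX_NODES 16   /* hypothetical upper bound on cluster size */

typedef enum { NODE_HEALTHY, NODE_UNHEALTHY } NodeHealth;

/* Hypothetical probe: returns true if the node answers a ping. */
typedef bool (*PingFn)(int node);

typedef struct
{
    NodeHealth status[MAX_NODES];   /* stands in for the shared-memory map */
    int        nnodes;
} HealthMap;

/* A send()/recv() failure is only a hint: confirm with one ping first. */
void healthmap_socket_error(HealthMap *hm, int node, PingFn ping)
{
    if (!ping(node))
        hm->status[node] = NODE_UNHEALTHY;
}

/* A successfully established connection proves the node is reachable. */
void healthmap_connected(HealthMap *hm, int node)
{
    hm->status[node] = NODE_HEALTHY;
}

/* Periodic pass: mark UNHEALTHY nodes HEALTHY again if they now answer. */
void healthmap_recheck(HealthMap *hm, PingFn ping)
{
    for (int i = 0; i < hm->nnodes; i++)
        if (hm->status[i] == NODE_UNHEALTHY && ping(i))
            hm->status[i] = NODE_HEALTHY;
}

/*
 * Copy the HEALTHY nodes among cand[] into out[] for a read on a replicated
 * table. If every candidate is UNHEALTHY, ping them once to see whether any
 * came back. Returns the number of nodes written to out[].
 */
int healthmap_filter(HealthMap *hm, const int *cand, int ncand, int *out,
                     PingFn ping)
{
    int n = 0;

    for (int i = 0; i < ncand; i++)
        if (hm->status[cand[i]] == NODE_HEALTHY)
            out[n++] = cand[i];
    if (n > 0)
        return n;

    for (int i = 0; i < ncand; i++)
    {
        if (ping(cand[i]))
        {
            hm->status[cand[i]] = NODE_HEALTHY;
            out[n++] = cand[i];
        }
    }
    return n;
}
```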

Pavan Deolasee [Thu, 14 Jan 2016 17:40:10 +0000 (23:10 +0530)]

Fix "make check" so that it now sets up a 2-coordinator, 2-datanode XL cluster
and runs a parallel schedule

There were different problems with the way various components were being set
up, including specifying a separate directory to create domain sockets. The
names used for datanodes also differed from those in the regression expected
output. This patch fixes that too.

Pallavi Sontakke [Wed, 13 Jan 2016 12:04:55 +0000 (17:34 +0530)]

Test output, sql changes

exclude query plan on replicated tables - join
exclude complex queries with 'union all' - equivclass
set sequence_range for nextval() to work fine - xl_functions
accept some FQS format changes in output

Pavan Deolasee [Wed, 13 Jan 2016 04:50:39 +0000 (10:20 +0530)]

Select a node randomly from a list of available nodes for reading from
replicated tables

This fixes a bug in FQS logic where it would always pick up the same node for
reading

Pavan Deolasee [Tue, 12 Jan 2016 09:16:51 +0000 (14:46 +0530)]

Send out a relcache inval for the relation when its distribution key or
strategy changes.

Without this, backends may not reload their relcache entries and thus keep
looking at old, stale distribution information.

Pavan Deolasee [Tue, 12 Jan 2016 03:54:46 +0000 (09:24 +0530)]

Do not use FQS for queries with FOR UPDATE/SHARE clause

Pavan Deolasee [Mon, 11 Jan 2016 16:05:28 +0000 (21:35 +0530)]

Send XID assigned on a datanode back to the coordinator.

We had missed a case where XIDs are assigned on a datanode but not sent back
to the coordinator. We now also avoid running a 2PC for transactions which do
not have an XID assigned, since such transactions cannot have made any database
changes.

Pallavi Sontakke [Mon, 11 Jan 2016 10:21:28 +0000 (15:51 +0530)]

Test, sql changes for insert_conflict

Accept ERROR for unique index on non-distribution column.
Remove tableoid call, with bug reference.

Pavan Deolasee [Sat, 9 Jan 2016 13:20:11 +0000 (18:50 +0530)]

For FQSed query, run EXPLAIN on the remote node and print the result

We recently added a fast-query-shipping facility to Postgres-XL, but it would
not print the query plan used on the datanodes, which makes it extremely
difficult to debug bad query plans with EXPLAIN. We now send an EXPLAIN query
to one of the datanodes and print the result. This assumes that all datanodes
plan queries similarly, given a good data distribution.

We also modified the EXPLAIN output for FQSed queries to match it as closely as
possible with that of Remote Subplan queries.

Pavan Deolasee [Fri, 8 Jan 2016 17:39:36 +0000 (23:09 +0530)]

Fix misc issues with two-phase commit protocol and cleaning up of outstanding
transactions

When a two-phase commit protocol is interrupted mid-way, for example because of
a server crash, it can leave behind unresolved prepared transactions which must
be resolved when the node comes back. Postgres-XL provides a pgxc_clean utility
to look up the list of prepared transactions and resolve them based on the
status of each such transaction on every node. But there were many issues with
the utility because of which it would either fail to resolve all transactions,
or, worse, resolve them the wrong way. This commit fixes all such issues
discovered during some simple crash recovery testing.

One of the problems with the current approach was that the utility would not
know which nodes were involved in a transaction. So if it saw a transaction
prepared on a subset of nodes but absent from the others, it could not tell
whether the transaction had originally been executed on those other nodes as
well, which then failed before they could prepare it. If it was indeed executed
on other nodes, such a transaction must be aborted. Whereas if it was only
executed on the set of nodes where it is currently prepared, then it can safely
be committed.

We now store the information about nodes participating in a 2PC directly in the
GID. This of course has the downside of increasing GIDSIZE, and with it the
shared memory requirement. But with today's server sizes, it should not be a
very big concern. Sure, we could also look at the possibility of storing this
information externally, such as on the GTM. But the fix seems good enough for
now.
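A simplified sketch of the decision pgxc_clean can make once the participant list is available from the GID. The status values and `resolve_2pc()` are hypothetical, and the GID encoding itself is not shown because the log does not describe it. The rules encode the commit's reasoning (prepared on every participant means commit is safe; missing from some participant means abort), plus the standard two-phase commit fact that a COMMIT PREPARED already done on any participant fixes the outcome as commit for all of them.

```c
/* Hypothetical status of one 2PC transaction as seen on one participant. */
typedef enum
{
    TXN_PREPARED,    /* prepared and still waiting to be resolved */
    TXN_COMMITTED,   /* COMMIT PREPARED already ran on this node */
    TXN_ABORTED,     /* rolled back on this node */
    TXN_MISSING      /* no trace: the node failed before it could prepare */
} ParticipantStatus;

typedef enum { RESOLVE_COMMIT, RESOLVE_ABORT, RESOLVE_NOTHING } Resolution;

/*
 * Decide how to resolve a transaction, given its status on every node
 * listed as a participant in its GID.
 */
Resolution resolve_2pc(const ParticipantStatus *st, int nparticipants)
{
    int prepared = 0;

    for (int i = 0; i < nparticipants; i++)
    {
        if (st[i] == TXN_COMMITTED)
            return RESOLVE_COMMIT;     /* the outcome was already commit */
        if (st[i] == TXN_PREPARED)
            prepared++;
    }
    if (prepared == 0)
        return RESOLVE_NOTHING;        /* nothing left to clean up */
    if (prepared == nparticipants)
        return RESOLVE_COMMIT;         /* every participant prepared */
    return RESOLVE_ABORT;              /* some participant never prepared */
}
```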

Pallavi Sontakke [Fri, 8 Jan 2016 11:41:51 +0000 (17:11 +0530)]

Test changes for insert_conflict

Accept output changes - we now avoid an additional local connection.

Pavan Deolasee [Fri, 8 Jan 2016 04:21:30 +0000 (09:51 +0530)]

Fix some protocol issues between GTM and GTM-standby.

Robert Haas [Thu, 12 Nov 2015 14:00:33 +0000 (09:00 -0500)]

Make idle backends exit if the postmaster dies.

Letting backends continue to run if the postmaster has exited prevents
PostgreSQL from being restarted, which in many environments is
catastrophic.  Worse, if some other backend crashes, we no longer have
any protection against shared memory corruption.  So, arrange for them
to exit instead.  We don't want to expend many cycles on this, but
including postmaster death in the set of things that we wait for when
a backend is idle seems cheap enough.

Rajeev Rastogi and Robert Haas

Pallavi Sontakke [Thu, 7 Jan 2016 08:49:16 +0000 (14:19 +0530)]

Test changes

Changes related to expected output for XL tests, concerning
insensitive cursor, recursive queries, grouping sets, literal
constants autocast.

Pavan Deolasee [Wed, 6 Jan 2016 13:02:06 +0000 (18:32 +0530)]

End global transaction on the GTM before releasing locks.

Other backends could be waiting for the locks to be released, and they must
see the end of the transaction not just locally but also on the GTM (say,
because they take a new snapshot from the GTM, especially for catalog scans).
There are some concerns about doing it the way it's done in this patch, because
we now exchange messages with the GTM while holding interrupts. But we don't
know if that's really a problem.

Pavan Deolasee [Wed, 6 Jan 2016 12:50:08 +0000 (18:20 +0530)]

Bump default value for sequence_range to 1000.

This shows good improvement for workloads which repeatedly ask for sequence
values. We already override the sequence_range value to 1000 for COPY, and it
makes a lot of sense to do the same for INSERTs.

Pavan Deolasee [Wed, 6 Jan 2016 06:54:23 +0000 (12:24 +0530)]

Correctly handle RESPONSE_WAITXIDS response from a datanode

Pavan Deolasee [Tue, 5 Jan 2016 07:20:32 +0000 (12:50 +0530)]

Ensure commit ordering at the GTM when a transaction's update/delete operation
is based on some other transaction's commit

We had handled one part of this problem by recording transactions on which we
wait before proceeding with update/delete. But there is another case where an
updating transaction T1 may commit on the datanode, but before the coordinator
can commit the transaction on the GTM, another transaction T2 updates the record
(seeing that the updating transaction is already committed) and also commits on
the GTM. Now if a third transaction T3 takes a snapshot, it will see T1 as
running and T2 as committed. Such a snapshot can then see both old and new
versions of the updated tuple. So we must enforce commit ordering T1->T2 on the
GTM, since T2 based its actions on T1 being committed.

Tom Lane [Mon, 4 Jan 2016 21:29:34 +0000 (16:29 -0500)]

Stamp 9.5.0.

Tom Lane [Mon, 4 Jan 2016 20:11:44 +0000 (15:11 -0500)]

Docs: provide a concrete discussion and example for RLS race conditions.

Commit 43cd468cf01007f3 added some wording to create_policy.sgml purporting
to warn users against a race condition of the sort that had been noted some
time ago by Peter Geoghegan. However, that warning was far too vague to be
useful (or at least, I completely failed to grasp what it was on about).
Since the problem case occurs with a security design pattern that lots of
people are likely to try to use, we need to be as clear as possible about
it. Provide a concrete example in the main-line docs in place of the
original warning.

Tom Lane [Mon, 4 Jan 2016 17:21:31 +0000 (12:21 -0500)]

Adjust behavior of row_security GUC to match the docs.

Some time back we agreed that row_security=off should not be a way to
bypass RLS entirely, but only a way to get an error if it was being
applied.  However, the code failed to act that way for table owners.
Per discussion, this is a must-fix bug for 9.5.0.

Adjust the logic in rls.c to behave as expected; also, modify the
error message to be more consistent with the new interpretation.
The regression tests need minor corrections as well.  Also update
the comments about row_security in ddl.sgml to be correct.  (The
official description of the GUC in config.sgml is already correct.)

I failed to resist the temptation to do some other very minor
cleanup as well, such as getting rid of a duplicate extern declaration.

Robert Haas [Mon, 4 Jan 2016 15:12:37 +0000 (10:12 -0500)]

Fix typo in comment.

Masahiko Sawada

Peter Eisentraut [Mon, 4 Jan 2016 13:18:48 +0000 (08:18 -0500)]

Translation updates

Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 3b0ccc27cf917446ea0a6c680b70534cfcaba81e

Tom Lane [Mon, 4 Jan 2016 06:53:24 +0000 (01:53 -0500)]

Fix regrole and regnamespace output functions to do quoting, too.

We discussed this but somehow failed to implement it...

Tom Lane [Mon, 4 Jan 2016 06:03:53 +0000 (01:03 -0500)]

Fix regrole and regnamespace types to honor quoting like other reg* types.

Aside from any consistency arguments, this is logically necessary because
the I/O functions for these types also handle numeric OID values.  Without
a quoting rule it is impossible to distinguish numeric OIDs from role or
namespace names that happen to contain only digits.

Also change the to_regrole and to_regnamespace functions to dequote their
arguments.  While not logically essential, this seems like a good idea
since the other to_reg* functions do it.  Anyone who really wants raw
lookup of an uninterpreted name can fall back on the time-honored solution
of (SELECT oid FROM pg_namespace WHERE nspname = whatever).

Report and patch by Jim Nasby, reviewed by Michael Paquier
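Why a quoting rule is needed can be shown with a toy version of the input logic. This is a sketch under stated assumptions, not PostgreSQL's parser: an unquoted all-digit string is read as a numeric OID, anything else is read as a name, and a double-quoted string has its quotes removed and doubled quotes collapsed. Case folding of unquoted identifiers is deliberately left out. With this rule a role literally named 123 is still reachable as "123".

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { REG_OID, REG_NAME, REG_INVALID } RegInputKind;

/*
 * Classify regrole/regnamespace-style input. On REG_NAME the (dequoted)
 * name is copied into name, a buffer of namelen bytes.
 */
RegInputKind classify_reg_input(const char *in, char *name, size_t namelen)
{
    size_t len = strlen(in);
    size_t j = 0;

    if (len == 0 || namelen == 0)
        return REG_INVALID;

    if (in[0] != '"')
    {
        bool all_digits = true;

        for (size_t i = 0; i < len; i++)
            if (!isdigit((unsigned char) in[i]))
                all_digits = false;
        if (all_digits)
            return REG_OID;             /* e.g. 10 is read as an OID */
        if (len >= namelen)
            return REG_INVALID;
        memcpy(name, in, len + 1);
        return REG_NAME;
    }

    /* Quoted identifier: must end with a quote; "" inside stands for one ". */
    if (len < 2 || in[len - 1] != '"')
        return REG_INVALID;

    for (size_t i = 1; i < len - 1; i++)
    {
        if (in[i] == '"')
        {
            if (in[i + 1] != '"' || i + 1 == len - 1)
                return REG_INVALID;     /* lone quote inside the name */
            i++;                        /* skip the second quote of the pair */
        }
        if (j + 1 >= namelen)
            return REG_INVALID;
        name[j++] = in[i];
    }
    name[j] = '\0';
    return j > 0 ? REG_NAME : REG_INVALID;
}
```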

Tom Lane [Mon, 4 Jan 2016 01:53:35 +0000 (20:53 -0500)]

Fix bogus lock release in RemovePolicyById and RemoveRoleFromObjectPolicy.

Can't release the AccessExclusiveLock on the target table until commit.
Otherwise there is a race condition whereby other backends might service
our cache invalidation signals before they can actually see the updated
catalog rows.

Just to add insult to injury, RemovePolicyById was closing the rel (with
incorrect lock drop) and then passing the now-dangling rel pointer to
CacheInvalidateRelcache. Probably the reason this doesn't fall over on
CLOBBER_CACHE buildfarm members is that some outer level of the DROP logic
is still holding the rel open ... but it'd have bit us on the arse
eventually, no doubt.

Tom Lane [Mon, 4 Jan 2016 01:04:11 +0000 (20:04 -0500)]

Do some copy-editing on the docs for row-level security.

Clarifications, markup improvements, corrections of misleading or
outright wrong statements.

Tom Lane [Sun, 3 Jan 2016 21:26:38 +0000 (16:26 -0500)]

Guard against null arguments in binary_upgrade_create_empty_extension().

The CHECK_IS_BINARY_UPGRADE macro is not sufficient security protection
if we're going to dereference pass-by-reference arguments before it.

But in any case we really need to explicitly check PG_ARGISNULL for all
the arguments of a non-strict function, not only the ones we expect null
values for.

Oversight in commits 30982be4e5019684e1772dd9170aaa53f5a8e894 and
f92fc4c95ddcc25978354a8248d3df22269201bc. Found by Andreas Seltenreich.
(The other usages in pg_upgrade_support.c seem safe.)

Tom Lane [Sun, 3 Jan 2016 21:03:42 +0000 (16:03 -0500)]

Do some copy-editing on the docs for replication origins.

Minor grammar and markup improvements.

Tom Lane [Sun, 3 Jan 2016 20:33:12 +0000 (15:33 -0500)]

Do a final round of copy-editing on the 9.5 release notes.

Tom Lane [Sun, 3 Jan 2016 18:56:29 +0000 (13:56 -0500)]

Fix treatment of *lpNumberOfBytesRecvd == 0: that's a completion condition.

pgwin32_recv() has treated a non-error return of zero bytes from WSARecv()
as being a reason to block ever since the current implementation was
introduced in commit a4c40f140d23cefb.  However, so far as one can tell
from Microsoft's documentation, that is just wrong: what it means is
graceful connection closure (in stream protocols) or receipt of a
zero-length message (in message protocols), and neither case should result
in blocking here.  The only reason the code worked at all was that control
then fell into the retry loop, which did *not* treat zero bytes specially,
so we'd get out after only wasting some cycles.  But as of 9.5 we do not
normally reach the retry loop and so the bug is exposed, as reported by
Shay Rojansky and diagnosed by Andres Freund.

Remove the unnecessary test on the byte count, and rearrange the code
in the retry loop so that it looks identical to the initial sequence.

Back-patch to 9.5.  The code is wrong all the way back, AFAICS, but
since it's relatively harmless in earlier branches we'll leave it alone.

Tom Lane [Sun, 3 Jan 2016 00:04:45 +0000 (19:04 -0500)]

Teach pg_dump to quote reloption values safely.

Commit c7e27becd2e6eb93 fixed this on the backend side, but we neglected
the fact that several code paths in pg_dump were printing reloptions
values that had not gotten massaged by ruleutils. Apply essentially the
same quoting logic in those places, too.

Tom Lane [Sat, 2 Jan 2016 21:24:50 +0000 (16:24 -0500)]

Fix overly-strict assertions in spgtextproc.c.

spg_text_inner_consistent is capable of reconstructing an empty string
to pass down to the next index level; this happens if we have an empty
string coming in, no prefix, and a dummy node label.  (In practice, what
is needed to trigger that is insertion of a whole bunch of empty-string
values.)  Then, we will arrive at the next level with in->level == 0
and a non-NULL (but zero length) in->reconstructedValue, which is valid
but the Assert tests weren't expecting it.
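
A sketch of the invariant the assertions need to accept, as described above. It is not the actual Assert in spgtextproc.c. InnerInputSketch and reconstructed_value_ok() are hypothetical, and the sketch assumes (this message does not say so) that level counts the bytes already consumed, so a reconstructed value must be exactly level bytes long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified view of the inner-consistent input: the tree
 * level and the reconstructed prefix passed down from the parent, if any.
 */
typedef struct
{
    int         level;
    const char *reconstructed;      /* NULL if nothing was passed down */
    size_t      reconstructed_len;
} InnerInputSketch;

/*
 * A missing reconstructed value is only possible at level 0; a present one
 * must be exactly `level` bytes long.  So a non-NULL, zero-length value at
 * level 0 is valid, which is the case the old assertions rejected.
 */
bool
reconstructed_value_ok(const InnerInputSketch *in)
{
    if (in->reconstructed == NULL)
        return in->level == 0;
    return in->reconstructed_len == (size_t) in->level;
}
```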

Per report from Andreas Seltenreich.  This has no impact in non-Assert
builds, so should not be a problem in production, but back-patch to
all affected branches anyway.

In passing, remove a couple of useless variable initializations and
shorten the code by not duplicating DatumGetPointer() calls.

Tom Lane [Sat, 2 Jan 2016 20:29:03 +0000 (15:29 -0500)]

Adjust back-branch release note description of commits a2a718b22 et al.

As pointed out by Michael Paquier, recovery_min_apply_delay didn't exist
in 9.0-9.3, making the release note text not very useful. Instead make it
talk about recovery_target_xid, which did exist then.

9.0 is already out of support, but we can fix the text in the newer
branches' copies of its release notes.

Bruce Momjian [Sat, 2 Jan 2016 18:33:39 +0000 (13:33 -0500)]

Update copyright for 2016

Backpatch certain files through 9.1

Tom Lane [Fri, 1 Jan 2016 20:27:53 +0000 (15:27 -0500)]

Teach flatten_reloptions() to quote option values safely.

flatten_reloptions() supposed that it didn't really need to do anything
beyond inserting commas between reloption array elements.  However, in
principle the value of a reloption could be nearly anything, since the
grammar allows a quoted string there.  Any restrictions on it would come
from validity checking appropriate to the particular option, if any.

A reloption value that isn't a simple identifier or number could thus lead
to dump/reload failures due to syntax errors in CREATE statements issued
by pg_dump.  We've gotten away with not worrying about this so far with
the core-supported reloptions, but extensions might allow reloption values
that cause trouble, as in bug #13840 from Kouhei Sutou.

To fix, split the reloption array elements explicitly, and then convert
any value that doesn't look like a safe identifier to a string literal.
(The details of the quoting rule could be debated, but this way is safe
and requires little code.)  While we're at it, also quote reloption names
if they're not safe identifiers; that may not be a likely problem in the
field, but we might as well try to be bulletproof here.
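
A rough standalone approximation of the quoting rule described above, not the actual flatten_reloptions() code. is_safe_identifier(), put_char(), put_part() and format_reloption() are hypothetical helpers. The identifier test omits the keyword check that PostgreSQL's quote_identifier() performs, and the literal quoting assumes standard_conforming_strings is on, so only single quotes are doubled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Nonempty, starts with [a-z_], contains only [a-z0-9_] (no keyword check). */
bool
is_safe_identifier(const char *s, size_t len)
{
    if (len == 0 || !((s[0] >= 'a' && s[0] <= 'z') || s[0] == '_'))
        return false;
    for (size_t i = 0; i < len; i++)
    {
        char c = s[i];

        if (!((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_'))
            return false;
    }
    return true;
}

/* Append one character, keeping out NUL-terminated; false if out of room. */
static bool
put_char(char *out, size_t outsz, size_t *n, char c)
{
    if (*n + 1 >= outsz)
        return false;
    out[(*n)++] = c;
    out[*n] = '\0';
    return true;
}

/*
 * Append s[0..len) as-is if it is a safe identifier, otherwise wrapped in
 * quote character q with embedded q's doubled ('"' yields a quoted
 * identifier, '\'' a string literal).
 */
static bool
put_part(char *out, size_t outsz, size_t *n,
         const char *s, size_t len, char q)
{
    bool safe = is_safe_identifier(s, len);

    if (!safe && !put_char(out, outsz, n, q))
        return false;
    for (size_t i = 0; i < len; i++)
    {
        if (!safe && s[i] == q && !put_char(out, outsz, n, q))
            return false;
        if (!put_char(out, outsz, n, s[i]))
            return false;
    }
    return safe || put_char(out, outsz, n, q);
}

/* Format one "name=value" reloption element, splitting at the first '='. */
bool
format_reloption(const char *element, char *out, size_t outsz)
{
    const char *eq = strchr(element, '=');
    size_t      n = 0;

    if (eq == NULL || outsz == 0)
        return false;
    out[0] = '\0';
    return put_part(out, outsz, &n, element, (size_t) (eq - element), '"') &&
           put_char(out, outsz, &n, '=') &&
           put_part(out, outsz, &n, eq + 1, strlen(eq + 1), '\'');
}
```

With this sketch, fillfactor=70 comes out as fillfactor='70' and opt=it's as opt='it''s'. Numeric values end up quoted because they do not look like safe identifiers, in keeping with the message's preference for a simple, safe rule over a more precise one.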

It's been like this for a long time, so back-patch to all supported
branches.

Kouhei Sutou, adjusted some by me

Tom Lane [Fri, 1 Jan 2016 18:42:21 +0000 (13:42 -0500)]

Add some more defenses against silly estimates to gincostestimate().

A report from Andy Colson showed that gincostestimate() was not being
nearly paranoid enough about whether to believe the statistics it finds in
the index metapage.  The problem is that the metapage stats (other than the
pending-pages count) are only updated by VACUUM, and in the worst case
could still reflect the index's original empty state even when it has grown
to many entries.  We attempted to deal with that by scaling up the stats to
match the current index size, but if nEntries is zero then scaling it up
still gives zero.  Moreover, the proportion of pages that are entry pages
vs. data pages vs. pending pages is unlikely to be estimated very well by
scaling if the index is now orders of magnitude larger than before.

We can improve matters by expanding the use of the rule-of-thumb estimates
I introduced in commit 7fb008c5ee59b040: if the index has grown by more
than a cutoff amount (here set at 4X growth) since VACUUM, then use the
rule-of-thumb numbers instead of scaling.  This might not be exactly right
but it seems much less likely to produce insane estimates.

I also improved both the scaling estimate and the rule-of-thumb estimate
to account for numPendingPages, since it's reasonable to expect that that
is accurate in any case, and certainly pages that are in the pending list
are not either entry or data pages.
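
The choice between scaling and the rule-of-thumb estimate can be sketched as follows. GinVacuumStatsSketch, GinEstimateSketch and estimate_gin_stats() are hypothetical. The 4X cutoff and the exclusion of pending pages come from the message above, but the rule-of-thumb split (90% entry pages, 100 entries per entry page) is an illustrative placeholder, not necessarily what gincostestimate() uses, and rounding is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copy of the metapage statistics as of the last VACUUM. */
typedef struct
{
    double total_pages;
    double entry_pages;
    double data_pages;
    double entries;
} GinVacuumStatsSketch;

/* Estimated internal statistics used for costing. */
typedef struct
{
    double entry_pages;
    double data_pages;
    double entries;
} GinEstimateSketch;

/* Illustrative rule-of-thumb placeholders, not gincostestimate()'s values. */
#define SKETCH_ENTRY_PAGE_FRACTION    0.9
#define SKETCH_ENTRIES_PER_ENTRY_PAGE 100.0
/* Growth cutoff from the commit message: beyond 4X, stop scaling. */
#define SKETCH_MAX_GROWTH             4.0

static double
min_double(double a, double b)
{
    return a < b ? a : b;
}

GinEstimateSketch
estimate_gin_stats(GinVacuumStatsSketch vac, double num_pages,
                   double pending_pages)
{
    GinEstimateSketch est;
    /* Pending-list pages are neither entry nor data pages. */
    double      usable = num_pages - pending_pages;
    bool        trust_stats = vac.total_pages > 0 &&
                              vac.entry_pages > 0 &&
                              vac.entries > 0 &&
                              num_pages <= SKETCH_MAX_GROWTH * vac.total_pages;

    if (trust_stats)
    {
        /* Modest growth since VACUUM: scale the recorded stats up. */
        double scale = num_pages / vac.total_pages;

        est.entry_pages = min_double(vac.entry_pages * scale, usable);
        est.data_pages = min_double(vac.data_pages * scale,
                                    usable - est.entry_pages);
        est.entries = vac.entries * scale;
    }
    else
    {
        /* No usable stats, or too much growth: use the rule of thumb. */
        est.entry_pages = usable * SKETCH_ENTRY_PAGE_FRACTION;
        est.data_pages = usable - est.entry_pages;
        est.entries = est.entry_pages * SKETCH_ENTRIES_PER_ENTRY_PAGE;
    }
    return est;
}
```

With stats recorded at 100 pages (20 entry, 70 data, 1000 entries), an index now at 200 pages with 10 pending pages is scaled by 2 (40 entry pages, 140 data pages, 2000 entries). The same stats at 1000 pages exceed the 4X cutoff and fall back to the rule of thumb, as do stats showing zero entries, instead of scaling zero up to zero.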

As a somewhat separate issue, adjust the estimation equations that are
concerned with extra fetches for partial-match searches.  These equations
suppose that a fraction partialEntries / numEntries of the entry and data
pages will be visited as a consequence of a partial-match search.  Now,
it's physically impossible for that fraction to exceed one, but our
estimate of partialEntries is mostly bunk, and our estimate of numEntries
isn't exactly gospel either, so we could arrive at a silly value.  In the
example presented by Andy we were coming out with a value of 100, leading
to insane cost estimates.  Clamp the fraction to one to avoid that.
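
The clamp itself is small. Here it is sketched as a standalone helper, where partial_match_fraction() is a hypothetical name and the guard for a non-positive entry count is an added assumption not mentioned in the message.

```c
#include <assert.h>

/*
 * Fraction of entry and data pages a partial-match search is assumed to
 * visit.  Both inputs are rough estimates, so the raw ratio can exceed 1;
 * physically it cannot, so clamp it.
 */
double
partial_match_fraction(double partial_entries, double num_entries)
{
    double frac;

    if (num_entries <= 0)
        return 1.0;             /* assumption: no usable estimate, assume all */
    frac = partial_entries / num_entries;
    return frac > 1.0 ? 1.0 : frac;
}
```

A raw ratio of 100, as in the reported example, becomes 1, so the extra fetches charged can be at most the entry and data pages themselves.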

Like the previous patch, back-patch to all supported branches; this
problem can be demonstrated in one form or another in all of them.

Tom Lane [Fri, 1 Jan 2016 18:00:13 +0000 (13:00 -0500)]

Split out pg_operator.h function declarations to new file pg_operator_fn.h.

Commit a2e35b53c39b2a27 added an #include of catalog/objectaddress.h to
pg_operator.h, making it impossible for client-side code to #include
pg_operator.h. It's not entirely clear whether any client-side code needs
to include pg_operator.h, but it seems prudent to assume that there is some
such code somewhere. Therefore, split off the function declarations into a
new file pg_operator_fn.h, similarly to what we've done for some other
catalog header files.

Back-patch of part of commit 0dab5ef39b3d9d86.

Tom Lane [Thu, 31 Dec 2015 22:59:10 +0000 (17:59 -0500)]

Add a comment noting that FDWs don't have to implement EXCEPT or LIMIT TO.

postgresImportForeignSchema pays attention to IMPORT's EXCEPT and LIMIT TO
options, but only as an efficiency hack, not for correctness' sake. The
FDW documentation does explain that, but someone using postgres_fdw.c
as a coding guide might not remember it, so let's add a comment here.
Per question from Regina Obe.

Pavan Deolasee [Thu, 31 Dec 2015 11:11:25 +0000 (16:41 +0530)]

Also set XactTopTransactionId on receiving an XID from a datanode

Pavan Deolasee [Wed, 30 Dec 2015 11:27:42 +0000 (16:57 +0530)]

Send a single command-complete message from remote coordinator or datanode

When a client sends a multi-command string for execution, the Postgres server
executes each command separately and sends a command-complete for each one.
The coordinator, however, is not well equipped to handle this, since it sends
the whole query string as-is to the datanode and expects a single
command-complete.

What we need is a mechanism whereby the coordinator sends each command
separately to the datanode. It already parses a multi-command string into
multiple parse trees. Earlier we did not have a mechanism to deparse utility
commands, but IIRC we now have one, so we should look at using that
infrastructure.

Pavan Deolasee [Wed, 30 Dec 2015 04:55:56 +0000 (10:25 +0530)]

Add a flag to fetch the latest snapshot from GTM when the situation demands.

Catalog scans, for example, require the latest snapshot to see the latest
changes to the catalogs. This patch fixes the problem with catalog scans.

Official repo for Postgres-XL. Stable branch is XL9_5_STABLE. Current development is PG10 compatible. Controlled by Postgres-X2 Core Team.