Tatsuo Ishii [Wed, 29 Mar 2017 05:48:54 +0000 (14:48 +0900)]
Code clean up.
Tatsuo Ishii [Wed, 29 Mar 2017 05:37:20 +0000 (14:37 +0900)]
Remove pending response flag.
By introducing the pending message queue, the pending response flag
is no longer necessary.
Tatsuo Ishii [Wed, 29 Mar 2017 05:06:50 +0000 (14:06 +0900)]
Remove "sync_map".
By introducing the pending message queue, "sync_map" is not necessary
any more.
Tatsuo Ishii [Wed, 29 Mar 2017 04:33:27 +0000 (13:33 +0900)]
Remove trailing space.
Tatsuo Ishii [Wed, 29 Mar 2017 04:21:46 +0000 (13:21 +0900)]
Merge branch 'bug271' of ssh://git.postgresql.org/pgpool2 into bug271
Tatsuo Ishii [Wed, 29 Mar 2017 01:46:22 +0000 (10:46 +0900)]
Fix occasional kind mismatch error (or hang) and deallocate error.
There were the following problems causing the first one:
1) pool_push_pending_data() does not push pending data if the first
pending message is '3' (close complete), because
pool_push_pending_data() cannot distinguish it from the same kind of
message generated by it. Checking whether the backend buffer is empty
is not enough, since the message may not have been read into the
buffer yet. To solve the problem, do_query() is modified to check
whether '3' is received prior to '1' (parse complete). If so, it is
likely that pool_push_pending_data() failed to check the message, and
do_query() pushes the message.
2) CloseComplete() deletes a close pending message. This should not be
done by CloseComplete(), since read_kind_from_backend() is in charge
of it using pool_pending_message_pull_out().
The latter problem is that deallocate does not find the statement to be
deleted. This is caused by send_deallocate() being called in the
reset query list processing. Since DISCARD ALL automatically
deallocates all named statements and portals, send_deallocate() is
useless there. The solution is simply to delete the call in the reset
query list processing.
Tatsuo Ishii [Fri, 24 Mar 2017 08:44:46 +0000 (17:44 +0900)]
Fix the case when duplicate statement remains.
When parse_before_bind is used, a parsed statement may remain on the
load balance node even if an explicit close is issued, because the
close is redirected to the primary node. The fix is to always issue
the close to both the primary and the load balance node. This sounds
like a little bit of overkill, but I think there's no way to remember
that the statement remains on a load balance node.
Also clean up some #ifdef NOT_USED garbage.
Tatsuo Ishii [Fri, 24 Mar 2017 01:40:09 +0000 (10:40 +0900)]
Fix bugs regarding empty query response and deallocate.
The bug was analyzed in [pgpool-hackers: 2173].
http://www.pgpool.net/pipermail/pgpool-hackers/2017-March/002173.html
Pgpool-II needs to treat an empty query response exactly the same as a
command complete, so implement that.
I also found an issue with parse_before_bind. If a client sends a bind
message multiple times, and thus parse_before_bind gets called multiple
times, a named statement could be duplicated, which causes an error
from the backend. The solution is to close the named statement before
sending a parse message to the primary.
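For illustration, the Close message that has to precede the repeated Parse can be encoded as below. This is a minimal sketch based on the PostgreSQL v3 frontend/backend protocol framing (kind byte, big-endian Int32 length that counts itself, then the body); build_close_message() is a hypothetical helper, not a Pgpool-II function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode a frontend Close message ('C') for a named
 * prepared statement ('S') or portal ('P') into buf.
 * Wire format: Byte1('C'), Int32 length (counts itself, not the kind byte),
 * Byte1 target, String name (NUL-terminated).
 * Returns the total number of bytes written, or 0 if buf is too small. */
size_t build_close_message(char target, const char *name,
                           unsigned char *buf, size_t bufsize)
{
    size_t namelen = strlen(name) + 1;          /* include terminating NUL */
    uint32_t len = (uint32_t) (4 + 1 + namelen);
    size_t total = 1 + (size_t) len;

    if (bufsize < total)
        return 0;

    buf[0] = 'C';
    buf[1] = (unsigned char) (len >> 24);       /* big-endian length */
    buf[2] = (unsigned char) (len >> 16);
    buf[3] = (unsigned char) (len >> 8);
    buf[4] = (unsigned char) len;
    buf[5] = (unsigned char) target;
    memcpy(buf + 6, name, namelen);
    return total;
}
```

In this sketch, build_close_message('S', "S1", buf, sizeof buf) writes 9 bytes: 'C', a length of 8, 'S', and the NUL-terminated name "S1".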
Tatsuo Ishii [Mon, 20 Mar 2017 09:55:00 +0000 (18:55 +0900)]
Fix failure in re-syncing primary and standby when "ready for query" is received.
After sending a sync message to all backend nodes, a ready for query
message eventually comes from all nodes. So read_data_from_backend()
syncs all nodes by reading and discarding messages until "ready for
query" arrives. Previously, when the data input buffer was empty, it
checked whether data was ready before reading from the socket, using a
timeout of 0. However, this is not reliable, since the data may not be
ready at that moment. Moreover, this could lead to a kind mismatch
error, since the re-syncing process is aborted. It is safe to assume
that the data will eventually be ready, so the timeout is no longer
necessary.
See [pgpool-hackers: 2167] for more details.
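The discard loop described above can be sketched as follows, assuming the backend bytes are already in memory. discard_until_ready_for_query() is a hypothetical helper that relies only on the protocol framing (kind byte plus a big-endian Int32 length that counts itself). A return of -1 corresponds to the point where the fixed code blocks on the socket for more data instead of giving up after a zero timeout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian Int32 from p. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical helper: skip (discard) complete backend messages in buf
 * until a ReadyForQuery ('Z') message has been consumed.
 * Returns the offset just past the 'Z' message, or -1 if the data is
 * malformed or incomplete (the caller would then wait for more input). */
long discard_until_ready_for_query(const unsigned char *buf, size_t n)
{
    size_t off = 0;

    while (n - off >= 5)
    {
        unsigned char kind = buf[off];
        uint32_t len = read_be32(buf + off + 1);

        if (len < 4 || n - off - 1 < len)
            return -1;                  /* malformed or incomplete message */
        off += 1 + len;
        if (kind == 'Z')
            return (long) off;
    }
    return -1;
}
```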
Tatsuo Ishii [Fri, 17 Mar 2017 02:04:24 +0000 (11:04 +0900)]
Fix Execute() not to set writing transaction flag in an inappropriate case.
Execute() mistakenly set the writing transaction flag even if a write
query is issued (in a particular case, "BEGIN"). The reason it was not
found before is that the code path only works when the transaction
state is 'T' (in transaction). Previously the state was only set at
CommandComplete, but the previous commit set the state at Execute().
This leads to a SELECT not being load balanced.
Also add some useful debugging statements.
See [pgpool-hackers: 2155] for more details.
Tatsuo Ishii [Thu, 16 Mar 2017 05:02:32 +0000 (14:02 +0900)]
Fix issue with bug271.
When an INSERT is executed in a transaction, the "writing_transaction"
flag is set. The flag is checked by a succeeding SELECT to decide
whether parse_before_bind (which resends the parse message to the
primary node if the SELECT is executed on the standby) needs to be
executed. The flag is reset when the command complete message for
BEGIN arrives. The problem is that the command complete message could
arrive before the bind for the SELECT is executed, and at that point
the flag has already been cleared.
The solution is not to reset the flag when the command complete
message arrives in streaming replication mode.
Tatsuo Ishii [Tue, 14 Mar 2017 10:03:02 +0000 (19:03 +0900)]
Fix problem described in [pgpool-hackers: 2125].
Two problems are fixed:
- The writing_transaction flag is not reset when a transaction is
committed/aborted. Previously this was done in CommandComplete, but
now we may need to check the flag before a command complete
message arrives. So at Execute, it is taken care of by
handle_query_context(), which used to be a static function in
CommandComplete.c. Also, the session context memory is now always
cleared before starting a session context.
- The ready for query re-sync code path in read_kind_from_backend did
not work. The pending message corresponding to the ready for query
message could be a sync message, which does not have a query
context, but previously we checked whether a query context
existed. The check was bogus and has been removed.
Tatsuo Ishii [Sun, 12 Mar 2017 01:19:34 +0000 (10:19 +0900)]
Downgrade 1 more ereport from LOG to DEBUG1.
Tatsuo Ishii [Fri, 10 Mar 2017 10:42:39 +0000 (19:42 +0900)]
Downgrade ereport(LOG) to ereport(DEBUG1), which had been set for debugging purposes.
Tatsuo Ishii [Thu, 9 Mar 2017 23:18:00 +0000 (08:18 +0900)]
Check whether the "milestone close" may have been received before checking pending data.
When the buffers are empty, pool_push_pending_data() checks the backend
fd using select(2) with a timeout of 0. This may lose pending data that
the backend has not processed yet. To minimize the chance of losing
pending data, check whether the previously received data is the
"milestone close" (close complete). If it was not the milestone close,
set the timer to "wait forever", because we can be sure that the
milestone close will come later. Per [pgpool-hackers: 2103].
http://www.pgpool.net/pipermail/pgpool-hackers/2017-March/002103.html
Tatsuo Ishii [Fri, 3 Mar 2017 06:37:14 +0000 (15:37 +0900)]
Work in progress as of 2017/3/3.
- Fix nasty bug in pool_push_pending_data.
- (Probably) long-standing bug in pool_process_query. When the backend
read buffer is not empty, the check for an 'A' message forgot to
restore the kind message, which led to a "kind = 0" error in
read_kind_from_backend later on. This may be the cause of occasional
bug reports.
- Fix SimpleForwardToFrontend not to use sync map.
Tatsuo Ishii [Tue, 28 Feb 2017 06:57:16 +0000 (15:57 +0900)]
Unify function names.
Mostly cosmetic fixes.
Tatsuo Ishii [Tue, 28 Feb 2017 01:42:37 +0000 (10:42 +0900)]
Fix CommandComplete.
It still used the sync map, which is now obsolete. This caused strange
errors upon receiving a command complete message.
Fix do_query(). It did not pop up saved messages in certain cases. I
guess this was one of the causes of occasional regression failures.
Tatsuo Ishii [Mon, 27 Feb 2017 07:47:58 +0000 (16:47 +0900)]
Fix the case when parse_before_bind() is called.
When parse_before_bind() is called, the response to the newly issued
parse message should not be forwarded to the frontend. Otherwise, the
frontend will receive a duplicate parse complete message.
Tatsuo Ishii [Mon, 27 Feb 2017 04:23:21 +0000 (13:23 +0900)]
Add new pending message type 'Sync'.
This is used upon receiving a sync message from the client. The pending
message created here is a fake one: it just lets read_kind_from_backend
unset the in-progress flag so that it reads from all healthy backends,
because the sync message is always forwarded to all healthy backends.
Tatsuo Ishii [Thu, 23 Feb 2017 22:42:42 +0000 (07:42 +0900)]
Remove commented out line.
Tatsuo Ishii [Thu, 23 Feb 2017 05:43:43 +0000 (14:43 +0900)]
Remove trailing white spaces.
Tatsuo Ishii [Thu, 23 Feb 2017 05:31:51 +0000 (14:31 +0900)]
Deal with prepared statement/portal reuse case.
i.e., the same named statement/portal is reused before a close complete
message arrives.
Tatsuo Ishii [Wed, 22 Feb 2017 09:43:00 +0000 (18:43 +0900)]
Some cosmetic changes.
Tatsuo Ishii [Wed, 22 Feb 2017 05:26:55 +0000 (14:26 +0900)]
As of 2017/2/22.
Regression tests passed.
Reusing prepared statements and portals still requires a fix.
Code clean up is still needed.
Tatsuo Ishii [Sat, 18 Feb 2017 10:13:35 +0000 (19:13 +0900)]
Deal with query error case.
Tatsuo Ishii [Tue, 14 Feb 2017 06:54:11 +0000 (15:54 +0900)]
Trying to fix bug271 as of 2017/2/14.
Rewrite the whole STREAM/extended query case using the "pending message"
infrastructure, which is an ordered list of sent messages.
- /work/pgpool-II/current/Bugs/bug271/pgproto.data now passes.
- /work/pgpool-II/current/Bugs/bug271/pgproto2.data does not pass.
- 005.jdbc does not pass.
- query cache regression does not pass.
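A much simplified model of the "pending message" idea is a FIFO of sent message kinds from which each backend response pulls out the head. PendingQueue, pending_push() and pending_pull_out() are hypothetical names; Pgpool-II's actual list (and pool_pending_message_pull_out()) tracks much more, such as statement names, query contexts and target nodes. Sync maps to ReadyForQuery here, mirroring the fake 'Sync' pending message, and Execute is matched loosely because several kinds of reply can answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pending message queue: the extended-protocol
 * messages sent to the backend (e.g. 'P' Parse, 'B' Bind, 'E' Execute,
 * 'C' Close, 'S' Sync) whose responses have not arrived yet, in order. */
#define PENDING_MAX 64

typedef struct
{
    char kinds[PENDING_MAX];
    size_t head;    /* index of the oldest pending message */
    size_t count;   /* number of pending messages */
} PendingQueue;

bool pending_push(PendingQueue *q, char kind)
{
    if (q->count == PENDING_MAX)
        return false;
    q->kinds[(q->head + q->count) % PENDING_MAX] = kind;
    q->count++;
    return true;
}

/* Map a frontend message kind to the backend response kind it expects. */
static char expected_response(char sent)
{
    switch (sent)
    {
        case 'P': return '1';   /* ParseComplete */
        case 'B': return '2';   /* BindComplete */
        case 'C': return '3';   /* CloseComplete */
        case 'S': return 'Z';   /* ReadyForQuery */
        default:  return 0;     /* e.g. Execute: accepted loosely here */
    }
}

/* Pull out the oldest pending message when a response arrives. Returns
 * false if the queue is empty or the response does not match the head. */
bool pending_pull_out(PendingQueue *q, char response)
{
    char want;

    if (q->count == 0)
        return false;
    want = expected_response(q->kinds[q->head]);
    if (want != 0 && want != response)
        return false;
    q->head = (q->head + 1) % PENDING_MAX;
    q->count--;
    return true;
}
```

With Parse, Close and Sync queued, a '3' arriving first is rejected because the head still expects '1', which is the kind of ordering problem these commits deal with.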
Tatsuo Ishii [Wed, 29 Mar 2017 04:13:15 +0000 (13:13 +0900)]
Add new regression test "069.memory_leak_extended".
This is almost the same as "060.memory_leak", but it uses the extended
query protocol.
Tatsuo Ishii [Wed, 29 Mar 2017 01:46:22 +0000 (10:46 +0900)]
Fix occasional kind mismatch error (or hang) and deallocate error.
There were the following problems causing the first one:
1) pool_push_pending_data() does not push pending data if the first
pending message is '3' (close complete), because
pool_push_pending_data() cannot distinguish it from the same kind of
message generated by it. Checking whether the backend buffer is empty
is not enough, since the message may not have been read into the
buffer yet. To solve the problem, do_query() is modified to check
whether '3' is received prior to '1' (parse complete). If so, it is
likely that pool_push_pending_data() failed to check the message, and
do_query() pushes the message.
2) CloseComplete() deletes a close pending message. This should not be
done by CloseComplete(), since read_kind_from_backend() is in charge
of it using pool_pending_message_pull_out().
The latter problem is that deallocate does not find the statement to be
deleted. This is caused by send_deallocate() being called in the
reset query list processing. Since DISCARD ALL automatically
deallocates all named statements and portals, send_deallocate() is
useless there. The solution is simply to delete the call in the reset
query list processing.
Tatsuo Ishii [Fri, 24 Mar 2017 08:44:46 +0000 (17:44 +0900)]
Fix the case when duplicate statement remains.
When parse_before_bind is used, a parsed statement may remain on the
load balance node even if an explicit close is issued, because the
close is redirected to the primary node. The fix is to always issue
the close to both the primary and the load balance node. This sounds
like a little bit of overkill, but I think there's no way to remember
that the statement remains on a load balance node.
Also clean up some #ifdef NOT_USED garbage.
Tatsuo Ishii [Fri, 24 Mar 2017 01:40:09 +0000 (10:40 +0900)]
Fix bugs regarding empty query response and deallocate.
The bug was analyzed in [pgpool-hackers: 2173].
http://www.pgpool.net/pipermail/pgpool-hackers/2017-March/002173.html
Pgpool-II needs to treat an empty query response exactly the same as a
command complete, so implement that.
I also found an issue with parse_before_bind. If a client sends a bind
message multiple times, and thus parse_before_bind gets called multiple
times, a named statement could be duplicated, which causes an error
from the backend. The solution is to close the named statement before
sending a parse message to the primary.
Tatsuo Ishii [Fri, 24 Mar 2017 01:37:13 +0000 (10:37 +0900)]
Merge branch 'bug271' of ssh://git.postgresql.org/pgpool2 into bug271
Muhammad Usama [Thu, 23 Mar 2017 21:17:36 +0000 (02:17 +0500)]
Fix for
0000296: PGPool v3.6.2 terminated by systemd because the service Type
has been set to 'forking'
Remove the "-n" value assigned to the OPTS variable in pgpool.sysconfig.
The problem was that a systemd service with Type=forking expects the
parent process to exit after startup is complete, but the -n command
line option disables daemon mode, so systemd keeps waiting for the
Pgpool-II parent process to exit after startup. That never happens, and
systemd eventually terminates Pgpool-II after the timeout.
As part of this commit, I have also added a new variable STOP_OPTS, which
is passed to ExecStop and can be used to pass extra command line options
to the Pgpool-II stop command.
Muhammad Usama [Mon, 20 Mar 2017 21:49:03 +0000 (02:49 +0500)]
Enhancing the watchdog internal command mechanism to handle
multiple concurrent commands.
Tatsuo Ishii [Mon, 20 Mar 2017 09:55:00 +0000 (18:55 +0900)]
Fix failure in re-syncing primary and standby when "ready for query" is received.
After sending a sync message to all backend nodes, a ready for query
message eventually comes from all nodes. So read_data_from_backend()
syncs all nodes by reading and discarding messages until "ready for
query" arrives. Previously, when the data input buffer was empty, it
checked whether data was ready before reading from the socket, using a
timeout of 0. However, this is not reliable, since the data may not be
ready at that moment. Moreover, this could lead to a kind mismatch
error, since the re-syncing process is aborted. It is safe to assume
that the data will eventually be ready, so the timeout is no longer
necessary.
See [pgpool-hackers: 2167] for more details.
pengbo [Fri, 17 Mar 2017 09:32:57 +0000 (18:32 +0900)]
Add release-notes 3.6.2-3.2.19.
Tatsuo Ishii [Fri, 17 Mar 2017 02:04:24 +0000 (11:04 +0900)]
Fix Execute() not to set writing transaction flag in an inappropriate case.
Execute() mistakenly set the writing transaction flag even if a write
query is issued (in a particular case, "BEGIN"). The reason it was not
found before is that the code path only works when the transaction
state is 'T' (in transaction). Previously the state was only set at
CommandComplete, but the previous commit set the state at Execute().
This leads to a SELECT not being load balanced.
Also add some useful debugging statements.
See [pgpool-hackers: 2155] for more details.
Tatsuo Ishii [Thu, 16 Mar 2017 05:02:32 +0000 (14:02 +0900)]
Fix issue with bug271.
When an INSERT is executed in a transaction, the "writing_transaction"
flag is set. The flag is checked by a succeeding SELECT to decide
whether parse_before_bind (which resends the parse message to the
primary node if the SELECT is executed on the standby) needs to be
executed. The flag is reset when the command complete message for
BEGIN arrives. The problem is that the command complete message could
arrive before the bind for the SELECT is executed, and at that point
the flag has already been cleared.
The solution is not to reset the flag when the command complete
message arrives in streaming replication mode.
Tatsuo Ishii [Tue, 14 Mar 2017 10:03:02 +0000 (19:03 +0900)]
Fix problem described in [pgpool-hackers: 2125].
Two problems are fixed:
- The writing_transaction flag is not reset when a transaction is
committed/aborted. Previously this was done in CommandComplete, but
now we may need to check the flag before a command complete
message arrives. So at Execute, it is taken care of by
handle_query_context(), which used to be a static function in
CommandComplete.c. Also, the session context memory is now always
cleared before starting a session context.
- The ready for query re-sync code path in read_kind_from_backend did
not work. The pending message corresponding to the ready for query
message could be a sync message, which does not have a query
context, but previously we checked whether a query context
existed. The check was bogus and has been removed.
Tatsuo Ishii [Sun, 12 Mar 2017 01:19:34 +0000 (10:19 +0900)]
Downgrade 1 more ereport from LOG to DEBUG1.
Tatsuo Ishii [Fri, 10 Mar 2017 10:42:39 +0000 (19:42 +0900)]
Downgrade ereport(LOG) to ereport(DEBUG1), which had been set for debugging purposes.
Tatsuo Ishii [Thu, 9 Mar 2017 23:18:00 +0000 (08:18 +0900)]
Check whether the "milestone close" may have been received before checking pending data.
When the buffers are empty, pool_push_pending_data() checks the backend
fd using select(2) with a timeout of 0. This may lose pending data that
the backend has not processed yet. To minimize the chance of losing
pending data, check whether the previously received data is the
"milestone close" (close complete). If it was not the milestone close,
set the timer to "wait forever", because we can be sure that the
milestone close will come later. Per [pgpool-hackers: 2103].
http://www.pgpool.net/pipermail/pgpool-hackers/2017-March/002103.html
Tatsuo Ishii [Fri, 3 Mar 2017 06:37:14 +0000 (15:37 +0900)]
Work in progress as of 2017/3/3.
- Fix nasty bug in pool_push_pending_data.
- (Probably) long-standing bug in pool_process_query. When the backend
read buffer is not empty, the check for an 'A' message forgot to
restore the kind message, which led to a "kind = 0" error in
read_kind_from_backend later on. This may be the cause of occasional
bug reports.
- Fix SimpleForwardToFrontend not to use sync map.
Tatsuo Ishii [Tue, 28 Feb 2017 06:57:16 +0000 (15:57 +0900)]
Unify function names.
Mostly cosmetic fixes.
Tatsuo Ishii [Tue, 28 Feb 2017 01:42:37 +0000 (10:42 +0900)]
Fix CommandComplete.
It still used the sync map, which is now obsolete. This caused strange
errors upon receiving a command complete message.
Fix do_query(). It did not pop up saved messages in certain cases. I
guess this was one of the causes of occasional regression failures.
Tatsuo Ishii [Mon, 27 Feb 2017 07:47:58 +0000 (16:47 +0900)]
Fix the case when parse_before_bind() is called.
When parse_before_bind() is called, the response to the newly issued
parse message should not be forwarded to the frontend. Otherwise, the
frontend will receive a duplicate parse complete message.
Tatsuo Ishii [Mon, 27 Feb 2017 04:23:21 +0000 (13:23 +0900)]
Add new pending message type 'Sync'.
This is used upon receiving a sync message from the client. The pending
message created here is a fake one: it just lets read_kind_from_backend
unset the in-progress flag so that it reads from all healthy backends,
because the sync message is always forwarded to all healthy backends.
Tatsuo Ishii [Thu, 23 Feb 2017 22:42:42 +0000 (07:42 +0900)]
Remove commented out line.
Tatsuo Ishii [Thu, 23 Feb 2017 05:43:43 +0000 (14:43 +0900)]
Remove trailing white spaces.
Tatsuo Ishii [Thu, 23 Feb 2017 05:31:51 +0000 (14:31 +0900)]
Deal with prepared statement/portal reuse case.
i.e., the same named statement/portal is reused before a close complete
message arrives.
Tatsuo Ishii [Wed, 22 Feb 2017 09:43:00 +0000 (18:43 +0900)]
Some cosmetic changes.
Tatsuo Ishii [Wed, 22 Feb 2017 05:26:55 +0000 (14:26 +0900)]
As of 2017/2/22.
Regression tests passed.
Reusing prepared statements and portals still requires a fix.
Code clean up is still needed.
Tatsuo Ishii [Sat, 18 Feb 2017 10:13:35 +0000 (19:13 +0900)]
Deal with query error case.
Tatsuo Ishii [Tue, 14 Feb 2017 06:54:11 +0000 (15:54 +0900)]
Trying to fix bug271 as of 2017/2/14.
Rewrite the whole STREAM/extended query case using the "pending message"
infrastructure, which is an ordered list of sent messages.
- /work/pgpool-II/current/Bugs/bug271/pgproto.data now passes.
- /work/pgpool-II/current/Bugs/bug271/pgproto2.data does not pass.
- 005.jdbc does not pass.
- query cache regression does not pass.
Yugo Nagata [Thu, 9 Mar 2017 02:34:12 +0000 (11:34 +0900)]
Fix pcp_promote_node bug that fails to promote node 0
The master node could not be promoted by pcp_promote_node, which failed
with the following error:
FATAL: invalid pgpool mode for process recovery request
DETAIL: specified node is already primary node, can't promote node id 0
In streaming replication mode, there are cases where Pgpool-II
regards the status of the primary node as "standby" for some reason,
for example, when pg_ctl promote is executed manually while
Pgpool-II is running. In that case, it seems to Pgpool-II
that the primary node doesn't exist.
This status mismatch should be fixed by pcp_promote_node, but when the
node is the master node (the first alive node), it fails as mentioned
above. The reason is as follows. Before changing the status,
pcp_promote_node checks whether the specified node is already primary
by comparing the node id with PRIMARY_NODE_ID. However, if the primary
doesn't exist from Pgpool-II's point of view, PRIMARY_NODE_ID is set to
0, which is the same as MASTER_NODE_ID. Hence, when the master node is
specified to be promoted, pcp_promote_node wrongly concludes that this
node is already primary and doesn't have to be promoted, and it exits
with the error.
To fix this, pcp_promote_node should check the node id by using
REAL_PRIMARY_NODE_ID, which is set to -1 when the primary doesn't exist,
rather than PRIMARY_NODE_ID.
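The difference between the two checks can be sketched as below. NodeIds and both functions are hypothetical stand-ins for illustration; the fields only model the behavior described above (PRIMARY_NODE_ID falls back to 0, REAL_PRIMARY_NODE_ID is -1 when no primary exists).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two node id values described above. */
typedef struct
{
    int primary_node_id;       /* models PRIMARY_NODE_ID: 0 if no primary */
    int real_primary_node_id;  /* models REAL_PRIMARY_NODE_ID: -1 if none */
} NodeIds;

/* Buggy check: node 0 looks "already primary" whenever no primary exists. */
bool already_primary_buggy(const NodeIds *ids, int node_id)
{
    return node_id == ids->primary_node_id;
}

/* Fixed check: compare against the real primary, which is -1 if none. */
bool already_primary_fixed(const NodeIds *ids, int node_id)
{
    return node_id == ids->real_primary_node_id;
}
```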
Tatsuo Ishii [Tue, 28 Feb 2017 02:39:14 +0000 (11:39 +0900)]
Fix tag error (excessive ">").
Muhammad Usama [Thu, 23 Feb 2017 16:06:05 +0000 (21:06 +0500)]
Pgpool-II should not perform ping test after bringing down the VIP
At the time of de-escalation from the master watchdog node, we should not
perform the ping test to verify whether the VIP was successfully brought
down. The reason is that if the new master watchdog node acquires the VIP
while the resigning node is still performing the de-escalation steps,
the resigning Pgpool-II node will falsely assume that it has failed to
bring down the delegate IP, since it will get a positive ping result
from the new master node, which has already acquired the same VIP.
Secondly, not having a delegate-IP in the configuration should not be
considered an error case by the wd_IP_up() and wd_IP_down() functions,
since there are many valid scenarios where a user would not want to
have a delegate-IP. So now both wd_IP_* functions return WD_OK instead
of WD_NG when the delegate_IP configuration is empty.
This issue was reported by the reporter of
bug:[pgpool-II
0000249]: watchdog sometimes fails de-escalation
The commit also contains some log message fixes.
Tatsuo Ishii [Thu, 23 Feb 2017 07:05:11 +0000 (16:05 +0900)]
Fix to release shared memory segments when Pgpool-II exits.
Per bug272. From the bug report.
"This cause the creation of a lot of segments if you start and stop
pgpool continuously (and in a testing fase it could be normal). Lot of
segments bring to reach the shmem OS configuration limit and than
suddenly stops (pgpool) working."
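A System V shared memory segment outlives its creator unless it is explicitly marked for removal with shmctl(IPC_RMID), which is how segments pile up across start/stop cycles. The sketch below shows the detach-and-remove step; it assumes segments created with shmget(), and release_shared_memory() is a hypothetical helper, not Pgpool-II's actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: detach a System V shared memory segment and mark
 * it for removal. The kernel destroys it once the last process detaches,
 * so it no longer counts against the OS shmem limits after exit.
 * Returns 0 on success, -1 if either step failed. */
int release_shared_memory(int shmid, void *addr)
{
    int rc = 0;

    if (addr != NULL && shmdt(addr) != 0)
        rc = -1;
    if (shmctl(shmid, IPC_RMID, NULL) != 0)
        rc = -1;
    return rc;
}
```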
Muhammad Usama [Mon, 13 Feb 2017 19:34:56 +0000 (00:34 +0500)]
Fix for [pgpool-general: 5315] pg_terminate_backend
Pgpool-II processes pg_terminate_backend by setting the swallow_termination
flag of the backend connection_info referred to in the pg_terminate_backend
function call, and later resets that flag when the query execution
completes. The problem with this approach is that if the command complete
for pg_terminate_backend is received before the connection is terminated,
the termination is regarded as a backend failure, since the
swallow_termination flag was already cleared by the Pgpool-II child after
receiving the query completion message.
The solution is to reset the swallow_termination flag only when the
pg_terminate_backend query fails, and otherwise leave it as it is when the
query is successful, since after the termination of the connection the
flag will not matter anyway.
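The flag handling after the fix can be sketched as follows. ConnectionInfo here is a hypothetical struct holding only the flag discussed, and the three functions are illustrative names, not Pgpool-II's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection info: only the flag discussed. */
typedef struct
{
    bool swallow_termination;  /* expect this backend to be terminated */
} ConnectionInfo;

/* Before sending pg_terminate_backend() for this connection. */
void on_terminate_backend_sent(ConnectionInfo *ci)
{
    ci->swallow_termination = true;
}

/* When the pg_terminate_backend query completes: reset the flag only on
 * failure, so a termination arriving after the command complete is not
 * mistaken for a backend failure. */
void on_terminate_backend_done(ConnectionInfo *ci, bool query_succeeded)
{
    if (!query_succeeded)
        ci->swallow_termination = false;
}

/* When the backend connection is found closed. */
bool is_backend_failure(const ConnectionInfo *ci)
{
    return !ci->swallow_termination;
}
```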
Tatsuo Ishii [Thu, 9 Feb 2017 09:11:04 +0000 (18:11 +0900)]
Enhance documentation.
Add note about %m in failover command. Add indexes to "streaming
replication mode", "master slave mode" and "native replication mode".
Muhammad Usama [Mon, 6 Feb 2017 14:41:31 +0000 (19:41 +0500)]
Adding the missing ExecStop and ExecReload commands to the systemd
service configuration file.
The patch was contributed by supp_k and enhanced by me.
Muhammad Usama [Mon, 30 Jan 2017 12:56:08 +0000 (17:56 +0500)]
Fix for 281: "segmentation fault" when execute pcp_attach_node
A DEBUG message was trying to de-reference a NULL value.
Tatsuo Ishii [Mon, 30 Jan 2017 07:30:57 +0000 (16:30 +0900)]
Fix load balancing bug in streaming replication mode.
In an explicit transaction, any SELECT will be load balanced until a
write query is sent. After a write query is sent, any SELECT should be
sent to the primary node. However, if a SELECT is sent before a sync
message is sent, this does not work, since the handling of the write
query is done after the ready for query message arrives. The solution
is to also handle the write query when it is executed.
The bug has been there since V3.5.
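The routing rule after the fix can be sketched as below. SessionState, on_execute() and route_select() are hypothetical names, and node 0 as primary / node 1 as load balance node is an assumption for illustration; the point is that the writing flag is set at execution time rather than only when ready for query arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session state for an explicit transaction. */
typedef struct
{
    bool in_transaction;
    bool writing_transaction;  /* a write query has been executed */
} SessionState;

enum { PRIMARY_NODE = 0, LOAD_BALANCE_NODE = 1 };  /* assumed node ids */

/* Mark the transaction as writing when the write query is executed,
 * so a SELECT sent before the next Sync already sees the flag. */
void on_execute(SessionState *s, bool is_write_query)
{
    if (s->in_transaction && is_write_query)
        s->writing_transaction = true;
}

/* Choose the node for a SELECT inside the session. */
int route_select(const SessionState *s)
{
    return (s->in_transaction && s->writing_transaction)
        ? PRIMARY_NODE : LOAD_BALANCE_NODE;
}
```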
Tatsuo Ishii [Mon, 30 Jan 2017 06:29:29 +0000 (15:29 +0900)]
Fix yet another kind mismatch error in streaming replication mode.
1) Parse "BEGIN" using statement S1, and it is sent to both node 0 and 1.
2) Close S1.
3) Parse SELECT using S1, and it is sent to node 0 (or 1).
4) Bind retrieves info (sent_messages) regarding S1. Since Pgpool-II
only removes info on S1 when CloseComplete is received, Bind decides
to send the bind message to both node 0 & 1, because the info it finds
is the one regarding BEGIN. Node 0 or 1 tries to bind to the
non-existent statement S1.
As a result, something like "failed to read kind from backend.
kind mismatch among backends. Possible last query was: "BEGIN" kind
details are: 0[E: prepared statement "S1" does not exist] 1[3]
check data consistency among db nodes" occurs.
Note that in 3), if a name other than S1 is used, the problem does not occur.
The solution is to remove S1 when the Close message is received. This
problem has been there since 3.5.0 was released.
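A toy version of the bookkeeping shows why removal at Close matters. The table below is a hypothetical simplification of sent_messages (a name plus a bitmask of nodes the Parse went to, first match wins). If on_close() did nothing, the stale "S1" entry for BEGIN (both nodes) would shadow the new one and Bind would target both nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified sent-message table. */
#define MAX_STMTS 16
#define NAME_LEN 64

typedef struct
{
    char name[NAME_LEN];
    unsigned node_mask;   /* bit i set: statement exists on node i */
    bool used;
} SentMessage;

static SentMessage sent[MAX_STMTS];

/* First matching entry wins, so a stale entry would shadow a newer one. */
static SentMessage *lookup(const char *name)
{
    for (int i = 0; i < MAX_STMTS; i++)
        if (sent[i].used && strcmp(sent[i].name, name) == 0)
            return &sent[i];
    return NULL;
}

/* Record a Parse of a named statement into the first free slot. */
bool on_parse(const char *name, unsigned node_mask)
{
    for (int i = 0; i < MAX_STMTS; i++)
    {
        if (!sent[i].used)
        {
            strncpy(sent[i].name, name, NAME_LEN - 1);
            sent[i].name[NAME_LEN - 1] = '\0';
            sent[i].node_mask = node_mask;
            sent[i].used = true;
            return true;
        }
    }
    return false;
}

/* Per the fix: forget the statement as soon as Close is received,
 * instead of waiting for CloseComplete. */
void on_close(const char *name)
{
    SentMessage *m = lookup(name);

    if (m != NULL)
        m->used = false;
}

/* Nodes a Bind for this statement should go to (0 if unknown). */
unsigned bind_targets(const char *name)
{
    SentMessage *m = lookup(name);

    return m != NULL ? m->node_mask : 0;
}
```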
Tatsuo Ishii [Fri, 27 Jan 2017 07:12:47 +0000 (16:12 +0900)]
Fix do_query() hangs after close message.
This bug was introduced in 3.6.1.
If an extended query appears right after a close message and do_query()
is called to check system catalogs, it hangs because it expects to read
pending data. This is caused by the pending flag being mistakenly set
after Close().
Back patch to 3.6-stable and 3.5-stable.
Muhammad Usama [Thu, 26 Jan 2017 20:53:42 +0000 (01:53 +0500)]
Fixing
0000280: stack smashing detected
It was a buffer overflow in the wd_get_cmd function.
Tatsuo Ishii [Tue, 24 Jan 2017 23:17:51 +0000 (08:17 +0900)]
Fix indentations of query cache documents.
Tatsuo Ishii [Tue, 24 Jan 2017 23:13:25 +0000 (08:13 +0900)]
Enhance query cache documents.
Muhammad Usama [Thu, 19 Jan 2017 15:58:24 +0000 (20:58 +0500)]
Fixing the issue with the watchdog process restart.
When the watchdog process is abnormally terminated because of some problem
(e.g. a segmentation fault), the newly spawned watchdog process fails to
start and produces an error "bind on ... failed with reason: Address already in use".
The reason is that the abnormally terminated watchdog process never gets
the time to clean up the socket it uses for IPC, and the new process gets
an error because the socket address is already occupied.
The fix is that the Pgpool-II main process sets a flag in shared memory to
mark that the watchdog process was abnormally terminated, and at startup,
when the watchdog process sees that the flag is set, it cleans up the
socket file and also performs the de-escalation (if the watchdog process
crashed while it was the master/coordinator node) if required before
initializing itself.
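The "Address already in use" failure and its cleanup can be reproduced with a plain Unix domain socket: a process that dies without unlinking leaves the socket file behind, and bind() on that path then fails with EADDRINUSE. The sketch below is a hypothetical helper; the remove_stale argument stands in for the shared memory flag described above, which is what makes the unlink() safe.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a listening Unix domain socket at path.
 * If remove_stale is set (the caller knows the previous owner crashed),
 * first remove the leftover socket file. Returns the fd, or -1. */
int listen_unix_socket(const char *path, int remove_stale)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    if (remove_stale)
        unlink(path);                   /* ENOENT is fine to ignore */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0 ||
        listen(fd, 5) != 0)
    {
        close(fd);                      /* e.g. EADDRINUSE from a stale file */
        return -1;
    }
    return fd;
}
```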
Tatsuo Ishii [Wed, 18 Jan 2017 23:24:34 +0000 (08:24 +0900)]
Fix query cache bug reported in pgpool-general-jp:1441.
In streaming replication mode with query cache enabled, SELECT hangs
in the following scenario:
1) a SELECT hits query cache and returns rows from the query cache.
2) following SELECT needs to search meta data and it hangs.
In #1, while returning the cached result, it fails to call
pool_unset_pending_response(), which leaves the pending_response flag
set. In #2, do_query() checks the flag and tries to read a pending
response from the backend. Since there's no pending data in the backend,
it hangs reading data from the backend.
The fix is simply to call pool_unset_pending_response() in #1 to reset
the flag.
Bug report and fix provided by Nobuyuki Nagai.
New regression test item (068) added by me.
Tatsuo Ishii [Tue, 10 Jan 2017 23:24:32 +0000 (08:24 +0900)]
Remove elog/ereport calls from signal handlers.
elog/ereport calls malloc(), which is not safe to call inside signal
handlers, per discussion in [pgpool-hackers: 1950]. I #ifdef'd them out
rather than simply removing them, in the hope that we someday find a
better solution that makes it possible to call these functions inside
signal handlers.
Note that I did not touch exit_handler() of pgpool_main.c, because
removing elog/ereport from it would lose informative messages like
"received smart shutdown request". The Pgpool-II main process does not
heavily use malloc(), so the risk is minimal, I guess.
pengbo [Tue, 10 Jan 2017 08:35:29 +0000 (17:35 +0900)]
Fix typo in Japanese release notes.
pengbo [Tue, 10 Jan 2017 07:59:37 +0000 (16:59 +0900)]
Fix failure to create an INET domain socket on FreeBSD if listen_addresses = '*'.
Per bug202.
Muhammad Usama [Wed, 4 Jan 2017 13:23:33 +0000 (18:23 +0500)]
Fix for
0000249: watchdog sometimes fails de-escalation.
The logic in the pgpool-II main process's exit_handler and terminate_all_childrens
did not make sure that the pgpool-II main process exits only after all its
children have exited. The problem occurs when the main process shuts itself
down before the watchdog and de-escalation child processes.
The solution is to use the waitpid() system call without the WNOHANG option.
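A minimal sketch of "terminate, then block until every child is gone" follows. terminate_and_wait_children() is a hypothetical helper, not Pgpool-II's terminate_all_childrens(); it only shows waitpid() called without WNOHANG, retried on EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send SIGTERM to each child, then block in
 * waitpid() (no WNOHANG) until each has exited, so the parent can only
 * exit after its children. Returns the number of children reaped. */
int terminate_and_wait_children(const pid_t *pids, int n)
{
    int reaped = 0;

    for (int i = 0; i < n; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGTERM);

    for (int i = 0; i < n; i++)
    {
        if (pids[i] <= 0)
            continue;
        for (;;)
        {
            pid_t r = waitpid(pids[i], NULL, 0);    /* blocks */

            if (r == pids[i])
            {
                reaped++;
                break;
            }
            if (r < 0 && errno != EINTR)
                break;                              /* e.g. ECHILD */
        }
    }
    return reaped;
}
```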
Yugo Nagata [Wed, 4 Jan 2017 05:20:24 +0000 (14:20 +0900)]
Fix connection_life_time broken by authentication_timeout
When authentication_timeout is enabled,
connection_life_time could never expire, because
alarm(0) is called when reading the start-up packet.
When only one connection pool is used, this
problem doesn't occur, because the signal handler
for connection_life_time is always set at the end
of the session. However, if more than one connection
pool exists, the handler isn't set; only the time
to close the connection is calculated.
To fix it, when authentication_timeout is enabled,
save the signal handler for connection_life_time
and the remaining time, and restore the handler when
authentication_timeout is disabled.
Yugo Nagata [Wed, 28 Dec 2016 08:37:11 +0000 (17:37 +0900)]
Fix authentication timeout that can occur right after client connections
This is possible when connection_life_time is enabled.
The SIGALRM signal is used for both connection_life_time and
authentication_timeout. Usually, SIGALRM is for connection_life_time,
but when a new connection arrives, read_startup_packet() is called,
the handler for authentication_timeout is set by pool_signal(), and
alarm(authentication_timeout) is called in enable_authentication_timeout().
However, if connection_life_time expires **between pool_signal() and
alarm()**, authenticate_timeout() will be called instead of
pool_backend_timer_handler() when connection_life_time expires.
To fix this, call alarm() before pool_signal() to prevent the signal
handler from being invoked at the wrong time.
pengbo [Mon, 26 Dec 2016 02:35:42 +0000 (11:35 +0900)]
Some changes in release note 3.1-3.6.
pengbo [Sun, 25 Dec 2016 15:53:50 +0000 (00:53 +0900)]
Add Japanese release note 3.1-3.6.
Muhammad Usama [Fri, 23 Dec 2016 14:58:53 +0000 (19:58 +0500)]
Tightening up the watchdog security
Now wd_authkey uses the HMAC SHA-256 hashing.
pengbo [Thu, 22 Dec 2016 02:10:36 +0000 (11:10 +0900)]
Add pgpool_adm extension.
Tatsuo Ishii [Tue, 20 Dec 2016 07:50:57 +0000 (16:50 +0900)]
Add release 3.1-3.6 release notes.
The contents are not accurate at this moment, except for 3.6.1. They are
just a copy of the 3.5.5 release note.
Tatsuo Ishii [Tue, 20 Dec 2016 07:23:09 +0000 (16:23 +0900)]
Add 3.5.5 release note.
Tatsuo Ishii [Thu, 22 Dec 2016 01:24:25 +0000 (10:24 +0900)]
Update Pgpool-II version to "3.7devel".
Tatsuo Ishii [Tue, 20 Dec 2016 02:38:12 +0000 (11:38 +0900)]
Fix occasional segfault when query cache is enabled.
Per bug 263.
Tatsuo Ishii [Tue, 20 Dec 2016 01:28:20 +0000 (10:28 +0900)]
Fix packet kind does not match error in extended protocol per bug 231.
According to bug231, the bug seems to bite you if all of the following
conditions are met:
- Streaming replication mode
- Load balance node is not node 0
- Extended protocol is used
- SELECT is executed, the statement is closed, then a transaction
command is executed
The sequence of how the problem bites is:
1) SELECT executes on statement S1 on the load balance node 1
2) Frontend sends Close statement
3) Pgpool-II forwards it to backend 1
4) Frontend sends Parse, Bind, Execute of COMMIT
5) Pgpool-II forwards them to backend 0 & 1
6) Frontend sends sync message
7) Pgpool-II forwards it to backend 0 & 1
8) Backend 0 replies with parse complete ("1"), while backend 1
replies with close complete ("3") because of #3.
9) Kind mismatch occurs
The solution is, in #3, to let Pgpool-II wait for the response from
backend 1, but not read the response message. Later on, Pgpool-II's
state machine will read the response from it before the sync message
is sent in #6. With this, backend 1 will reply with "1" in #8, and the
kind mismatch error does not occur.
Also, fix not calling pool_set_doing_extended_query_message() when
receiving a Close message. (I don't know why it was missed.)
New regression test "067.bug231" was added.
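Step 9 above comes down to comparing the first byte (message kind) read from each participating backend. The sketch below is a hypothetical, much reduced version of that consistency check; Pgpool-II's read_kind_from_backend() does considerably more, such as handling notices and parameter status messages.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: kinds[i] is the message kind read from node i,
 * live[i] says whether node i takes part. Returns true if every live
 * node returned the same kind, false on a "kind mismatch". */
bool kinds_consistent(const char *kinds, const bool *live, int n)
{
    char first = 0;
    bool have_first = false;

    for (int i = 0; i < n; i++)
    {
        if (!live[i])
            continue;
        if (!have_first)
        {
            first = kinds[i];
            have_first = true;
        }
        else if (kinds[i] != first)
            return false;
    }
    return true;
}
```

In the bug231 sequence, node 0 returns '1' and node 1 returns '3', so this check fails with both nodes live.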
Tatsuo Ishii [Tue, 20 Dec 2016 01:25:44 +0000 (10:25 +0900)]
Enhance documentation.
"Tips for Installation" section added.
Tatsuo Ishii [Tue, 20 Dec 2016 01:13:55 +0000 (10:13 +0900)]
Enhance documentation.
Add "Tips for Installation" section.
Tatsuo Ishii [Tue, 6 Dec 2016 04:28:36 +0000 (13:28 +0900)]
Fix a race condition in a signal handler per bug 265.
In child.c there's a signal handler which calls elog. Since other
signals are not blocked while the handler is running, a deadlock
could occur in the system calls in the pgpool shutdown sequence. To
fix the problem, signals are now blocked in the signal handler by
using POOL_SETMASK.
Ideally we should avoid calling elog in signal handlers, though.
Tatsuo Ishii [Thu, 24 Nov 2016 01:16:51 +0000 (10:16 +0900)]
Fix wrong minimum configuration value for client_idle_limit_in_recovery.
Per bug #264.
pengbo [Tue, 22 Nov 2016 03:43:35 +0000 (12:43 +0900)]
Merge branch 'master' of ssh://git.postgresql.org/pgpool2
pengbo [Tue, 22 Nov 2016 03:42:02 +0000 (12:42 +0900)]
Some changes in Makefile.in of doc and doc.ja
Tatsuo Ishii [Tue, 22 Nov 2016 02:21:01 +0000 (11:21 +0900)]
Allow to execute "make xslthtml" under doc.ja.
pengbo [Tue, 22 Nov 2016 02:16:13 +0000 (11:16 +0900)]
Change pgpool.spec to install sgml docs and man.
pengbo [Mon, 21 Nov 2016 10:41:37 +0000 (19:41 +0900)]
Add some necessary files to EXTRA_DIST to build sgml files.
pengbo [Mon, 21 Nov 2016 09:01:48 +0000 (18:01 +0900)]
Prepare 3.6.0
Yugo Nagata [Mon, 21 Nov 2016 01:07:20 +0000 (10:07 +0900)]
Unify the term in the Japanese document
Yugo Nagata [Sun, 20 Nov 2016 22:24:40 +0000 (07:24 +0900)]
Add some missing documents about load-balancing.
Some untranslated statements are also translated.
Muhammad Usama [Fri, 18 Nov 2016 10:20:07 +0000 (15:20 +0500)]
Adding AWS watchdog example
Tatsuo Ishii [Fri, 18 Nov 2016 02:46:48 +0000 (11:46 +0900)]
Translate AWS watchdog example.
Bo Peng [Wed, 16 Nov 2016 06:14:48 +0000 (15:14 +0900)]
Prepare 3.6RC1