From: Tomas Vondra Date: Fri, 28 Jul 2017 22:00:09 +0000 (+0200) Subject: Accept plan change in xc_groupby regression test X-Git-Tag: XL_10_R1BETA1~194 X-Git-Url: http://git.postgresql.org/gitweb/static/gitweb.js?a=commitdiff_plain;h=f7dff6ae5b1a1eb16cc4b52308c741cc534faa4b;p=postgres-xl.git Accept plan change in xc_groupby regression test The plan changed in two ways. Firstly, the targetlists changed due to abandoning the custom distributed aggregation and reusing the upstream partial aggregation code. That means we're not prefixing the aggregate with schema name, etc. The plan also switches from distributed aggregation to plain aggregation with all the work done on top of a remote query. This happens simply due to costing, as the tables are tiny and two-phase aggregation has some overhead. The original implementation (as in XL 9.5) distributed the aggregate unconditionally, ignoring the costing. Parf of the problem is that the query groups by two columns from two different tables, resulting in overestimation of the number of groups. That means the optimizer thinks distributing the aggregation would not reduce the number of rows, which increases the cost estimate as each row requires network transfer and the finalize aggregate also depends on the number of input rows. We could make the tables larger and the optimizer would eventually switch to distributed aggregate. For example this seems to do the trick: insert into xc_groupby_tab1 select 1, mod(i,1000) from generate_series(1,20000) s(i); insert into xc_groupby_tab2 select 1, mod(i,1000) from generate_series(1,20000) s(i); But it does not seem worth it, considering it's just a workaround for the estimation issue and the increased duration. And we already have other regression tests testing plausible queries benefiting from distributed aggregation. So just accept the plan change. --- diff --git a/src/test/regress/expected/xc_groupby.out b/src/test/regress/expected/xc_groupby.out index cec10c1550..cb3d397ae3 100644 --- a/src/test/regress/expected/xc_groupby.out +++ b/src/test/regress/expected/xc_groupby.out @@ -5400,40 +5400,40 @@ select count(*), sum(xc_groupby_tab1.val * xc_groupby_tab2.val), avg(xc_groupby_ (4 rows) explain (verbose true, costs false, nodes false) select count(*), sum(xc_groupby_tab1.val * xc_groupby_tab2.val), avg(xc_groupby_tab1.val*xc_groupby_tab2.val), sum(xc_groupby_tab1.val*xc_groupby_tab2.val)::float8/count(*), xc_groupby_tab1.val2, xc_groupby_tab2.val2 from xc_groupby_tab1 full outer join xc_groupby_tab2 on xc_groupby_tab1.val2 = xc_groupby_tab2.val2 group by xc_groupby_tab1.val2, xc_groupby_tab2.val2; - QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate - Output: pg_catalog.count(*), pg_catalog.sum((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))), pg_catalog.avg((avg((xc_groupby_tab1.val * xc_groupby_tab2.val)))), ((pg_catalog.sum((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))))::double precision / (pg_catalog.count(*))::double precision), xc_groupby_tab1.val2, xc_groupby_tab2.val2 + Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), ((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))::double precision / (count(*))::double precision), xc_groupby_tab1.val2, xc_groupby_tab2.val2 Group Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2 -> Remote Subquery Scan on all - Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), xc_groupby_tab1.val2, xc_groupby_tab2.val2 - -> GroupAggregate - Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), xc_groupby_tab1.val2, xc_groupby_tab2.val2 - Group Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2 - -> Sort + Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val + Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2 + -> Sort + Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val + Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2 + -> Merge Full Join Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val - Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2 - -> Merge Full Join - Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val - Merge Cond: (xc_groupby_tab1.val2 = xc_groupby_tab2.val2) + Merge Cond: (xc_groupby_tab1.val2 = xc_groupby_tab2.val2) + -> Remote Subquery Scan on all + Output: xc_groupby_tab1.val, xc_groupby_tab1.val2 + Distribute results by H: val2 + Sort Key: xc_groupby_tab1.val2 + -> Sort + Output: xc_groupby_tab1.val, xc_groupby_tab1.val2 + Sort Key: xc_groupby_tab1.val2 + -> Seq Scan on public.xc_groupby_tab1 + Output: xc_groupby_tab1.val, xc_groupby_tab1.val2 + -> Materialize + Output: xc_groupby_tab2.val, xc_groupby_tab2.val2 -> Remote Subquery Scan on all - Output: xc_groupby_tab1.val2, xc_groupby_tab1.val + Output: xc_groupby_tab2.val, xc_groupby_tab2.val2 Distribute results by H: val2 + Sort Key: xc_groupby_tab2.val2 -> Sort - Output: xc_groupby_tab1.val2, xc_groupby_tab1.val - Sort Key: xc_groupby_tab1.val2 - -> Seq Scan on public.xc_groupby_tab1 - Output: xc_groupby_tab1.val2, xc_groupby_tab1.val - -> Materialize - Output: xc_groupby_tab2.val2, xc_groupby_tab2.val - -> Remote Subquery Scan on all - Output: xc_groupby_tab2.val2, xc_groupby_tab2.val - Distribute results by H: val2 - -> Sort - Output: xc_groupby_tab2.val2, xc_groupby_tab2.val - Sort Key: xc_groupby_tab2.val2 - -> Seq Scan on public.xc_groupby_tab2 - Output: xc_groupby_tab2.val2, xc_groupby_tab2.val + Output: xc_groupby_tab2.val, xc_groupby_tab2.val2 + Sort Key: xc_groupby_tab2.val2 + -> Seq Scan on public.xc_groupby_tab2 + Output: xc_groupby_tab2.val, xc_groupby_tab2.val2 (32 rows) -- aggregates over aggregates