Accept plan change in xc_groupby regression test
author Tomas Vondra <[email protected]>
Fri, 28 Jul 2017 22:00:09 +0000 (00:00 +0200)
committer Tomas Vondra <[email protected]>
Mon, 31 Jul 2017 01:18:22 +0000 (03:18 +0200)
The plan changed in two ways. Firstly, the targetlists changed due to
abandoning the custom distributed aggregation and reusing the upstream
partial aggregation code. That means we're not prefixing the aggregate
with schema name, etc.

The plan also switches from distributed aggregation to plain aggregation
with all the work done on top of a remote query. This happens simply due
to costing, as the tables are tiny and two-phase aggregation has some
overhead. The original implementation (as in XL 9.5) distributed the
aggregate unconditionally, ignoring the costing.

Part of the problem is that the query groups by two columns from two
different tables, resulting in overestimation of the number of groups.
That means the optimizer thinks distributing the aggregation would not
reduce the number of rows, which increases the cost estimate as each
row requires network transfer and the finalize aggregate also depends
on the number of input rows.
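
The statistics behind that group estimate can be inspected directly. The
query below is a minimal sketch, assuming both tables have been analyzed
and live in the public schema (as the plan output suggests). pg_stats is
the standard PostgreSQL statistics view; how XL combines the per-column
values into a group count is not shown here and is not verified.

```sql
-- Distinct-value statistics for the two grouping columns.
-- A negative n_distinct is a fraction of the row count, not an absolute number.
select tablename, attname, n_distinct
  from pg_stats
 where schemaname = 'public'
   and tablename in ('xc_groupby_tab1', 'xc_groupby_tab2')
   and attname = 'val2';
```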

We could make the tables larger and the optimizer would eventually
switch to distributed aggregate. For example this seems to do the
trick:

    insert into xc_groupby_tab1 select 1, mod(i,1000)
      from generate_series(1,20000) s(i);

    insert into xc_groupby_tab2 select 1, mod(i,1000)
      from generate_series(1,20000) s(i);

But it does not seem worth it: it is just a workaround for the
estimation issue, and it would make the test run longer. We already
have other regression tests covering plausible queries that benefit from
distributed aggregation. So just accept the plan change.
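
To check whether larger tables really flip the plan, something like the
following could be run after loading the extra rows. This is a sketch:
the ANALYZE calls are an assumption (to refresh statistics before
planning), the EXPLAIN is the one from the regression test, and the
resulting plan depends on the cluster layout and cost settings.

```sql
-- Refresh planner statistics after the inserts (assumed step).
analyze xc_groupby_tab1;
analyze xc_groupby_tab2;

-- Same query as in xc_groupby; look for an aggregate below the
-- Remote Subquery Scan to see whether aggregation was distributed.
explain (verbose true, costs false, nodes false)
select count(*),
       sum(xc_groupby_tab1.val * xc_groupby_tab2.val),
       avg(xc_groupby_tab1.val*xc_groupby_tab2.val),
       sum(xc_groupby_tab1.val*xc_groupby_tab2.val)::float8/count(*),
       xc_groupby_tab1.val2,
       xc_groupby_tab2.val2
  from xc_groupby_tab1 full outer join xc_groupby_tab2
    on xc_groupby_tab1.val2 = xc_groupby_tab2.val2
 group by xc_groupby_tab1.val2, xc_groupby_tab2.val2;
```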

src/test/regress/expected/xc_groupby.out

index cec10c1550109ed0c1ddb243ec2f3e47d33e85b5..cb3d397ae3ff617f66478b187b32717af97c6cb9 100644
@@ -5400,40 +5400,40 @@ select count(*), sum(xc_groupby_tab1.val * xc_groupby_tab2.val), avg(xc_groupby_
 (4 rows)
 
 explain (verbose true, costs false, nodes false) select count(*), sum(xc_groupby_tab1.val * xc_groupby_tab2.val), avg(xc_groupby_tab1.val*xc_groupby_tab2.val), sum(xc_groupby_tab1.val*xc_groupby_tab2.val)::float8/count(*), xc_groupby_tab1.val2, xc_groupby_tab2.val2 from xc_groupby_tab1 full outer join xc_groupby_tab2 on xc_groupby_tab1.val2 = xc_groupby_tab2.val2 group by xc_groupby_tab1.val2, xc_groupby_tab2.val2;
-                                                                                                                                                                      QUERY PLAN                                                                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                                QUERY PLAN                                                                                                                                 
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate
-   Output: pg_catalog.count(*), pg_catalog.sum((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))), pg_catalog.avg((avg((xc_groupby_tab1.val * xc_groupby_tab2.val)))), ((pg_catalog.sum((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))))::double precision / (pg_catalog.count(*))::double precision), xc_groupby_tab1.val2, xc_groupby_tab2.val2
+   Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), ((sum((xc_groupby_tab1.val * xc_groupby_tab2.val)))::double precision / (count(*))::double precision), xc_groupby_tab1.val2, xc_groupby_tab2.val2
    Group Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
    ->  Remote Subquery Scan on all
-         Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), xc_groupby_tab1.val2, xc_groupby_tab2.val2
-         ->  GroupAggregate
-               Output: count(*), sum((xc_groupby_tab1.val * xc_groupby_tab2.val)), avg((xc_groupby_tab1.val * xc_groupby_tab2.val)), xc_groupby_tab1.val2, xc_groupby_tab2.val2
-               Group Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
-               ->  Sort
+         Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
+         Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
+         ->  Sort
+               Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
+               Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
+               ->  Merge Full Join
                      Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
-                     Sort Key: xc_groupby_tab1.val2, xc_groupby_tab2.val2
-                     ->  Merge Full Join
-                           Output: xc_groupby_tab1.val2, xc_groupby_tab2.val2, xc_groupby_tab1.val, xc_groupby_tab2.val
-                           Merge Cond: (xc_groupby_tab1.val2 = xc_groupby_tab2.val2)
+                     Merge Cond: (xc_groupby_tab1.val2 = xc_groupby_tab2.val2)
+                     ->  Remote Subquery Scan on all
+                           Output: xc_groupby_tab1.val, xc_groupby_tab1.val2
+                           Distribute results by H: val2
+                           Sort Key: xc_groupby_tab1.val2
+                           ->  Sort
+                                 Output: xc_groupby_tab1.val, xc_groupby_tab1.val2
+                                 Sort Key: xc_groupby_tab1.val2
+                                 ->  Seq Scan on public.xc_groupby_tab1
+                                       Output: xc_groupby_tab1.val, xc_groupby_tab1.val2
+                     ->  Materialize
+                           Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
                            ->  Remote Subquery Scan on all
-                                 Output: xc_groupby_tab1.val2, xc_groupby_tab1.val
+                                 Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
                                  Distribute results by H: val2
+                                 Sort Key: xc_groupby_tab2.val2
                                  ->  Sort
-                                       Output: xc_groupby_tab1.val2, xc_groupby_tab1.val
-                                       Sort Key: xc_groupby_tab1.val2
-                                       ->  Seq Scan on public.xc_groupby_tab1
-                                             Output: xc_groupby_tab1.val2, xc_groupby_tab1.val
-                           ->  Materialize
-                                 Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
-                                 ->  Remote Subquery Scan on all
-                                       Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
-                                       Distribute results by H: val2
-                                       ->  Sort
-                                             Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
-                                             Sort Key: xc_groupby_tab2.val2
-                                             ->  Seq Scan on public.xc_groupby_tab2
-                                                   Output: xc_groupby_tab2.val2, xc_groupby_tab2.val
+                                       Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
+                                       Sort Key: xc_groupby_tab2.val2
+                                       ->  Seq Scan on public.xc_groupby_tab2
+                                             Output: xc_groupby_tab2.val, xc_groupby_tab2.val2
 (32 rows)
 
 -- aggregates over aggregates