-
Notifications
You must be signed in to change notification settings - Fork 193
Optimize Materialized View Status Maintenance for Partitioned Tables #990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2e00c25 to
e5c13e7
Compare
|
Added commit: e5c13e7 SINGLENODE and EntryDB materialized view status maintenance optimization. In SINGLENODE mode or when operating as an entry db, we cannot use This change moves the processing of modified relations to |
|
TPC-B before this commit:
after this commit:
Summary
|
yjhjstz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
e5c13e7 to
e2506b3
Compare
Previously, write operations on partitioned tables would alter the materialized view (MV) status of both ancestor and descendant tables, leading to unnecessary invalidations of MVs that depended on unaffected partitions. This commit introduces a more efficient approach to updating MV statuses for partitioned tables by focusing on leaf partitions that have actually undergone data changes. Key changes include: - Leaf Partition Tracking: Instead of updating the MV status based on the parent table, we now detect and track changes at the leaf partition level. This ensures that only partitions with actual data modifications trigger MV status updates, significantly reducing unnecessary refreshes. - Cross-Partition Updates: For operations like UPDATEs that span multiple partitions (e.g., decomposed into an INSERT on one leaf and a DELETE on another), the MV status is updated based on the real data state of the affected leaf partitions, rather than propagating changes through the parent table. This avoids invalidating MVs that depend on unrelated partitions. - Executor Enhancements: The executor now detects dynamic partition expansion during query execution and records data modification bitmaps for each partition on the QE. These bitmaps are aggregated on the QD to update MV statuses accurately. - Protocol Extension: The libpq protocol has been extended to handle QE feedback uniformly, ensuring that partition-level modification information is properly collected and consumed by the QD. This optimization minimizes the impact on MVs by ensuring that only relevant partitions trigger status updates, improving performance and reducing unnecessary invalidations. Authored-by: Zhang Mingli [email protected]
…ion. In SINGLENODE mode or when operating as an entry db, we cannot use the extend libpq to send messages because we are already functioning in a QD role. Processing modified relations at `EndModifyTable()` is too late, especially since materialized views are updated when the executor ends. This change moves the processing of modified relations to `ExecModifyTable()` to ensure timely handling of modifications. This avoids issues with materialized view updates and ensures correct behavior in SINGLENODE or entry DB scenarios. Authored-by: Zhang Mingli [email protected]
e2506b3 to
939f936
Compare
Previously, write operations on partitioned tables would alter the materialized view (MV) status of both ancestor and descendant tables, leading to unnecessary invalidations of MVs that depended on unaffected partitions.
This commit introduces a more efficient approach to updating MV statuses for partitioned tables by focusing on leaf partitions that have actually undergone data changes.
example:
INSERT INTO Partitioned Table
before this commit:
It is known that data was successfully inserted into P1, which is a descendant node of P0. Due to the increased data, the status needs to be propagated upwards to invalidate the materialized view MV_0 based on P0.
At the same time, P1 is a partitioned table, and it also needs to propagate the status downwards to its child tables. Since it is unclear which child tables have undergone data changes, all materialized views of the child tables must be invalidated.
Consequently, the materialized views MV_0, MV_1, MV_1_1, and MV_1_2 are all invalidated.
after this commit:
When data is inserted into a partitioned table, it is ultimately routed to the child tables at the leaf nodes. The system monitors these leaf nodes for any data changes and propagates the status upward, while ignoring the changes directly made to the partition table P0 in the SQL.
The tuple (1, 1, 1) will be routed to the partition P1_1, invalidating the materialized views of P1_1 and its ancestor partition tables. In contrast, the other sub-partition P1_2 experiences no data changes, so the materialized view MV_1_2 remains valid.
As a result, the materialized views MV_0, MV_1, and MV_1_1 are invalidated, while MV_1_2 remains unaffected.
The new approach allows for status propagation only from the leaf nodes upward, rather than sending status updates in both directions through the intermediate nodes of the partition tree. This safeguards unrelated materialized views from unnecessary refreshes.
(Split) Across Partition Update on root table
before this commit:
When the data in the root node is updated, and since it is unclear which specific child table was updated, the status is propagated downwards to all descendant child tables.
As a result, all materialized views based on the partition tree are invalidated: MV_0, MV_1, MV_1_1, MV_1_2, and MV_2_2.
after this commit:
Since
cis the partition key, updating the partition key will cause data movement across partitions in the child tables. The UPDATE operation on the parent table will be translated into an INSERT for one child table and a DELETE for another.The system monitors the changed leaf node child tables: P2_1 and P2_2, and only propagates their respective statuses upward, invalidating the materialized views based on their ancestor tables. The materialized views related to the left subtree P1 remain unaffected.
As a result, MV_0 and MV_2_2 are invalidated, while MV_1, MV_1_1, and MV_1_2 remain valid.
Extend protocol of libpq
The executor (QE) records the modified partition child tables during the data modification process. A mechanism is needed to transmit these results to the QD for updating the metadata. The libpq protocol has been extended to support this data transfer.
Processing on the QE and QD sides:
Each QE records the IDs of the child tables it has modified, deduplicates them in a local bitmap, and sends this information to the QD.
The data changes on each segment node may not be the same. For example, in the diagram, seg0 has changes in two tables, while seg1 has no changes (as the data to be updated is not located on the seg1 node due to distribution).
The QD collects multiple bitmaps returned from the segment nodes and consolidates them into a single bitmap, removing duplicate relids. This deduplicated bitmap is then used to update the status of the ancestor nodes.
Extend Protocol Design
Based on libpq, the protocol has been extended with an 'e' protocol, which stands for "extend protocol," to unify the handling of information returns.
Each transmission begins with the 'e' protocol, followed immediately by data blocks. A single transmission can carry multiple data blocks.
Each data block corresponds to a specific type of data and is distinguished by a subtag. Following the subtag is a 32-bit integer, num, which indicates the total number of relids in the subsequent array.
The relid array follows the num value. After all data blocks have been written, the transmission concludes with a subtag_max.
The subtag_max, which belongs to the same enum category as the subtag, is specifically used to indicate the termination of an extended protocol transmission.
Key changes:
Leaf Partition Tracking: Instead of updating the MV status based on the parent table, we now detect and track changes at the leaf partition level. This ensures that only partitions with actual data modifications trigger MV status updates, significantly reducing unnecessary refreshes.
Cross-Partition Updates: For operations like UPDATEs that span multiple partitions (e.g., decomposed into an INSERT on one leaf and a DELETE on another), the MV status is updated based on the real data state of the affected leaf partitions, rather than propagating changes through the parent table. This avoids invalidating MVs that depend on unrelated partitions.
Executor Enhancements: The executor now detects dynamic partition expansion during query execution and records data modification bitmaps for each partition on the QE. These bitmaps are aggregated on the QD to update MV statuses accurately.
Protocol Extension: The libpq protocol has been extended to handle QE feedback uniformly, ensuring that partition-level modification information is properly collected and consumed by the QD.
This optimization minimizes the impact on MVs by ensuring that only relevant partitions trigger status updates, improving performance and reducing unnecessary invalidations.
Authored-by: Zhang Mingli [email protected]
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheckmake -C src/test installcheck-cbdb-parallelImpact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions