Skip to content

add check to gc to validate metadata scan is complete#3703

Merged
EdColeman merged 16 commits intoapache:1.10from
EdColeman:gc_meta_validation
Sep 19, 2023
Merged

add check to gc to validate metadata scan is complete#3703
EdColeman merged 16 commits intoapache:1.10from
EdColeman:gc_meta_validation

Conversation

@EdColeman
Copy link
Contributor

@EdColeman EdColeman commented Aug 16, 2023

Add validation that dir and prevRow are present when the gc is scanning for in-use file references. If a row is processed and the dir and prevRow entries are not present, an IllegalStateException is throw to abort the current gc cycle.

This is to help detect / prevent partial metadata scans from missing file references that are in use.

Fixes #3696 for 1.10

@EdColeman EdColeman self-assigned this Aug 16, 2023
@EdColeman EdColeman added the bug This issue has been verified to be a bug. label Aug 16, 2023
@EdColeman EdColeman changed the base branch from main to 1.10 August 16, 2023 22:49
gce.candidates.add("/1636/default_tablet");
gce.addDirReference("1636", null, "/default_tablet");
assertThrows(IllegalStateException.class, () -> gca.collect(gce));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add a third test for tablet row that has files, but no dir or prev row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this but my first attempts fail to detect when there are only file entries - I think it is that the row is not being updated - and could be an artifact of the test env.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests added in 01e47c6 cover this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-turner f98bb39 tries to account for having at least one table / tablet scanned. Not sure if this is the correct approach or a valid assumption. For 1.x it may be (rep, and trace?), but not sure if there is a better way?

The idea was that if only ~del markers are seen and no rows, then probably best not to proceed.

And if there are no tablets - well there should not be much to gc anyway...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misinterpreted the suggestion. b187550 adds test for case where only prev row is present, The checking for at least one tablet seen was not necessary and has been reverted with b187550

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following is what I was thinking about, a test where there is a file but no dir or prev row.

  @Test
  public void testNoPrevRowNoDir() throws Exception {

    GarbageCollectionAlgorithm gca = new GarbageCollectionAlgorithm();

    TestGCE gce = new TestGCE();
    gce.candidates.add("/1636/default_tablet");
    gce.addFileReference("b", "m", "hdfs://foo.com:6000/user/foo/tables/b/t-0/F00.rf");
    assertThrows(IllegalStateException.class, () -> gca.collect(gce));
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test added with 3bba72a

@ctubbsii
Copy link
Member

Just mentioning that this is a mitigation for #608 (would like that to be noted in the eventual git commit when this gets merged).

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions are comments/style. Functionally, this PR seems good to me. Not sure how difficult it will be to merge forward into 2.1 and main, though. If there are any conflicts in the gc algorithm class, my suggestion to move the tracker class to its own file would definitely help reduce conflicts and make the merge easier, in addition to other benefits of being rigorous about encapsulation and separation of responsibilities.

* Track metadata rows read to help validate that gc scan has complete information to make a
* decision on deleting files
*/
private static class MetadataReadTracker implements AutoCloseable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tracker class is now a substantial portion of the algorithm class file. I would consider moving this class out to its own separate file in the same package. Since it's a static class and shouldn't be tightly coupled to any of the algorithm class' state, this should be easy to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tracker class is not expected to be used outside of the GarbageCollectionAlgorithm - it is not a general class, more of an implementation detail. Moving it outside of the GCA seems to imply that other classes might want to use it. Currently, it was not designed for anything else other than tracking the metadata seen in the context of the GCA metadata scan. If other classes need similar functionality, then this could be refactored to a utility somewhere? But until that need is identified, keeping it internal to GCA seems proper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 3bba72a the class was renamed MetadataRowReadTracker. Looking at porting the check to 2.1, Ample changes / simplifies the check and this class is not needed to track reading a row-at-at-time. Because this is a one-off and will not be carried forward, I favor keeping this enclosed with the GCA for now.

EdColeman and others added 5 commits September 5, 2023 14:47
…onAlgorithm.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…onAlgorithm.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…onAlgorithm.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
…onAlgorithm.java

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
@EdColeman EdColeman merged commit 19719b5 into apache:1.10 Sep 19, 2023
@EdColeman EdColeman deleted the gc_meta_validation branch September 19, 2023 17:18
@ctubbsii ctubbsii added this to the 1.10.4 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug This issue has been verified to be a bug.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add metadata table validation to AccumuloGC

3 participants