Skip to content

Conversation

@AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Jul 22, 2025

Relevant issue(s)

Resolves #3698
Resolves #1520

To reviewers

This is a disgustingly large PR and I am sorry that anyone has to try and review it. About 3000 lines of the diff are CID test changes - I doubt it is worth reviewing any of the test changes outside of the tests/integration/schema and the tests/integration/collection_version directories - many of the lines within those directories are CID changes, but many of them are not.

I suggest reading this PR description carefully, and consider reviewing and commenting on it's contents before bothering with the code.

When reviewing the code, I suggest starting with the changes in the client package, followed by the crdt package, and then maybe the db package. db/collection_id.go contains some useful explanations as to why some stuff needs to exist and do what it does (particularly RE CollectionSets), it might be worth scanning the function level documentation within that file if anything in client confuses you initially.

Description

Removes SchemaDescription from user space.

Schema properties are now saved in the blockstore

The global properties (previously on schema and co) are now saved in the blockstore.

The blocks are defined in the crdt package in order to piggyback off of the existing block logic, even though atm they are not true CRDTs (they could become so later if we want). I would like to avoid a further refactor of the existing crdt code in this PR.

Only properties that have been mutated are stored in the blocks (a delta).

The CollectionSet concept that was largely contained within the old db/schema_id.go file is now also in the blockstore. This is required for circular subsets of collections, as they must reference each other by their relative location within the collection set instead of the related CollectionID, as otherwise we would be trying to form a circular DAG (dog=>cat=>dog=>cat=>...). There is more documentation on this in db/collection_id.go (renamed from the original db/schema_id.go).

Schema properties are now hosted on Collection

The properties on SchemaDescription, and SchemaFieldDescription have been moved onto CollectionVersion and CollectionFieldDescription. The following changes were made to facilitate this:

  • IsPlaceholder on CollectionVersion has been added, as we can no longer use the lack of a SchemaDescription to determine whether a migration was uploaded before the schema versions were defined.
  • IsPrimary has been added to CollectionFieldDefinition as we can no longer use the lack of a SchemaFieldDescription to determine whether a field is the primary side of a relation.

This also means that PatchSchema has been removed, and the properties ported over are now available via PatchCollection.

PatchCollection has gained some of the substitution logic that PatchSchema had, such as referencing by collection name instead of ID. It also activates new versions by default - given the way patching works, I think this will make a lot more sense to users, and the old setActive boolean is very confusing with edits to multiple collections, and those that do not result in new versions. The new version may still be explicitly set to inactive, by patching the IsActive property.

The internal logic around orphan collection versions (when a migration between versions was added before the versions existed locally) had to change slightly because of this.

Collection IDs are now the Block CIDs

The IDs on CollectionVersion and its children are no longer formed by hashing the json representation of the client objects, instead they are now the CIDs of the blocks. This includes CollectionID, CollectionVersion.VersionID, CollectionFieldDescription.FieldID, and CollectionVersion.CollectionSetID.

This gives fields a stable, deterministic, global, field ID. This FieldID is not unique to the collection, and may be found on fields belonging to other collections - to do otherwise would create a circular dependency between field and collection.

Collection Fields can now be deleted via patch

Collection Fields can now be deleted via patch - the blocking off of this was already broken, as users were able to add a field, then toggle the version back to the previous (essentially removing the field). This became even more flimsy with the other changes in this PR, and fairly easy to handle, thanks to the new FieldID.

Collection short field ids are indexed by FieldID now in order to prevent their reuse, as well as just being nicer IMO.

Follow-up changes

The following changes should be done soon, but were excluded from this silly-sized PR in order to not make it even bigger.

  • The patch related tests need to be reviewed, and collapsed into a single set - I have left the existing tests in the schema directory (after converting them to PatchCollection, as moving them and de-duplicating them with the existing patch collection tests would significantly increase the line count, and obfuscate the behavioural test changes made to them within this PR. Merge and review patch schema/collection tests #3843
  • client.CollectionDefinition has not been removed in this PR - it is referenced in a large number of places throughout the repo - removing it here would increase the line count, obsfucating changes that should be reviewed more carefully, and increasing the likelihood of merge conflicts. Remove client.CollectionDefinition #3845
  • The DAG fetcher (used for commit queries) scans the entire headstore by default and then discards the collection/field/set commits. It is messy and inefficient, but not breaking, and is not used very often, so I have broken this out to Only scan relevant doc-commits in commit queries #3846 . We likely want to allow users to fetch the collection-commits via commit queries, but we may not want them bundled in with doc commits.
  • Some patch stuff is hard to validate, as we have no mechanic to validate the user patch before the system logic. Some of the error messages are not very friendly, and at least one operation is forced to silently ignore the user because of this. I did not feel like including this complication in the current PR, but it should be sorted soon. Add pre-system logic PatchCollection validation #3853
  • The http collection components all exist under schemas, small change, but easy to break out Http collection component refs are all under schema #3865

@AndrewSisley AndrewSisley added this to the DefraDB v0.19 milestone Jul 22, 2025
@AndrewSisley AndrewSisley self-assigned this Jul 22, 2025
@AndrewSisley AndrewSisley added area/collections Related to the collections system refactor This issue specific to or requires *notable* refactoring of existing codebases and components labels Jul 22, 2025
@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch 2 times, most recently from 4884b5e to 7e9a68f Compare July 22, 2025 19:14
@codecov
Copy link

codecov bot commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 84.16779% with 234 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.02%. Comparing base (f90ad6f) to head (3e15f0e).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
...ternal/keys/headstore_collection_set_definition.go 34.78% 30 Missing ⚠️
cbindings/logic/collection.go 0.00% 25 Missing ⚠️
internal/keys/headstore_field_definition.go 55.56% 22 Missing and 2 partials ⚠️
internal/request/graphql/schema/collection.go 57.69% 20 Missing and 2 partials ⚠️
internal/keys/headstore_collection_definition.go 56.52% 18 Missing and 2 partials ⚠️
internal/db/collection_define.go 91.43% 12 Missing and 6 partials ⚠️
internal/db/definition_validation.go 88.89% 11 Missing and 7 partials ⚠️
internal/db/collection_id.go 95.98% 11 Missing and 5 partials ⚠️
internal/db/description/collection.go 63.64% 11 Missing and 5 partials ⚠️
cli/collection_patch.go 66.67% 8 Missing and 1 partial ⚠️
... and 9 more
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3842      +/-   ##
===========================================
+ Coverage    72.76%   73.02%   +0.26%     
===========================================
  Files          462      463       +1     
  Lines        42621    42068     -553     
===========================================
- Hits         31009    30716     -293     
+ Misses        9521     9317     -204     
+ Partials      2091     2035      -56     
Flag Coverage Δ
all-tests 73.02% <84.17%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
cbindings/logic/schema.go 0.00% <ø> (ø)
cli/cli.go 100.00% <100.00%> (ø)
cli/collection_set_active.go 100.00% <100.00%> (ø)
client/collection.go 100.00% <ø> (ø)
client/collection_description.go 82.56% <100.00%> (+0.85%) ⬆️
client/collection_field_description.go 70.00% <100.00%> (+3.33%) ⬆️
client/db.go 89.29% <ø> (ø)
client/document.go 64.61% <100.00%> (+0.46%) ⬆️
http/client.go 55.22% <100.00%> (-1.01%) ⬇️
http/client_collection.go 51.32% <100.00%> (-0.36%) ⬇️
... and 54 more

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f90ad6f...3e15f0e. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch 4 times, most recently from e11c4f6 to 1686c11 Compare July 23, 2025 15:08
@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch 6 times, most recently from 3e94bff to 5bca807 Compare July 23, 2025 17:53
require.NoError(t, err)

cdb, err := db.NewDB(ctx, memory.NewDatastore(ctx), dac.NoDocumentACP, nil)
cdb, err := db.NewDB(ctx, store, dac.NoDocumentACP, nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: The compiler decided to hit an existing corekv mem store deadlock bug due to unrelated changes, so I moved it to badger. Most our other tests had already moved to badger because of this but this one had managed to stay passing up until now.

@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch from 5bca807 to 289200a Compare July 23, 2025 18:13
//
// For example Field [FieldKind] string representations will be replaced by the raw integer
// value.
func substituteCollectionPatch(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This has been copy-pasted from the schema code, very few changes have been made - IIRC the only new feature is support for substitution in "from" paths, which had been missed in the original.

// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package db
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: This is the old db/schema_id file but renamed, and modified.

The circular detection and grouping is largely unchanged. sortCollectionSets is new, as the sets need to be handled in the order that they exist within the dependency/relationship DAG so that the CollectionIDs can be added to the CollectionKind, before creating the host blocks.

substituteSecondaryRelationFieldKinds is also new.

@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch 2 times, most recently from 023a53f to ec9d8b9 Compare July 23, 2025 18:46
fieldID string,
) (uint32, error) {
// This concatenation is temporary, soon we can just use the field CID
uniqueKey := strconv.Itoa(int(collectionShortID)) + ":" + fieldID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This turned out to be not so temporary, as the short ids are collection specific, but the field ids are not. Removing the collection short id prefix results in the wrong field short ids being returned.

return err
}

if dstCol.CollectionID != "" { // todo- this makes no sense
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: The old todo I accidentally left in from an old PR that should have dealt with it proved correct - it made no sense, and did nothing useful, so now it is gone.

@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch 2 times, most recently from bb6c2f5 to e1005a7 Compare July 23, 2025 19:23
@AndrewSisley AndrewSisley force-pushed the 3698-field-def-block branch 4 times, most recently from 53b30d5 to 38928cd Compare August 7, 2025 16:54
Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this Andy!

@AndrewSisley AndrewSisley merged commit 9b8f3ca into sourcenetwork:develop Aug 12, 2025
45 of 51 checks passed
@AndrewSisley AndrewSisley deleted the 3698-field-def-block branch August 12, 2025 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/collections Related to the collections system refactor This issue specific to or requires *notable* refactoring of existing codebases and components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move SchemaFieldDescription to block store Name clash with relational id fields

2 participants