Skip to content

Conversation

@mdibaiee
Copy link
Member

@mdibaiee mdibaiee commented Oct 18, 2023

Description:

  • Databricks uses the Recovery Log is authoritative & Idempotent apply pattern of a materialization
  • This has been tested using: integration tests, and manual testing on the 25m document collection (imported_25m)

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

@mdibaiee mdibaiee force-pushed the mahdi/databricks branch 4 times, most recently from 10b19c0 to 5e17779 Compare October 25, 2023 14:08
@mdibaiee mdibaiee force-pushed the mahdi/databricks branch 10 times, most recently from c98fb69 to 3a25853 Compare November 8, 2023 11:54
@mdibaiee mdibaiee marked this pull request as ready for review November 9, 2023 14:47
@mdibaiee
Copy link
Member Author

mdibaiee commented Nov 9, 2023

What remains to be tested: since I have changed the logic of the CounterWriter, we need to test whether compression is still working fine. The non-compression case is working as I have tested it with Databricks.

UPDATE: tested this by running bigquery's integration tests 👍🏽

@mdibaiee mdibaiee force-pushed the mahdi/databricks branch 2 times, most recently from da101e8 to 5362209 Compare November 10, 2023 14:42
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments/considerations - looking good so far, this one is a bit of a beast.


type tableConfig struct {
Table string `json:"table" jsonschema:"title=Table,description=Name of the table" jsonschema_extras:"x-collection-name=true"`
Schema string `json:"schema" jsonschema:"title=Schema,description=Schema where the table resides,default=default"`
Copy link
Member

@williamhbaker williamhbaker Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with other materializations and usability, I suggest:

  • Not setting a default for schema in tableConfig
  • Making schema optional in tableConfig
  • In config, make schema_name a required field (note: I am assuming that you can't connect without some kind of schema specified), and maybe make the name of the equivalent fields in the tableConfig and config be the same
  • In (*config).Validate, error if schema_name hasn't been provided
  • In newTableConfig, by default initialize the schema to the schema from config, which will be over-written if there is a present schema property in the raw json of the resource config when it is unmarshalled elsewhere

This would support the case where most bindings use the same schema as set in the endpoint-level config, but just a few need to be edited to use a different schema, which can be done by just configuring it for those specific bindings. It would directly parallel the BigQuery and Snowflake materializations, which work in this way. As it is now, if you wanted to materialize all the bindings to a schema other than "default", you would need to set it for every single binding.

Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % remaining unresolved comments

@mdibaiee
Copy link
Member Author

Merging of this depends on gazette/core#352 so holding off until that PR is merged and landed

@mdibaiee mdibaiee merged commit 56c8554 into main Nov 21, 2023
@mdibaiee mdibaiee deleted the mahdi/databricks branch November 21, 2023 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants