Skip to content

Use Postgres' RETURNING capability to optimize insertMany. #407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 8, 2015

Conversation

MaxGabriel
Copy link
Member

Closes #365.

This PR uses Postgres' RETURNING capability to optimize bulk-inserts. Instead of inserting one-at-a-time, the rows can be inserted all at once and have their IDs returned. Based on my comment here, I don't think this is possible for SQLite/MySQL in a reliable way.

I believe test coverage is already pretty good on this, because several tests use insertMany, get back IDs, and then use those IDs to lookup the inserted rows.

Things I'm unsure about:

  • The different InsertSqlResult types. Is ISRManyKeys supposed to be used when a composite primary key is defined?
  • Version bumps. Is this considered a breaking change because it adds a new record field to public API?

@gregwebs
Copy link
Member

gregwebs commented Jun 2, 2015

This should re-use functionality from Sql/Util.hs. In particular, this will fail at least for composite keys and should use dbIdColumns for the returning clause. The test case for this should be added, perhaps in CompositeTest or PrimaryTest.

@MaxGabriel
Copy link
Member Author

Ok I added a failing test case for the composite primary key case, and then fixed that by adding a RawSql instance for Keys and using dbIdColumnsEsc from Database.Persist.Sql.Util. Probably I should add more tests for the RawSql instance, but is this generally correct otherwise?

@gregwebs
Copy link
Member

gregwebs commented Jun 6, 2015

I actually haven't touched RawSql before, but it looks good now. Perhaps @snoyberg can look at the RawSql instance.

@gregwebs
Copy link
Member

gregwebs commented Jun 6, 2015

One more thing, the documentation for insertMany_ and insertMany should be updated now to reflect which SQL databases have efficient implementations.

rawSqlCols _ key = (length $ keyToValues key, [])
rawSqlColCountReason key = "The primary key is composed of "
++ (show $ length $ keyToValues key)
++ " columns"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to write rawSqlColCountReason? Maybe a way to list out the components of the key?

* `RawSql` instance for `Key`s
* Optimize `insertMany` for Postgres backend

[ci skip]
@gregwebs
Copy link
Member

gregwebs commented Jun 7, 2015

@MaxGabriel The changes look good now, thanks for the great patch with the updated ChangeLog and documentation!

@snoyberg this bumps persistent to 2.2 because the SqlBackend gets a connInsertManySql field added. However, this is only really an API breakage for someone implementing a new SQL backend, so we could probably avoid the major version bump if we wanted to. What do you think?

@snoyberg
Copy link
Member

snoyberg commented Jun 8, 2015

I'm OK skipping the major version bump, we typically do so for internal-only changes like this.

gregwebs added a commit that referenced this pull request Jun 8, 2015
Use Postgres' RETURNING capability to optimize `insertMany`.
@gregwebs gregwebs merged commit e944857 into master Jun 8, 2015
@gregwebs gregwebs deleted the postgresInsertManyReturning3 branch June 8, 2015 17:33
@gregwebs
Copy link
Member

gregwebs commented Jun 8, 2015

hmm, I am having second thoughts on the minor version bump. If someone pegs their backend version, but not persistent, then re-installing dependencies can cause a breakage. If we do the major bump, then at least we can say persistent should have been constrained to a major version.

@snoyberg are you ok with making this a major bump?

@snoyberg
Copy link
Member

snoyberg commented Jun 8, 2015

Yes, I'm OK with it. Just to throw out one other possibility: we can add upper bounds via Hackage revisions to prevent the currently-released backends from using the newer persistent. I slightly prefer that approach, but only slightly.

@gregwebs
Copy link
Member

gregwebs commented Jun 8, 2015

I tagged and released them all as version 2.2. That is enough work for me without changing a bunch of revisions on hackage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Single-statement bulk insert for SQL backends
3 participants