Skip to content

Empty binary value can have two results in postgresql database #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
immae opened this issue Feb 9, 2024 · 2 comments · Fixed by #58
Closed

Empty binary value can have two results in postgresql database #54

immae opened this issue Feb 9, 2024 · 2 comments · Fixed by #58

Comments

@immae
Copy link

immae commented Feb 9, 2024

We seem to have different treatment for Bytestrings as handled by LibPQ:

-- Create a table some_table with a nullable binary_column bytea type
--     CREATE TABLE some_table (id SERIAL,binary_column BYTEA);
ghci> import Database.PostgreSQL.LibPQ
ghci> c <- connectdb "postgresql://use-your-connection-string"
ghci> Just res <- execParams c "INSERT INTO \"some_table\" (binary_column) values ($1), ($2)" [Just (Oid 17,"",Binary), Just (Oid 17,mempty,Binary)] Binary

Now if you look in the table, you’ll have one NULL column and one with an empty bytea content

This seem to come from the fact that mempty and "" have different internal representation in Haskell, in spite of being equal:
(requires OverloadedStrings)

ghci> import Data.ByteString
ghci> a = mempty :: ByteString
ghci> b = "" :: ByteString
ghci> a == b
True
ghci> :force a
a = bytestring-0.11.5.3:Data.ByteString.Internal.Type.BS
      0x0000000000000000 GHC.ForeignPtr.FinalPtr 0
ghci> :force b
b = bytestring-0.11.5.3:Data.ByteString.Internal.Type.BS
      0x0000004200647b40
      (GHC.ForeignPtr.PlainPtr (ByteArray# <mutableByteArray>)) 0

This difference of treatment seem to be introduced by commit
89c7cb8

Reverting that commit indeed stores the two different values as an empty bytea.

@guibou
Copy link

guibou commented Feb 9, 2024

A few additional information we gathered when debuging this with @immae.

  • The operations over ByteString which can produce a null bytestring can be implemented differently. For example, mempty, drop, take are special cased to not allocate the destination array and use a null pointer. However, IsString, fromString (Hence "" with OverloadedStrings) does actually allocate a buffer of n items and copy them.
  • The commit pointed by @immae uses unsafeUseAsCString versus useAsCString. The difference is that the former does share the pointer with the ByteString (and hence the null pointer), when the later allocates a new buffer of the right size (actually, augmented of 1 to store the \0). So I think that the underying libpq is handling null as NULL and an not null pointer as a (here empty) string.
  • Even if it works for arbitrary blob contained in the ByteString because the length may be used elsewhere, I have doubt about the usage of the AsCString family of function. What happen if the semantic of these function change when a \0 is found.

I'm wondering how to fix that:

  • The optimization can be removed, but that's not desirable, especially if large ByteString are involved.
  • A special case, which detect empty bytestring and point to an already allocated empty one can be implemented?
  • Should the usage of AsCString functions be refactored to use something like useAsCStringLen?

@guibou
Copy link

guibou commented Feb 9, 2024

I understand it now.

maybeWith returns a null pointer in case of Nothing or unsafeUseAsCString ptr in case of Just v. The next layer decide that this nullptr is a NULL and otherwise it is a blob and its length is passed using the c_lengths array.

But when unsafeUseAsCString returns a null ptr for an empty string, then it's NULL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants