Empty binary value can have two results in postgresql database #54

immae · 2024-02-09T11:36:27Z

We seem to have different treatment for Bytestrings as handled by LibPQ:

-- Create a table some_table with a nullable binary_column bytea type
--     CREATE TABLE some_table (id SERIAL,binary_column BYTEA);
ghci> import Database.PostgreSQL.LibPQ
ghci> c <- connectdb "postgresql://use-your-connection-string"
ghci> Just res <- execParams c "INSERT INTO \"some_table\" (binary_column) values ($1), ($2)" [Just (Oid 17,"",Binary), Just (Oid 17,mempty,Binary)] Binary

Now if you look in the table, you’ll have one NULL column and one with an empty bytea content

This seem to come from the fact that mempty and "" have different internal representation in Haskell, in spite of being equal:
(requires OverloadedStrings)

ghci> import Data.ByteString
ghci> a = mempty :: ByteString
ghci> b = "" :: ByteString
ghci> a == b
True
ghci> :force a
a = bytestring-0.11.5.3:Data.ByteString.Internal.Type.BS
      0x0000000000000000 GHC.ForeignPtr.FinalPtr 0
ghci> :force b
b = bytestring-0.11.5.3:Data.ByteString.Internal.Type.BS
      0x0000004200647b40
      (GHC.ForeignPtr.PlainPtr (ByteArray# <mutableByteArray>)) 0

This difference of treatment seem to be introduced by commit
89c7cb8

Reverting that commit indeed stores the two different values as an empty bytea.

The text was updated successfully, but these errors were encountered:

guibou · 2024-02-09T12:37:27Z

A few additional information we gathered when debuging this with @immae.

The operations over ByteString which can produce a null bytestring can be implemented differently. For example, mempty, drop, take are special cased to not allocate the destination array and use a null pointer. However, IsString, fromString (Hence "" with OverloadedStrings) does actually allocate a buffer of n items and copy them.
The commit pointed by @immae uses unsafeUseAsCString versus useAsCString. The difference is that the former does share the pointer with the ByteString (and hence the null pointer), when the later allocates a new buffer of the right size (actually, augmented of 1 to store the \0). So I think that the underying libpq is handling null as NULL and an not null pointer as a (here empty) string.
Even if it works for arbitrary blob contained in the ByteString because the length may be used elsewhere, I have doubt about the usage of the AsCString family of function. What happen if the semantic of these function change when a \0 is found.

I'm wondering how to fix that:

The optimization can be removed, but that's not desirable, especially if large ByteString are involved.
A special case, which detect empty bytestring and point to an already allocated empty one can be implemented?
Should the usage of AsCString functions be refactored to use something like useAsCStringLen?

guibou · 2024-02-09T13:19:16Z

I understand it now.

maybeWith returns a null pointer in case of Nothing or unsafeUseAsCString ptr in case of Just v. The next layer decide that this nullptr is a NULL and otherwise it is a blob and its length is passed using the c_lengths array.

But when unsafeUseAsCString returns a null ptr for an empty string, then it's NULL.

This was referenced Apr 16, 2024

unsafeUseAsCString(Len) may return a NULL pointer. haskell/bytestring#674

Closed

Fix issue with empty binary values #58

Merged

phadej closed this as completed in #58 Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty binary value can have two results in postgresql database #54

Empty binary value can have two results in postgresql database #54

immae commented Feb 9, 2024 •

edited

Loading

guibou commented Feb 9, 2024

guibou commented Feb 9, 2024

Empty binary value can have two results in postgresql database #54

Empty binary value can have two results in postgresql database #54

Comments

immae commented Feb 9, 2024 • edited Loading

guibou commented Feb 9, 2024

guibou commented Feb 9, 2024

immae commented Feb 9, 2024 •

edited

Loading