
Surprising behavior of ByteString literals via IsString #140

@parsonsmatt


At work, we discovered some surprising behavior in ByteString's IsString instance and its interaction with OverloadedStrings.

The following REPL session demonstrates the issue:

λ> BS.unpack $ T.encodeUtf8 ("bla語" :: Text)
[98,108,97,232,170,158]
λ> BS.unpack $ ("bla語" :: BS.ByteString)
[98,108,97,158]
λ> T.decodeUtf8 $ ("bla語" :: BS.ByteString)
*** Exception: Cannot decode byte '\x9e': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8

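The IsString instance is implemented with packChars, which converts each character with c2w. c2w narrows the character's code point to a Word8 and keeps only its low 8 bits, so any character above U+00FF is silently truncated. Here '語' (U+8A9E) becomes 0x9E, i.e. 158.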
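The narrowing can be reproduced with base alone. This is a minimal sketch, assuming c2w behaves like `fromIntegral . ord`. The names `narrowChar` and `truncateChars` are hypothetical stand-ins for bytestring's internal c2w and packChars, not its actual code.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical stand-in for bytestring's internal c2w:
-- 'fromIntegral' into Word8 keeps only the low 8 bits of the code point.
narrowChar :: Char -> Word8
narrowChar = fromIntegral . ord

-- Hypothetical stand-in for packChars: narrow every character of the literal.
truncateChars :: String -> [Word8]
truncateChars = map narrowChar

main :: IO ()
main = do
  -- '語' is U+8A9E = 35486; 35486 mod 256 = 0x9E = 158
  print (ord '語')
  print (truncateChars "bla語")
```

Because the conversion is a plain modular narrowing, no error is raised. The information in the high bits is simply discarded, which is why the problem only surfaces later, e.g. when decodeUtf8 rejects the lone 0x9E byte.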

I'd be happy to put together a PR to document the behavior of the IsString instance.

I had expected it to encode the string using the source file's encoding. I don't know whether that would be a feasible or desirable change.
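For comparison, encoding the literal as UTF-8 (the encoding GHC expects for source files) produces the same bytes as `T.encodeUtf8` in the REPL session. The sketch below is a hand-rolled UTF-8 encoder (RFC 3629) using only base. `encodeCharUtf8` and `encodeUtf8String` are hypothetical helpers, not part of bytestring or text, and they do not special-case surrogate code points, which a real encoder would have to handle.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Char as 1 to 4 UTF-8 bytes.
-- Surrogate code points (U+D800..U+DFFF) are not rejected here.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [ fromIntegral (0xC0 .|. (n `shiftR` 6)), cont n ]
  | n < 0x10000 = [ fromIntegral (0xE0 .|. (n `shiftR` 12))
                  , cont (n `shiftR` 6)
                  , cont n ]
  | otherwise   = [ fromIntegral (0xF0 .|. (n `shiftR` 18))
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low 6 bits of x.
    cont x = fromIntegral (0x80 .|. (x .&. 0x3F))

-- Hypothetical helper: UTF-8 encode a whole String.
encodeUtf8String :: String -> [Word8]
encodeUtf8String = concatMap encodeCharUtf8

main :: IO ()
main = print (encodeUtf8String "bla語")
```

Passing the result to `BS.pack` would give a ByteString with the same contents as `T.encodeUtf8 ("bla語" :: Text)`, i.e. all three bytes of '語' instead of only the last one.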

Labels: blocked: ghc (This is blocked on a feature or primitive not yet available in a released GHC version), documentation, pitfall
