Proposal: showLitChar (and show @Char) shouldn't escape readable Unicode characters #26

@ksqsf

Motivation

Currently, ghci does not print readable Unicode characters directly; it escapes them:

ghci> 'λ'
'\955'
ghci> "世界"
"\19990\30028"
ghci> '😂'
'\128514'

According to the documentation:

  • Show: "Conversion of values to readable Strings."

    Escaped strings are often not very readable.

  • showLitChar: "Convert a character to a string using only printable characters, using Haskell source-language escape conventions. "

    It does not consider Unicode-printable characters at all.

Conclusion 1: There's a bug, either in the code or in the documentation.
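
The current behavior can be checked directly against Data.Char. The following is a small sketch and not part of the proposal; the names currentEscapes and samples are only illustrative. It pairs a character with what showLitChar produces today and with the result of isPrint, which is one possible (assumed) reading of "printable".

```haskell
import Data.Char (isPrint, showLitChar)

-- Illustrative helper (not part of base): render one character with
-- today's showLitChar and report whether Data.Char.isPrint accepts it.
currentEscapes :: Char -> (String, Bool)
currentEscapes c = (showLitChar c "", isPrint c)

-- A few sample characters: an ASCII letter, a control character,
-- and two non-ASCII letters.
samples :: [(Char, (String, Bool))]
samples = [ (c, currentEscapes c) | c <- "a\nλ世" ]
```

In ghci, currentEscapes 'λ' evaluates to ("\\955",True): isPrint considers the character printable, yet showLitChar still escapes it.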

You might argue that people use Show as a lightweight serialization mechanism, and that this proposal will cause unexpected breakage for them. However, breakage is expected only when serialized data is sent over a network on the assumption that both sides use an ASCII-compatible encoding. Moreover, the current behavior is already inconsistent: a derived Show instance escapes non-ASCII characters inside Char and String values but prints non-ASCII constructor names verbatim, so code that relies on escaping is probably buggy:

ghci> data Γ a = Γ a deriving Show
ghci> Γ 'Γ'
Γ '\915'   -- Uh-oh.

Conclusion 2: If you rely on Show's escaping for encoding-independent serialization today, your code is probably buggy.

Therefore, we have pretty strong reasons to correct the behavior of showLitChar (and in turn show @Char and show @String):

  1. It is much friendlier to learners around the globe: they can use their native language freely in ghci.
  2. It fixes a long-standing inconsistency.
  3. It aligns the behavior with the documentation.
  4. A String is always a list of Unicode code points, so there is no reason to escape it in the first place. Encoding and decoding are already handled elsewhere.
  5. showLitChar has remained unchanged since 1999, a time when GHC had no good Unicode support. There were reasons to escape then; there are none today.

See also previous discussions on GitLab, libraries@, and r/haskell, as well as another proposal.

Proposed Changes

  • Relax showLitChar so that readable Unicode characters in Char and String are not escaped.
  • Add a newtype wrapper Ascii, together with supporting functions that escape all non-ASCII characters. escapeAscii guarantees that oldshow x = asciiToString . escapeAscii . newshow $ x, where x :: String:
newtype Ascii = MkAscii String

escapeAscii :: String -> Ascii
asciiToString :: Ascii -> String
  • After this proposal is implemented, GHCi will support only Unicode-capable terminals.
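
The changes listed above could look roughly like the following. This is only a sketch under stated assumptions, not the proposed implementation: "readable" is approximated by Data.Char.isPrint (the proposal does not pin this down), and showLitCharRelaxed and newShowString are hypothetical names standing in for the relaxed showLitChar and for newshow at type String. Ascii, escapeAscii and asciiToString use the signatures given in the proposal.

```haskell
import Data.Char (isAscii, isPrint, showLitChar)

-- Hypothetical relaxed variant of showLitChar. ASSUMPTION: "readable"
-- means non-ASCII and accepted by Data.Char.isPrint.
showLitCharRelaxed :: Char -> ShowS
showLitCharRelaxed c
  | not (isAscii c) && isPrint c = showChar c     -- keep readable Unicode as-is
  | otherwise                    = showLitChar c  -- everything else: unchanged

-- Stand-in for the relaxed show at type String ("newshow" in the law).
newShowString :: String -> String
newShowString s = '"' : foldr step "\"" s
  where
    step '"' rest = "\\\"" ++ rest                -- double quotes stay escaped
    step c   rest = showLitCharRelaxed c rest

newtype Ascii = MkAscii String

-- Re-escape every non-ASCII character with today's showLitChar, which
-- also inserts "\&" when a numeric escape is followed by a digit.
escapeAscii :: String -> Ascii
escapeAscii = MkAscii . go
  where
    go []     = []
    go (c:cs)
      | isAscii c = c : go cs
      | otherwise = showLitChar c (go cs)

asciiToString :: Ascii -> String
asciiToString (MkAscii s) = s
```

The law from the proposal can then be spot-checked in ghci by comparing asciiToString (escapeAscii (newShowString x)) with show x for sample strings such as "世界" or "λ1". The second sample exercises the "\&" separator that keeps a following digit from being read as part of the numeric escape.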

Breakage and Migration

This is a breaking change with a large impact. All libraries with custom Show instances will probably need a major version bump, because their show can now produce different output. However:

  1. The new behavior is usually what you and your users really want, even if the results differ. You may not need to do anything at all:
    • Existing serialized data can still be parsed by Read.
    • Newly serialized data can still be parsed by an existing program compiled with an older version of base.
  2. If you rely on the old behavior, consider using asciiToString . escapeAscii to get the old behavior back:
-- Previous
instance Show T where
  show x = _tostring x

-- Now
instance Show T where
  show x = asciiToString . escapeAscii $ _tostring x
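
The compatibility claim in item 1 can be checked with today's base, because the existing Read instances for String and Char already accept unescaped characters. The sketch below uses illustrative names (oldStyle, newStyle, bothParse, charParses) that are not part of the proposal; newStyle is written by hand to represent what a relaxed show would be expected to produce.

```haskell
-- Escaped (old-style) and unescaped (new-style) renderings of the same value.
oldStyle, newStyle :: String
oldStyle = "\"\\19990\\30028\""  -- what show produces today for "世界"
newStyle = "\"世界\""             -- what a relaxed show would produce

-- Both renderings parse to the same String with the existing Read instance.
bothParse :: Bool
bothParse = (read oldStyle :: String) == (read newStyle :: String)

-- The same holds for Char literals.
charParses :: Bool
charParses = (read "'\\955'" :: Char) == (read "'λ'" :: Char)
```

Both bothParse and charParses evaluate to True, so a reader built against an older base does not need the escapes to be present.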

Implementation

I will implement this proposal.

Update: a proof of concept (PoC) has been implemented here.

Metadata

Labels: abandoned (abandoned by proposer, no-show)