Motivation
Currently, ghci cannot print readable Unicode directly:
ghci> 'λ'
'\955'
ghci> "世界"
"\19990\30028"
ghci> '😂'
'\128514'
According to the documentation:
- Show: "Conversion of values to readable Strings." Escaped strings are often not very readable.
- showLitChar: "Convert a character to a string using only printable characters, using Haskell source-language escape conventions." It does not consider Unicode-printable characters at all.
Conclusion 1: There's a bug, either in the code, or in the doc.
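The mismatch can be observed directly. The following is a minimal sketch, not part of the proposal; the helper names isEscaped and escapedButPrintable are made up for illustration. It lists the characters that Data.Char.isPrint considers printable but that showLitChar escapes anyway.

```haskell
import Data.Char (isPrint, showLitChar)

-- | True when 'showLitChar' renders the character as anything other than itself.
-- (Hypothetical helper, for illustration only.)
isEscaped :: Char -> Bool
isEscaped c = showLitChar c "" /= [c]

-- | Characters that are printable according to 'isPrint' yet still escaped.
-- (Hypothetical helper, for illustration only.)
escapedButPrintable :: String -> String
escapedButPrintable = filter (\c -> isPrint c && isEscaped c)
```

For example, escapedButPrintable "aλ世\n" evaluates to the two-character string "λ世": the letter a is left alone, the newline is genuinely unprintable, and the two Unicode letters are escaped despite being printable. The backslash is also reported by this helper, since it has to be escaped to keep the output parseable.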
You might argue that people are using Show as a lightweight serialization mechanism, and that this proposal will cause unexpected breakage for them. However, this breakage is expected only when they are sending serialized data over the network, assuming ASCII-compatible encodings on both sides. And there is actually an inconsistency, and your code is probably buggy if you rely on escaping:
ghci> data Γ a = Γ a deriving Show
ghci> Γ 'Γ'
Γ '\915' -- Uh-oh.
Conclusion 2: If you rely on Show's escaping for encoding-independent serialization today, your code is probably buggy.
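To make the inconsistency concrete, here is a small sketch (the names isAsciiOnly and rendered are illustrative, not from the proposal) that checks whether the output of a derived Show instance is really ASCII-only. The constructor name is emitted verbatim while the Char field is escaped, so the result already contains a non-ASCII character under today's behavior.

```haskell
import Data.Char (isAscii)

-- Same declaration as in the ghci session above.
data Γ a = Γ a deriving Show

-- | True when every character of the string is ASCII.
-- (Hypothetical helper, for illustration only.)
isAsciiOnly :: String -> Bool
isAsciiOnly = all isAscii

-- | The derived rendering: constructor name unescaped, field escaped.
rendered :: String
rendered = show (Γ 'Γ')
```

Here isAsciiOnly rendered is False, whereas isAsciiOnly (show "Γ") is True, so code that assumes show always yields ASCII is correct only for some types.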
Therefore, we have pretty strong reasons to correct the behavior of showLitChar (and in turn show @Char and show @String):
- It is much friendlier to learners around the globe: you can use your native language freely in ghci.
- It fixes a longstanding inconsistency.
- It aligns better with the documentation.
- String is always a list of Unicode code points. Why escape it in the first place? Encoding and decoding are already handled elsewhere.
- showLitChar has remained unchanged since 1999, a time when GHC had no good Unicode support. There were reasons to escape then; there are none today.
See also previous discussions at gitlab, libraries@, r/haskell, and another proposal.
Proposed Changes
- Relax showLitChar so that readable Unicode characters in Char and String are not escaped.
- Add a newtype wrapper Ascii, and supporting functions to really escape all non-ASCII characters. And escapeAscii guarantees that oldshow x = asciiToString . escapeAscii . newshow $ x, where x :: String.
newtype Ascii = MkAscii String
escapeAscii :: String -> Ascii
asciiToString :: Ascii -> String
- GHCi supports only Unicode-compatible terminals after this proposal is implemented.
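The guarantee stated for escapeAscii can be sketched as follows. This is one possible implementation under stated assumptions, not the proposal's actual code: newShowString is a hypothetical stand-in for the relaxed show @String, and it assumes "readable" means a non-ASCII character accepted by Data.Char.isPrint. The one subtle point is the \& separator, which the old show inserts after a numeric escape when the next character is a digit.

```haskell
import Data.Char (isAscii, isDigit, isPrint, ord, showLitChar)

newtype Ascii = MkAscii String

asciiToString :: Ascii -> String
asciiToString (MkAscii s) = s

-- | Escape every non-ASCII character numerically, leaving ASCII untouched.
-- A "\&" is added when the escaped character is followed by a digit, so
-- that e.g. "λ1" becomes "\955\&1" rather than the ambiguous "\9551".
escapeAscii :: String -> Ascii
escapeAscii = MkAscii . go
  where
    go [] = []
    go (c : cs)
      | isAscii c = c : go cs
      | otherwise = '\\' : shows (ord c) (protect cs)
    protect cs@(d : _) | isDigit d = "\\&" ++ go cs
    protect cs = go cs

-- | ASSUMPTION: a stand-in for the relaxed 'show' on String, where
-- printable non-ASCII characters are emitted as-is.
newShowString :: String -> String
newShowString s = '"' : go s
  where
    go [] = "\""
    go (c : cs)
      | c == '"' = "\\\"" ++ go cs
      | not (isAscii c) && isPrint c = c : go cs
      | otherwise = showLitChar c (go cs)
```

Under these assumptions, asciiToString (escapeAscii (newShowString s)) agrees with today's show s. Keeping Ascii as a newtype rather than a plain String lets the type record that escaping has already happened.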
Breakage and Migration
This is a breaking change with a large impact. All libraries with custom Show instances will probably need a major version bump, as their show now can output different results. However,
- The new behavior is usually what you and your users really want, even if the results are different. You may not need to do anything at all:
  - Existing serialized data will still be supported by Read.
  - New serialized data will still be supported by an existing program compiled with an older version of base.
- If you rely on the old behavior, consider using asciiToString . escapeAscii to get it back:
-- Previous
instance Show T where
show x = _tostring x
-- Now
instance Show T where
show x = asciiToString . escapeAscii $ _tostring x
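The two compatibility claims about Read can be checked against the current base library, since its string lexer accepts both escaped and unescaped characters. A small sketch (the names oldStyle, newStyle, and decoded are illustrative only):

```haskell
-- The same value rendered in the old (escaped) and new (unescaped) styles.
oldStyle, newStyle :: String
oldStyle = "\"\\19990\\30028\""
newStyle = "\"世界\""

-- | Both renderings parsed back with the existing Read instance for String.
decoded :: (String, String)
decoded = (read oldStyle, read newStyle)
```

Both components of decoded are the same two-character string, which is why data written by either version can be read by the other.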
Implementation
I will implement this proposal.
Update: a PoC has been implemented here.