
SVG Image query result downloads use incorrect encoding
Closed, Resolved · Public

Description

T165228: Query results are downloaded in wrong encoding comes back to haunt us again, this time just for the SVG download. Minimal reproducer query:

#defaultView:BubbleChart
SELECT ("Ü" as ?üLabel) (1 AS ?size) {}

link

In the resulting SVG, the “Ü” is broken in a very peculiar way that I don’t even understand – it seems it’s UTF-8 encoded, but the encoded code points are U+00C3 “Ô and U+0092 “PRIVATE USE TWO”, not U+00DC “Ü”.

Event Timeline

Restricted Application added a subscriber: Aklapper.
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.
thiemowmde subscribed.

To me this looks like UTF-8 got read as if it were an 8-bit encoding (most probably ISO-8859-1), and then stored as UTF-8 again. If I run iconv -f UTF-8 -t ISO-8859-1 query.svg -o output.svg on the downloaded SVG, the file is fixed.

Note this might be browser-dependent. I did my tests with Firefox.
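
The round trip described above can be reproduced outside the browser. The following is only a minimal Python sketch of the suspected mechanism, not the code path the query GUI actually takes; the function names double_encode and repair are made up for illustration. It misreads the UTF-8 bytes of "Ü" as ISO-8859-1, shows what ends up in the file once that text is stored as UTF-8 again, and then undoes the damage the same way the iconv call does.

```python
def double_encode(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes are misread as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(text: str) -> str:
    """Undo the misreading, like iconv -f UTF-8 -t ISO-8859-1 on the file."""
    return text.encode("iso-8859-1").decode("utf-8")


broken = double_encode("Ü")

# Each original UTF-8 byte (C3 9C) has become its own character.
print([f"U+{ord(c):04X}" for c in broken])  # ['U+00C3', 'U+009C']

# Bytes that would be written when the broken text is stored as UTF-8 again.
print(broken.encode("utf-8").hex(" "))  # c3 83 c2 9c

# Reversing the misreading restores the original character.
print(repair(broken))  # Ü
```

In this simulation the second code point comes out as U+009C, since the UTF-8 encoding of "Ü" is the byte pair C3 9C.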

Change 386880 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[wikidata/query/gui@master] Don’t escape unescape characters in SVG download

https://gerrit.wikimedia.org/r/386880

Looks like the fix is actually pretty simple, see the above change. I tested this on both Firefox and Chromium, using the test string "Ü ☃ 𐐨" (one ISO-8859-1 character, one BMP character, and one non-BMP character).
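
To illustrate why that test string is a reasonable choice, here is a small Python sketch (the classify helper is hypothetical and not part of the patch) that sorts each character into the three groups named above and shows how many UTF-8 bytes and UTF-16 code units it needs. The three characters exercise the 2-byte, 3-byte, and 4-byte UTF-8 forms, and the last one also needs a surrogate pair in UTF-16, which is how JavaScript strings store it.

```python
def classify(ch: str) -> str:
    """Return the coarse Unicode range a single character falls into."""
    cp = ord(ch)
    if cp <= 0xFF:
        return "ISO-8859-1"
    if cp <= 0xFFFF:
        return "BMP"
    return "non-BMP"


def describe(ch: str) -> tuple[str, int, int]:
    """Return (range name, UTF-8 byte count, UTF-16 code unit count)."""
    utf8_len = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return classify(ch), utf8_len, utf16_units


for ch in "Ü☃𐐨":
    name, utf8_len, utf16_units = describe(ch)
    print(f"{ch}: {name}, {utf8_len} UTF-8 bytes, {utf16_units} UTF-16 code unit(s)")
```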

Change 386880 merged by jenkins-bot:
[wikidata/query/gui@master] Don’t escape unescape characters in SVG download

https://gerrit.wikimedia.org/r/386880

Smalyshev claimed this task.
Smalyshev subscribed.

Should be OK now.