
Wikitext cleanup is not fully Unicode compatible
Closed, Resolved | Public | 2 Estimated Story Points | Bug Report

Description

Certain regular expressions in the FileImporter wikitext cleanup code appear not to be Unicode-safe. For example, the link [[兵庫県立考古博物館]] should become [[:ja:兵庫県立考古博物館]], but the interwiki prefix is not added.

Example: https://commons.wikimedia.org/wiki/Special:Diff/435420995
Reported here: https://www.mediawiki.org/wiki/Topic:Vqwxi9chttpgxdy7
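
The failure mode described above can be sketched roughly as follows. This is only an illustration, not the FileImporter code: the pattern, the function name `add_interwiki_prefix`, and the two compiled regexes are hypothetical, and the actual patterns are not quoted in this task. Python's `re.ASCII` flag stands in for a regex engine that treats `\w` as ASCII-only.

```python
import re

# Hypothetical link pattern: matches [[Target]] unless the target already
# starts with a colon. This is NOT the actual FileImporter pattern.
LINK = r"\[\[(?!:)([\w ]+)\]\]"

# ASCII-only matching: \w covers only [A-Za-z0-9_], so CJK titles are missed.
LINK_ASCII = re.compile(LINK, re.ASCII)

# Unicode-aware matching (the default for str patterns in Python 3).
LINK_UNICODE = re.compile(LINK)


def add_interwiki_prefix(wikitext: str, prefix: str, pattern: re.Pattern) -> str:
    """Rewrite [[Target]] to [[:prefix:Target]] for every link the pattern matches."""
    return pattern.sub(lambda m: f"[[:{prefix}:{m.group(1)}]]", wikitext)


link = "[[兵庫県立考古博物館]]"
print(add_interwiki_prefix(link, "ja", LINK_ASCII))
print(add_interwiki_prefix(link, "ja", LINK_UNICODE))
```

With the ASCII-only pattern the Japanese link passes through without a prefix, which matches the symptom reported here; the Unicode-aware pattern adds `:ja:`. A link with an ASCII title, such as `[[Foo bar]]`, is rewritten by both patterns, which is why this kind of bug can go unnoticed in tests that use only Latin-script titles.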

Event Timeline

Change 616460 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Fix several regex patterns not being Unicode-aware

https://gerrit.wikimedia.org/r/616460

thiemowmde set the point value for this task to 2.
thiemowmde changed the subtype of this task from "Task" to "Bug Report".
thiemowmde moved this task from Backlog to Tickets in sprint on the Move-Files-To-Commons board.

Change 616460 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Fix several regex patterns not being Unicode-aware

https://gerrit.wikimedia.org/r/616460

Lena_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2020-07-22 board.

Change 640799 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Make some more regex patterns Unicode-aware

https://gerrit.wikimedia.org/r/640799

Change 640799 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Make some more regex patterns Unicode-aware

https://gerrit.wikimedia.org/r/640799