All Graphs broken on Wikimedia wikis (due to security issue T336556)
Open, High | Public | BUG REPORT

Assigned To
Authored By
Iniquity
Apr 18 2023, 12:29 PM

Description

On April 19, 2023 it was identified that the Graph extension, which uses the older Vega 1 & Vega 2 libraries, had a number of security vulnerabilities.

In the interest of the security of our users, the Graph extension was disabled on Wikimedia wikis. WMF teams are working quickly on a plan to respond to these vulnerabilities.

We recommend that third-party users of the Graph extension also disable it on their wikis.

A configuration change will suppress the exposed raw tags and graph JSON definitions to avoid excess disruption to the end-user experience while the extension is disabled. [2] It also provides a tracking category, "Category:Pages with disabled graphs", listing the pages that used to contain graphs. Local administrators can localise the name of the category and its description by editing the [[MediaWiki:Graph-disabled-category]] and [[MediaWiki:Graph-disabled-category-desc]] interface messages on their local wiki.

On Wikimedia projects, graphs created via the extension will remain unavailable. This means that pages that were formerly displaying graphs will now display a small blank area. To help readers understand this situation, communities can now define a brief message that can be displayed to readers in place of each graph until this is resolved. That message can be defined on each wiki at [[MediaWiki:Graph-disabled]] by local administrators.

An example from the English Wikipedia:

Screenshot 2023-04-19 at 00.58.31.png (610×636 px, 69 KB)

ORIGINAL:
Steps to replicate the issue (include links if applicable):

What happens?:

No graphs are shown. Instead, the error message from the page MediaWiki:Graph-disabled is displayed. Example error message on enwiki:

https://en.wikipedia.org/wiki/MediaWiki:Graph-disabled

Some wikis may show similar rendering errors instead, or a blank page:

image.png (453×1 px, 59 KB)

What should have happened instead?:
Graphs should be shown.

Other information (browser name/version, screenshots, etc.):
I know graphs were disabled because of a security issue, but an open issue is also needed so that people understand what's going on.

April 21 update part 1 - part 2. - exploring Vega 5 support for the Graph Extension

April 28 update. - Vega5 added for testing with limited features

July 15 update. - created the page https://www.mediawiki.org/wiki/Extension:Graph/Plans

August 11 update (archived).

December 22 update (archived)

April 10, 2024 update

Related Objects

Status | Subtype | Assigned
Duplicate | - | None
Declined | - | None
Declined | - | None
Open | BUG REPORT | CCiufo-WMF
Resolved | Security | Jdlrobson
Declined | Feature | Jdlrobson
Resolved | - | Bawolff
Declined | - | None
Declined | - | None
Declined | - | None
Resolved | - | Jseddon
Resolved | - | Jdlrobson
Resolved | - | Jdlrobson
Resolved | - | sbassett
Resolved | Feature | Jdlrobson
Declined | Feature | None
Declined | - | Jdlrobson
Declined | - | None
Resolved | - | Elitre
Duplicate | - | None
Invalid | Security | None
Declined | - | None
Resolved | - | CCiufo-WMF

Event Timeline

I have been working on a non-vega version of OSM Location map, which is now at 'close to complete' stage at https://en.wikipedia.org/wiki/Template:OSM_Location_map/sandbox . It jumps through some very ungainly hoops, as it uses the Maplink overlay, but only seems to work if an en:overlay template also adds an invisible square. That has allowed me to re-use the mercator calculations I had needed to get vega5 working, and add inline CSS graphics and text instructions on top of the map. (Betraying my ignorance, I had no idea CSS could be used like this). So far as I can tell, it appears to have a lower performance hit than Vega did.

There is a selection of examples at https://en.wikipedia.org/wiki/Template:OSM_Location_map/examples which also showcase some new features not possible with the old graph template. Any thoughts on the stability, performance, sustainability, portability and 'security safety' of this approach would be welcome. So far it only does 10 map-items. I am doing a few more compatibility/bug-find tests with existing map examples, and all being well will then ramp it up to the original 60 and go live in the next few days.
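
For readers unfamiliar with the kind of mercator calculation mentioned above, here is a rough illustration. It is a minimal sketch, assuming the standard Web Mercator projection with 256-pixel tiles and a map frame described by its centre, zoom level and pixel size. The function names and the percentage output are hypothetical and are not taken from the template itself.

```python
import math

TILE_SIZE = 256  # assumption: standard Web Mercator tile size at zoom 0


def mercator_pixel(lat_deg, lon_deg, zoom):
    """Project latitude/longitude to global Web Mercator pixel coordinates."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon_deg + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat_deg)
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * scale
    return x, y


def overlay_position(lat, lon, center_lat, center_lon, zoom, width_px, height_px):
    """Return (left %, top %) of a point inside a map frame centred on the given coordinates."""
    px, py = mercator_pixel(lat, lon, zoom)
    cx, cy = mercator_pixel(center_lat, center_lon, zoom)
    left = (px - cx + width_px / 2.0) / width_px * 100.0
    top = (py - cy + height_px / 2.0) / height_px * 100.0
    return left, top
```

The two percentages could then be used as CSS `left` and `top` values for an absolutely positioned label or marker drawn over the map image. Points outside the frame simply yield values below 0 or above 100.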

There are also downsides to it, as it would either clutter Commons with many versions of the same graph or people would need to ask for autopatrol specifically to edit graphs. An extreme example would be during the covid pandemic when the graphs were updated weekly or more frequently. In order to reupload a photo uploaded by someone else, an autopatrol right is needed, which is likely not to be given 'just to edit graphs' if the person is not very active on Commons otherwise.

Of course there would be downsides. That's the whole problem, that all of what has been discussed has downsides. But this at least would be within the control and capabilities of the community.

I have been working on a non-vega version of OSM Location map, which is now at 'close to complete' stage at https://en.wikipedia.org/wiki/Template:OSM_Location_map/sandbox

This is interesting, but what are the differences between your project and https://www.mediawiki.org/wiki/Help:Extension:Kartographer? Most of those maps can be done fairly well using Kartographer.

The biggest single display difference is text-labels either alongside a marker or simply naming a map feature. More control over symbols and other graphical elements also means a map can be 'added to' rather than simply given markers. Numbered dots are most similar, but with extra caption and display features, and can be used along with text labels. In some measure it has different use cases, and of course it builds on Kartographer through use of the maplink template. It also makes more use of standard wikitemplate syntax.

I just realised the sandbox template still showed the old documentation, with the interim solution, which you correctly noted is all done through kartographer. I have switched the maps to the sandbox version - a bit rough and ready but you will see the differences!

Just to confirm my understanding of the situation: graphs are still broken after a whole year, with no replacement whatsoever, and no plan for implementing one?

T336595 feels like a band-aid solution and a bad one at that. It does nothing to address the underlying security problem or maintenance problem. I personally don't think we should do that except as a last resort.

But, to be clear, having graphs unavailable for a year already feels like there needs to be some last resort solution. I don’t see why Graphs cannot be brought back with tight security requirements about editing them, and then a more comprehensive solution can be worked out by the engineers (maybe even dropping the current library etc., but, eh, not waiting for two years for a new solution). Many wikis have used graphs extensively and therefore require them to be back, even though that might not apply to English-speaking audiences.

I personally still think that running something on toolforge and exporting to SVG, which can then be uploaded, is the most flexible, easiest to realize and most usable short-term way to enable rich creation tools. It creates a hard separation between security contexts and avoids any sort of fragile dependency on MediaWiki and Wikimedia. The uploading can even be automated; OAuth login and Commons uploading are commonly implemented things in tools already. You lose interactivity, but interactivity was already pretty rare in graphs.
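
As a rough idea of what the "export to SVG" step of such a tool might involve, here is a minimal sketch that uses only the Python standard library. The function name, chart layout and colours are hypothetical, it assumes non-negative values, and it covers neither the Toolforge hosting nor the OAuth upload to Commons.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, width=400, height=200, bar_gap=4):
    """Render a list of (label, value) pairs as a static SVG bar chart string."""
    if not data:
        raise ValueError("data must not be empty")
    max_value = max(value for _, value in data) or 1  # avoid dividing by zero
    label_band = 20                     # vertical space reserved for labels
    plot_height = height - label_band
    slot = width / len(data)            # horizontal space per bar
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data):
        bar_height = plot_height * value / max_value
        x = i * slot + bar_gap / 2
        y = plot_height - bar_height
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{slot - bar_gap:.1f}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
        # Labels are escaped so user-supplied text cannot inject markup.
        parts.append(
            f'<text x="{i * slot + slot / 2:.1f}" y="{height - 5}" '
            f'font-size="10" text-anchor="middle">{escape(str(label))}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

The output is a plain static image with no scripts or external data references, which is the kind of separation between security contexts the comment describes. The trade-off is the loss of interactivity noted above.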

At that point, you might as well resurrect Graphoid in a more secure way. Which is, by the way, a solution: sandbox Vega JS part on the server, allow users to generate graph SVG or PNG images, and that’s it. No interactivity, of course, but also no security threat to the end-users. Which is what the original removal of Graph extension was about. Tbf I don’t get why the perfect is such an enemy of the good here, yes, maybe Graph extension should be phased out in the long-term, but currently we are serving readers a big pile of nothing for the last year. Surely the WMF, with all of its resources, can develop a short-term solution that would not compromise security and then work on a long-term solution. @MMiller_WMF’s email seems weirdly avoidant of that.

I work in a software company (as a technical writer, but it doesn't matter). If any of the teams in our company had concentrated on dark mode instead of quickly fixing such a major bug in production, I'm sure that team would have been immediately kicked out of the company.

At that point, you might as well resurrect Graphoid in a more secure way.

Saying T211881#9425243 here again: The last (before archival) version of graphoid is prone to RCE due to CVE-2020-26296. For a POC, see https://github.com/vega/vega/issues/3018#issuecomment-748929438

And there are still potential issues in the latest Vega 5, though I don't know the details.

Note Graphoid is a service (i.e. a continuously running server in node.js), and by its design it accesses external data from the internet (such as the result of a WDQS query). It is not a single binary like imagemagick or lilypond that can be run one-off, statelessly and without internet access (for those we have Shellbox).

I would advise the community to look for solutions for graphs outside of the foundation (maybe through a grant or something). I personally still think that running something on toolforge and exporting to SVG which can then be uploaded, is the most flexible, easiest to realize and most usable short term way to enable rich creation tools. It creates a hard separation between security contexts and avoids any sort of fragile dependency on MediaWiki and Wikimedia. The uploading can even be automated, OAuth login and commons uploading are commonly implemented things in tools already.

There are also downsides to it, as it would either clutter Commons with many versions of the same graph or people would need to ask for autopatrol specifically to edit graphs. An extreme example would be during the covid pandemic when the graphs were updated weekly or more frequently. In order to reupload a photo uploaded by someone else, an autopatrol right is needed, which is likely not to be given 'just to edit graphs' if the person is not very active on Commons otherwise.

I feel that T66460 could be another promising solution for non-interactive graphs. When modules are able to generate SVGs on the fly, graphs can update dynamically based on changes in data and the laborious process of updating it in an external tool and reuploading the SVG can be avoided.

The ticket has an attached patch which looks like a good start, although it is 10 years old.

This will require T334953: Introduce an SVG Sanitizer.

Why? I could be wrong but my understanding is that that patch would generate SVGs which are treated as if they were uploaded SVG files. So they're not rendered client-side, and so it would only require T86874: Make SVG sanitization into a library at best.

If the sandboxing approach is abandoned, https://www.mediawiki.org/wiki/Extension:Graph/Plans should be updated correspondingly to provide correct information about what is going on.

it has become clear that there isn’t a safe shortcut here and that the path forward will require a substantial investment – one that we have not yet started given the other priorities we’ve been working on.

Am I reading this right? A year after a major feature was broken they haven’t even started working on it?

To be fair, they did explore some options. I don't know why they would say configuring a cookieless domain is a "substantial investment". It should be fairly easy... But maybe the current infrastructure is so complicated that configuring a proxy with a new domain is somehow hard ¯\_(ツ)_/¯

Saying T211881#9425243 here again: The last (before archival) version of graphoid is prone to RCE due to CVE-2020-26296. For a POC, see https://github.com/vega/vega/issues/3018#issuecomment-748929438

I wasn’t strictly talking about making Graphoid available again, I was talking about providing a sandboxed generator of Graphoid-like images. Maybe I am missing something, but I can’t see this being impossible, given that a container version of Lilypond can be used in the same way.

I am wondering if we could use a rewrite of the graphs extension using something like P5JS or something similar (in other words moving to the frontend). This would also reduce the need for caching of files and whatnot used in the plugin.

I can confirm the usefulness of the functionality now re-enabled by the rewrite of Template:OSM Location map without using the graph module, and that it complements the other mapping functionality in en:Wikipedia. This rewrite allowed about 5,287 pages on en:Wikipedia to show improved, user-friendly information at first sight of the page, compared to the other mapping options, which I have now used widely and whose limitations, especially at first sight of the page, I have been forced to understand.

Even though this apparently more efficient and presumably more secure rewrite would not have happened if the graph module had remained available, this editor is of the view that a low overhead and secure graph option that allows an ordinary editor to update changing data by text entry is core functionality moving on that the Wikimedia Foundation could usefully prioritise.

I am wondering if we could use a rewrite of the graphs extension using something like P5JS or something similar (in other words moving to the frontend). This would also reduce the need for caching of files and whatnot used in the plugin.

The Graph extension was (at the time it got disabled) frontend-only. And this was the issue: the graph code was written by one (unprivileged, i.e. not interface admin) user and executed in another user’s browser. While its input was JSON, apparently it was possible to write such JSON that runs arbitrary JavaScript code. JavaScript running in a user’s browser can do bad things (a less serious example being that it vandalizes pages in the victim’s name). In a JS-based frontend like P5JS being able to run arbitrary JavaScript code is not a vulnerability but the basic design, so it’s even worse than Vega.

this editor is of the view that a low overhead and secure graph option that allows an ordinary editor to update changing data by text entry is core functionality moving on that the Wikimedia Foundation could usefully prioritise.

I have to dissent on that characterization. It's one that's been thrown into this Phab ticket several times by various participants. "Graphs are core functionality". Well, sorry, no.

It's an Internet encyclopedia. The webservers are core functionality. The Wikitext parser is core functionality. Page editing, the content database, template transclusion, File (image) embedding, user account management... these are all core functionalities. Lose any one of them, continuing to operate Wikipedia in any meaningful fashion becomes an unsustainable and extremely short-term proposition until the missing piece is restored.

(Even Echo, which we've come to rely upon so heavily that my initial list included it, I have to admit isn't core functionality. We managed to make do with fully manual, template-based talk page notifications for a really long time, and we'd find a way to make do if we suddenly had to go back to that again. It'd suck really, really hard, but it wouldn't kill the project.)

Scribunto probably IS core functionality, today, because even though we got by without Modules for many, many years, so much of the original pure-Template infrastructure has undergone an extremely one-way conversion to Lua code that I don't see how we could survive without it anymore.

Graphs are a valuable feature. They're an important feature to a sizable portion of both the editor and reader communities. They are not core functionality. The fact that the encyclopedia is still intact — diminished, certainly... perhaps even handicapped... but undeniably intact — after many months without them, demonstrates that all by itself.

It's an Internet encyclopedia.

Sorry, we do an Internet encyclopedia. We are not an Internet encyclopedia. We are more than that and want to be the central infrastructure of free knowledge (first sentence in the strategy).

Graphs aren't core to that either. I have to agree that it's not core, but it's still important functionality the WMF should prioritize.

The problem with this argument is that while graphs are not technically "core" functionality, they are a lot more important than a lot of things that Wikimedia is focusing its resources on. I would argue that in terms of the mission of the WMF, restoring the graph extension is a lot more urgent than (for example) funding external projects through the Knowledge Equity Fund. We are being told that limited resources are the reason for the failure to address this problem, while the WMF is freely spending on many things that are only marginally related to building an encyclopedia, let alone core or urgent. So it may not be "core" but graphs should be pretty high on the list of priorities in my opinion.

Do you think that dark mode of the UI is a core functionality? Judging from the fact that it is one of the features that WMF is working on, it is.

Graphs are a valuable feature. They're an important feature to a sizable portion of both the editor and reader communities.

A feature that was used on 0.07% of all articles, 0.2% of good articles and 0.3% of featured articles (as of 2020, sorry don't have more recent stats). "Sizable" is highly debatable.

is a lot more urgent than (for example) funding external projects through the Knowledge Equity Fund

I think that is pretty obvious. You could set that money on fire, and it would be more useful, as at least it would keep somebody warm.

Just noting that the Knowledge Equity Fund, at least, builds free knowledge projects. According to the WMF foundation audit (https://wikimediafoundation.org/wp-content/uploads/2023/11/Wikimedia_Foundation_FS_FY2022-2023_Audit_Report.pdf) there are 29 million USD in stocks.

irudia.png (631×1 px, 121 KB)

We still don't know how much it would cost to have the graphs back, but I doubt that it would be more than the money used to speculate on the stock exchange, which is not part of our mission.

Please read the Phabricator etiquette before commenting on this task.
Relevant excerpt:

Thoughts unrelated to the topic of the report (for example, meta-level discussions on priorities in general or on whether a new extension is wanted at all) should go to the appropriate mailing lists, wiki talk pages, or separate reports.

Today the "key results" of the 2024-2025 Annual Plan have been published and there is not a single mention of solving this issue: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2024-2025/Product_%26_Technology_OKRs

As the planning of the next fiscal year happens there, I would suggest also discussing the problems with graphs there.

Even if WMF was planning to do this (I have no idea if they are), it would probably not be a top-level OKR.

Hello everyone. I'm Marshall Miller; I'm a Senior Director of Product at WMF. Thank you all for thinking about and discussing how to move forward with this challenging issue of graphs being unavailable. I've posted an update on the project page for the Graph extension, in which we are proposing building a new extension for graphs. Please check out the update and join the discussion on the talk page as we think together about how to proceed.

MMiller_WMF added a subscriber: CCiufo-WMF.

Per my previous update on our proposed plan, I am assigning this task to @CCiufo-WMF, who will be the product manager stewarding the graphs situation going forward. Thank you, Chris!

This comment was removed by Betseg.

Wikimedia Hackathon 2024 (chat via Telegram at https://t.me/wmhacks) is ongoing. I hope this will be given some attention there and that a path to a resolution can be discussed collaboratively.

For those attending the hackathon, please find @Catrope if you'd like to discuss the proposed new path forward for graphs.

Just updated a few broken links. Sorry for the extra notification.

Wikimedia plans to develop Charts as a replacement for Graph.

@Emanuel2010Nikolli: This is not a chat forum. Please refrain from adding such comments - thanks!