Page MenuHomePhabricator

Wikibase must not insert local recentchanges entries for nonexistent local users (days: 5)
Closed, ResolvedPublic

Description

Currently, if an entry is inserted into the recentchanges table for a username that could exist but does not, it will be inserted with rc_user 0 and rc_user_text being the username. Wikibase inserts such recentchanges entries.

Per T167246, that is not going to work in the future. Instead, the recentchanges table will have an rc_actor column referring to the actor table. The actor table will only contain entries for local users that actually exist and for things like IP users and interwiki usernames (see T179832) that cannot exist as a local user.

I can think of three ways to fix this situation:

  1. Have Wikibase always use interwiki usernames for these recentchanges entries.
  2. Have Wikibase detect whether the local user exists and is attached to the SUL account on both the client and repo wikis, and use interwiki usernames if the named user doesn't exist.
  3. Have Wikibase somehow autocreate the local user if it doesn't exist, assuming these RC entries are only inserted on client wikis sharing SUL with the repo wiki. You'd probably still need a fallback to interwiki usernames in case autocreation fails.

#1 would probably be the easiest, although #2 is pretty easy too using core's CentralIdLookup class. For #3 you might reuse the 'ImportHandleUnknownUser' hook.

Note to self: After this is resolved and deployed, run maintenance/cleanupUsersWithNoId.php with --table recentchanges --force, appropriate --prefix, and (if not option #1) --assign on all wikis.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@Anomie We already do the SUL lookup and use the local user ID if it exists. The relevant code is in RecentChangeFactory::getClientUserId. So, if I understand correctly, all we would have to do now is to add a prefix to $userText if $clientUserId is 0 in prepareChangesAttributes. Is there a helper methods for safely constructing such an interwiki user name? Also, should we be creating an actor entry directly, or is it OK to continue to use rc_user and rc_user_text when constructing a RecentChange object?

@Anomie We already do the SUL lookup and use the local user ID if it exists. The relevant code is in RecentChangeFactory::getClientUserId. So, if I understand correctly, all we would have to do now is to add a prefix to $userText if $clientUserId is 0 in prepareChangesAttributes.

That sounds correct, assuming you've already verified that it isn't an IP.

Is there a helper methods for safely constructing such an interwiki user name?

There isn't. If you want to add one, I'll review the patch.

Also, should we be creating an actor entry directly, or is it OK to continue to use rc_user and rc_user_text when constructing a RecentChange object?

It's ok to continue to use rc_user and rc_user_text, this code takes care of it.

Ok, so this boils down to:

  • in core, find a good place for helper functions for constructing and parsing interwiki user names of the form "$prefix>$name". As per https://gerrit.wikimedia.org/r/#/c/386625/10, that code is currently in WikiImporter and Linker.
  • determine the interwiki prefix for the repo (via SiteLookup, probably) and make it known to RecentChangeFactory
  • in RecentChangeFactory::prepareChangesAttributes, set $userText to an interwiki name (using the new helper function) if $clientUserId is 0, and it's not an IP according to IP::isIPAddress().

I just talked to @Lydia_Pintscher about the 3 options.

I believe #3 would be bad. What would happen? Let's say I make an edit on Wikidata, and this edit spreads to 100 client wikis. My user account is automatically created on all these wikis the same time. On half of the wikis a bot comes and adds a welcome message to the local talk page. I get notifications for all of these. This is bad for multiple reasons: The obvious issue is the Echo spam, and also that I now have multiple dozen talk pages I never wanted.

#2 does have an important benefit: for all edits that are done from a client (currently only for sitelinks, but we want to have much more client editing), the local recent changes entries for this edit will point to the local user. This is exactly what I would expect as a user doing this edit. I never visited Wikidata. I don't want recent changes entries point to a user page on Wikidata I do not use.

The problem with #2 is that for all other wikis that are not my home wiki, the recent changes entires will also point to local user pages I do not use, and do not want to maintain.

#1 is the most consistent solution. Always point to Wikidata. After all, the edit was actually done on Wikidata.

With solution #1, the user doing edits from a client wiki will wonder why recent changes entires do not point to his own local user page, but to Wikidata. The good thing about this issue is that it is always very easy to solve: this is only 1 user page the user needs to maintain, not hundreds.

Change 403629 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@master] Move methods for handling external usernames to a dedicated class

https://gerrit.wikimedia.org/r/403629

I'm on the patch that @daniel suggested to implement in core

WMDE-leszek renamed this task from Wikibase must not insert local recentchanges entries for nonexistent local users to Wikibase must not insert local recentchanges entries for nonexistent local users (days: 1).Jan 15 2018, 5:23 PM
WMDE-leszek renamed this task from Wikibase must not insert local recentchanges entries for nonexistent local users (days: 1) to Wikibase must not insert local recentchanges entries for nonexistent local users (days: 2).Jan 16 2018, 3:33 PM

Change 405881 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Use external username when building RecentChange entry

https://gerrit.wikimedia.org/r/405881

WMDE-leszek renamed this task from Wikibase must not insert local recentchanges entries for nonexistent local users (days: 2) to Wikibase must not insert local recentchanges entries for nonexistent local users (days: 4).Jan 23 2018, 4:53 PM
WMDE-leszek renamed this task from Wikibase must not insert local recentchanges entries for nonexistent local users (days: 4) to Wikibase must not insert local recentchanges entries for nonexistent local users (days: 5).Jan 24 2018, 3:47 PM

Change 405881 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Use external username when building RecentChange entry

https://gerrit.wikimedia.org/r/405881

I believe this is done.

It appears that the fix committed for this task isn't actually being used. Doing some investigation, I find:

anomie@terbium:~$ mwrepl enwiki
hphpd> $c = Wikibase\Client\WikibaseClient::getDefaultInstance();
$c = Wikibase\Client\WikibaseClient::getDefaultInstance();
hphpd> =Wikimedia\TestingAccessWrapper::newFromObject( $c->getRecentChangeFactory() )->externalUsernames;
=Wikimedia\TestingAccessWrapper::newFromObject( $c->getRecentChangeFactory() )->externalUsernames;
null

In other words, the new ExternalUsernames class isn't being used in RecentChangeFactory.

hphpd> $cc = Wikimedia\TestingAccessWrapper::newFromObject( $c );
$cc = Wikimedia\TestingAccessWrapper::newFromObject( $c );
hphpd> $s = $cc->siteLookup->getSite( $c->getRepositoryDefinitions()->getDatabaseNames()[''] );                                                             
$s = $cc->siteLookup->getSite( $c->getRepositoryDefinitions()->getDatabaseNames()[''] );
hphpd> =$s->getInterwikiIds();
=$s->getInterwikiIds();
Array
(
)

That's probably why, for whatever reason it's not finding any interwiki ID.

Digging into that SiteLookup class, it looks like someone needs to run whatever maintenance script will repopulate site_identifiers on every wiki, and then make sure it gets re-run in the future whenever the interwiki map changes. That should fix the problem for WMF sites.

I also note that this behavior of not using ExternalUsernames if the client wiki doesn't have an interwiki map entry for the repo is going to start failing once the actor table starts being used. Instead of falling back to just not using ExternalUsernames, you might want to start throwing exceptions to tell people they need to configure the necessary interwiki map entry.

*sigh* I started to rewrite SiteLookup stuff twice, and failed to finish it both times... This is so annoying.

Mentioned in SAL (#wikimedia-operations) [2018-02-23T18:28:32Z] <Amir1> mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=enwiki --force-protocol https (T183019)

With the rebuilding of these tables, it's still can't find it. Problem lies somewhere else it seems

of course, populateSitesTable doesn't update site_identifiers table.

Mentioned in SAL (#wikimedia-operations) [2018-02-27T13:00:36Z] <Amir1> inserting wikidata-related interwikis to site_identifiers table using eval.php in enwiki (T183019)

Mentioned in SAL (#wikimedia-operations) [2018-02-27T13:00:36Z] <Amir1> inserting wikidata-related interwikis to site_identifiers table using eval.php in enwiki (T183019)

I was considering doing the same thing to resolve the immediate problem here. But we'll need to do that for every other wiki too, they're all missing it.

Yeah, yeah. Of course. I'm working on mid-term (fixing populateSites.php maintenance script) and long-term solutions (rewriting SiteLookup) too. This is just to be sure that fixes the problem.

Change 416705 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Add code of special wikis as interwiki when populating sites table

https://gerrit.wikimedia.org/r/416705

^ If this gets merged and backported, what needs to happen is to run it on all clients (see https://wikitech.wikimedia.org/wiki/Add_a_wiki#Wikidata) and we are all good until the overhaul happens.

Change 416705 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add code of special wikis as interwiki when populating sites table

https://gerrit.wikimedia.org/r/416705

Change 416744 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.24] Add code of special wikis as interwiki when populating sites table

https://gerrit.wikimedia.org/r/416744

Change 416745 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.23] Add code of special wikis as interwiki when populating sites table

https://gerrit.wikimedia.org/r/416745

Change 416744 merged by Ladsgroup:
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.24] Add code of special wikis as interwiki when populating sites table

https://gerrit.wikimedia.org/r/416744

Mentioned in SAL (#wikimedia-operations) [2018-03-07T14:27:19Z] <ladsgroup@tin> Synchronized php-1.31.0-wmf.24/extensions/Wikibase/lib/includes/Sites/SiteMatrixParser.php: [[gerrit:416744|Add code of special wikis as interwiki when populating sites table (T183019)]] (duration: 01m 16s)

Change 416745 merged by Ladsgroup:
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.23] Add code of special wikis as interwiki when populating sites table

https://gerrit.wikimedia.org/r/416745

Mentioned in SAL (#wikimedia-operations) [2018-03-07T14:31:01Z] <ladsgroup@tin> Synchronized php-1.31.0-wmf.24/extensions/Wikibase/lib/tests/phpunit/Sites/SiteMatrixParserTest.php: [[gerrit:416744|Add code of special wikis as interwiki when populating sites table, part II (T183019)]] (duration: 01m 15s)

Mentioned in SAL (#wikimedia-operations) [2018-03-07T14:35:47Z] <ladsgroup@tin> Synchronized php-1.31.0-wmf.23/extensions/Wikibase/lib/includes/Sites/SiteMatrixParser.php: [[gerrit:416745|Add code of special wikis as interwiki when populating sites table (T183019)]] (duration: 01m 16s)

Mentioned in SAL (#wikimedia-operations) [2018-03-07T14:37:35Z] <ladsgroup@tin> Synchronized php-1.31.0-wmf.23/extensions/Wikibase/lib/tests/phpunit/Sites/SiteMatrixParserTest.php: [[gerrit:416745|Add code of special wikis as interwiki when populating sites table, part II (T183019)]] (duration: 01m 16s)

Mentioned in SAL (#wikimedia-operations) [2018-03-07T19:18:26Z] <Amir1> ladsgroup@terbium:~$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=mediawikiwiki --force-protocol https (T183019)

Mentioned in SAL (#wikimedia-operations) [2018-03-07T19:35:32Z] <Amir1> ladsgroup@terbium:~$ mwscript extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https on fawiki and hewiki (T183019)

Mentioned in SAL (#wikimedia-operations) [2018-03-07T19:43:03Z] <Amir1> ladsgroup@terbium:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T183019)

Ladsgroup moved this task from In Progress to Done on the Wikidata-Ministry-Of-Magic board.

It's done now.

hashar subscribed.

Based on Brad comment at T210732#4785277 that might have regressed :(

It looks like it's just that the need to run populateSitesTable.php when enabling Wikibase on wikis never made it into the right documentation.

We can't run populateSitesTable.php on prod now because the InterwikiSortOrder.php doesn't contain language of some wikis that have been recently created. See T209820: Add Wikidata support to new wikis

I just ran it on several wiktionaries and it seemed to work?

It might cause fatals here and there. Proceed with caution. One reason is that compact language links is the default in most wikis so it never reaches InterwikiSortOrder.php

FYI: I ran it on a bunch more wiktionaries and all worked. I also tried running it on incubatorwiki and sourceswiki, but both failed (I inserted the necessary row manually on those two).

It seems done to me.