It's time for Wikidata Cleanup 2024!
The year 2024 is slowly coming to an end. Many items were created in 2024, and while this often went well, many of them are missing basic properties. When people come across such an item, they have no clue what it is about, and queries cannot find it. As a result, an item without properties is almost useless. If we don't tackle this problem, the number of such items will only keep growing each year. Therefore I think it is good to make a coordinated effort to work on this together in the last ten days of 2024, when many users have some extra time because of the holidays.
Why are basic properties missing?
It can be simple vandalism, a risk especially for items with low Q-numbers and many interwiki links. If you come across this, the easiest fix is often to check the item's history and revert the vandalism if someone has (partly) emptied the item.
It is also still a problem that some users simply remove a basic property they consider wrong, instead of replacing it with the right one.
Or a basic property was simply forgotten, or the user who created the item did not know it is needed.
In many cases just one basic property is missing, but often several are. To keep things clear, I have divided the missing properties into levels, each building on the previous one. Once the properties of one level have been added to an item, people working on the other levels can find it (more quickly). You can work on just one level, but it would be helpful if you also check whether you can easily add the properties of other levels too.
I do not intend to cover all basic properties, as there are too many groups of subjects, each with their own basic properties. I have chosen the most generic and widely used ones, and I hope this helps to reduce the backlog.
== Level 1 ==
Every single item (Q) should have the property "instance of" (P31 https://www.wikidata.org/wiki/Property:P31) or "subclass of" (P279 https://www.wikidata.org/wiki/Property:P279). Our goal is to make sure every item has at least one of these two properties, and sometimes both.
For many this is obvious, but as it still goes wrong: use "instance of" for specific examples of a subject, and "subclass of" for items that are a subset of a class. Example: K2 (https://www.wikidata.org/wiki/Q43512) is an instance of mountain; volcano (https://www.wikidata.org/wiki/Q8072) is a subclass of mountain.
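To make the difference concrete, here is a minimal Python sketch (not how Wikidata itself works internally) that models P31 and P279 as two small edge maps and follows them the way the SPARQL property path `wdt:P31/wdt:P279*` does: one "instance of" step, then any number of "subclass of" steps. It assumes Q8502 is the item for "mountain"; `example-volcano` is a hypothetical placeholder for one specific volcano, not a real item.

```python
# Hypothetical miniature of the Wikidata class graph, limited to the example above.
MOUNTAIN = "Q8502"   # assumed QID for "mountain"
K2 = "Q43512"
VOLCANO = "Q8072"
EXAMPLE_VOLCANO = "example-volcano"  # hypothetical placeholder for one specific volcano

# P31 ("instance of"): edges from an individual thing to its class.
INSTANCE_OF = {
    K2: {MOUNTAIN},
    EXAMPLE_VOLCANO: {VOLCANO},
}
# P279 ("subclass of"): edges from a class to a broader class.
SUBCLASS_OF = {
    VOLCANO: {MOUNTAIN},
}


def superclasses(cls):
    """Return cls plus every class reachable via P279 (like wdt:P279*)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(item, cls):
    """One P31 step, then zero or more P279 steps (like wdt:P31/wdt:P279*)."""
    return any(cls in superclasses(c) for c in INSTANCE_OF.get(item, set()))


def is_subclass_of(cls, ancestor):
    """Reflexive-transitive P279 check (like wdt:P279*)."""
    return ancestor in superclasses(cls)
```

Because volcano is modelled as a subclass, a specific volcano with P31 volcano is still found as a mountain, while the class volcano itself is not reported as an instance of mountain.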
How to start working on this? There are various approaches, and any approach that results in the missing properties being added (properly) is good, so choose the one that works for you. I share my own approach below. It would be helpful if others shared their approaches too, so it becomes easier to work on this problem.
The approach I have chosen is to work by language. As my native language is Dutch, I try to make sure that all items with a sitelink to the Dutch Wikipedia have P31 and/or P279. To do that I use this query https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%20%20%3Fxx_article%20schema%3Aabout%20%3Fitem%20.%0A%20%20%20%20%3Fxx_article%20schema%3AisPartOf%20%3Chttps%3A%2F%2Fround-lake.dustinice.workers.dev%3A443%2Fhttps%2Fnl.wikipedia.org%2F%3E%20.%0A%20%20%0A%20%20%20%20OPTIONAL%20%7B%3Fitem%20wdt%3AP31%20%3Ftype%20%7D%0A%20%20%20%20OPTIONAL%20%7B%3Fitem%20wdt%3AP279%20%3Fsubclass%20%7D%0A%20%20%20%20FILTER%20%28%20%21BOUND%28%3Ftype%29%20%26%26%20%21BOUND%28%3Fsubclass%29%20%29%0A%20%20%0A%20%20OPTIONAL%20%7B%3Fitem%20rdfs%3Alabel%20%3FitemLabel%20FILTER%20%28lang%28%3FitemLabel%29%20%3D%20%22nl%22%29%20%7D%0A%20%20%0A%20%20%7D%0A. The limit is set to 10, because otherwise the query can fail with a timeout error. To use it for your language, replace both occurrences of "nl" with your language code.
If you are unsure about which item to select with P31/P279, please check other items about similar subjects.
The goal is to make sure that every single item has P31 or P279. If you work by language like I do, once you are done with one language, please consider working on other languages as well, especially those with few active speakers/users.
== Level 2: country ==
For many subjects, one of the main characteristics is the country they are related to or located in. The country is basic information people want to know about a subject, and queries often use it to narrow down the results.
What has a country? Use "country" (P17 https://www.wikidata.org/wiki/Property:P17) for buildings and other structures, human settlements, physical features (rivers, mountains, etc.), sports events/competitions, and more. Do not use it for subjects on Antarctica, in international waters, or on another planet or astronomical object:
*On Antarctica:* set "country" to no value and add the property "continent" (P30 https://www.wikidata.org/wiki/Property:P30) with "Antarctica" (https://www.wikidata.org/wiki/Q51).
*In international waters:* set "country" to no value.
*On another astronomical object:* set "located on astronomical body" (P376 https://www.wikidata.org/wiki/Property:P376).
Example query for buildings https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FitemDescription%20%3FinstanceOfLabel%20%3FinstanceOptionsLabel%20%3Flocation%20%3FlocationLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ41176%20.%0A%20%20MINUS%20%7B%20%3Fitem%20wdt%3AP17%20%3Fsome2%20.%20%7D%0A%20%20MINUS%20%7B%20%3Fitem%20rdf%3Atype%20wdno%3AP17%20.%20%7D%0A%20%20MINUS%20%7B%20%3Fitem%20wdt%3AP376%20%3Fsome3%20.%20%7D%0A%20%20OPTIONAL%7B%3Fitem%20wdt%3AP625%20%3Flocation%20.%20%7D%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2010, which excludes items with "country" set to no value (such as those on Antarctica) and items located on an astronomical body.
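Decoded, the example query selects direct instances (`wdt:P31`) of Q41176 (building) and removes items with any P17 value, with P17 set to no value (`rdf:type wdno:P17`), or with P376. A hypothetical Python helper that produces the same query for another class and limit might look like this; only the class QID and the limit are parameters, and the function name is made up for illustration.

```python
import re


def missing_country_query(class_qid: str = "Q41176", limit: int = 10) -> str:
    """Direct instances (P31) of class_qid without a country (P17).

    Skips items whose P17 is explicitly "no value" (e.g. Antarctica,
    international waters) and items with "located on astronomical body" (P376).
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return f"""SELECT ?item ?itemLabel ?itemDescription ?location WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  MINUS {{ ?item wdt:P17 ?someCountry . }}
  MINUS {{ ?item rdf:type wdno:P17 . }}
  MINUS {{ ?item wdt:P376 ?someBody . }}
  OPTIONAL {{ ?item wdt:P625 ?location . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""
```

Passing another class QID gives the same check for that group of subjects; as in the original, only direct instances are matched, so items whose P31 is a subclass of the given class are not covered.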
Use "country of citizenship" (P27 https://www.wikidata.org/wiki/Property:P27) for people.
Use "country of origin" (P495 https://www.wikidata.org/wiki/Property:P495) for some other subjects.
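The Level 2 rules can be summarised as a small lookup. In this hypothetical Python sketch, `NO_VALUE` stands in for Wikidata's "no value" and the `Subject` categories are simplified labels for the cases listed above; "country of origin" (P495) is left out because the text does not say exactly which subjects it applies to.

```python
from enum import Enum
from typing import List, Optional, Tuple

NO_VALUE = "<no value>"   # stand-in for Wikidata's "no value" snak
ANTARCTICA_QID = "Q51"


class Subject(Enum):
    IN_COUNTRY = "building, settlement, physical feature, sports event, ..."
    PERSON = "person"
    ANTARCTICA = "on Antarctica"
    INTERNATIONAL_WATERS = "in international waters"
    OTHER_ASTRONOMICAL_BODY = "on another planet or astronomical body"


def country_statements(subject: Subject, value: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return the (property, value) pairs that Level 2 asks for."""
    if subject is Subject.ANTARCTICA:
        return [("P17", NO_VALUE), ("P30", ANTARCTICA_QID)]
    if subject is Subject.INTERNATIONAL_WATERS:
        return [("P17", NO_VALUE)]
    # The remaining cases need a concrete country or astronomical body.
    if value is None:
        raise ValueError(f"{subject.name} needs a value (country or astronomical body QID)")
    if subject is Subject.IN_COUNTRY:
        return [("P17", value)]
    if subject is Subject.PERSON:
        return [("P27", value)]
    return [("P376", value)]  # Subject.OTHER_ASTRONOMICAL_BODY
```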
== Level 3: location ==
There are about 23 towns named "Paris" in the United States, so stating what an item is an instance of and which country it is located in is not enough.
Use "located in the administrative territorial entity" (P131 https://www.wikidata.org/wiki/Property:P131) for physical subjects.
Use "coordinate location" (P625 https://www.wikidata.org/wiki/Property:P625) to add coordinates.
Use "location" (P276 https://www.wikidata.org/wiki/Property:P276) for other indications of a location, but not for streets (use "located on street" (P669 https://www.wikidata.org/wiki/Property:P669)), not for mountain ranges (use "mountain range" (P4552 https://www.wikidata.org/wiki/Property:P4552)), not for rivers and other bodies of water (use "located in or next to body of water" (P206 https://www.wikidata.org/wiki/Property:P206)) and not for other natural features (use "located in/on physical feature" (P706 https://www.wikidata.org/wiki/Property:P706)).
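The "not P276 for ..." rules of Level 3 amount to a lookup with a fallback. A minimal Python sketch, assuming a few hypothetical plain-text kinds as keys (they are not Wikidata items); P625 is left out because it records coordinates rather than the thing an item is located in.

```python
# Level 3: properties for "where is this item", keyed by the kind of place it is in.
DEDICATED_LOCATION_PROPERTY = {
    "administrative territorial entity": "P131",  # located in the administrative territorial entity
    "street": "P669",                             # located on street
    "mountain range": "P4552",                    # mountain range
    "river": "P206",                              # located in or next to body of water
    "body of water": "P206",
    "natural feature": "P706",                    # located in/on physical feature
}


def location_property(kind: str) -> str:
    """Pick the property for 'located in/on <kind>'; fall back to "location" (P276)."""
    return DEDICATED_LOCATION_PROPERTY.get(kind.strip().lower(), "P276")
```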
== Level 4: sports ==
Many sports competitions share popular names such as "League" or "Liga 1", so knowing which sport (P641 https://www.wikidata.org/wiki/Property:P641) a league belongs to is basic information. Example query https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FitemDescription%20%3FinstanceOfLabel%20%3FinstanceOptionsLabel%20%3Flocation%20%3FlocationLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ27020041%20.%0A%20%20MINUS%20%7B%20%3Fitem%20wdt%3AP17%20%3Fany%20.%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%2C%5BAUTO_LANGUAGE%5D%22.%20%7D%0A%7D%0ALIMIT%20100 for sports seasons without a sport.
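Decoded, the example query linked above selects direct instances of Q27020041 (sports season) and filters with `MINUS { ?item wdt:P17 ?any . }`, so as written it lists seasons without a country (P17) rather than without a sport. A variant that tests "sport" (P641) instead could be generated like this; the Python wrapper is hypothetical, and like the original it only matches items whose P31 is exactly Q27020041.

```python
def seasons_without_sport_query(limit: int = 100) -> str:
    """Direct instances of Q27020041 (sports season) that have no sport (P641)."""
    return f"""SELECT ?item ?itemLabel ?itemDescription WHERE {{
  ?item wdt:P31 wd:Q27020041 .
  MINUS {{ ?item wdt:P641 ?sport . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }}
}}
LIMIT {limit}"""
```

Extending the match to subclasses of sports season (`wdt:P31/wdt:P279*`) would catch more items but is likely to be much slower and may time out.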
== Bonus: series ==
For various groups of subjects, an item is part of a larger series, for example sports events or seasons of competitions, with one item per season. Two things are needed:
1. Make sure an item for a certain year/season has the same properties as the previous/next one, and try to avoid variations between them.
2. Make sure each item is connected to the previous (P155 https://www.wikidata.org/wiki/Property:P155) and next (P156 https://www.wikidata.org/wiki/Property:P156) item of the series.
Often the newly created item for a new year/season is missing properties; sometimes this is the case for several years in a row.
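Both series rules can be checked mechanically once the items are in order. This hypothetical Python sketch assumes each item has already been reduced to its QID, its P155/P156 values and the set of property IDs it uses (fetching that data is not shown); it compares only which properties are present, not their values.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SeriesItem:
    qid: str
    follows: Optional[str] = None        # P155 ("follows")
    followed_by: Optional[str] = None    # P156 ("followed by")
    properties: Set[str] = field(default_factory=set)  # other property IDs on the item


def series_problems(series: List[SeriesItem]) -> List[str]:
    """Check consecutive items of a series, ordered from oldest to newest."""
    problems = []
    for prev, nxt in zip(series, series[1:]):
        # Rule 2: neighbours point at each other with P156 / P155.
        if prev.followed_by != nxt.qid:
            problems.append(f"{prev.qid}: P156 should be {nxt.qid}")
        if nxt.follows != prev.qid:
            problems.append(f"{nxt.qid}: P155 should be {prev.qid}")
        # Rule 1: neighbours carry the same set of properties (checked both ways).
        for item, other in ((nxt, prev), (prev, nxt)):
            missing = other.properties - item.properties
            if missing:
                problems.append(
                    f"{item.qid}: missing {', '.join(sorted(missing))} compared with {other.qid}"
                )
    return problems
```

For example, with three hypothetical seasons `S2022`, `S2023` and `S2024`, where the newest one lacks P155 and has only P31 while its predecessor also has P17 and P641, the function reports `S2024: P155 should be S2023` and `S2024: missing P17, P641 compared with S2023`.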
== Input welcome ==
If you know another or better query for the problems listed above, please share it so that working on them becomes easier.
For everyone: have a nice time in the coming days!
Romaine
Thanks Romaine.
Though this is obviously not the top priority, for streets one can have some fun adding intersections using P2789.
As an example, see how I have done that here:
https://www.wikidata.org/wiki/Q19317077
Note that this is not meant to reopen the old discussion about whether all streets are notable. In 99% of cases I am using streets which already have a Wikidata item (in NL, all streets that are part of a postal address have one).
Best,
Yaroslav
On Sun, Dec 22, 2024 at 6:43 PM Romaine Wiki [email protected] wrote:
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/mes...
To unsubscribe send an email to [email protected]
Ok, I will do my best.
On Sun, Dec 22, 2024 at 10:43 PM, Romaine Wiki [email protected] wrote: