Wikidata:Property proposal/taxonomic treatment
taxonomic treatment
Originally proposed at Wikidata:Property proposal/Natural science
Description | subject has object as taxon treatment |
---|---|
Represents | taxonomic treatment (Q32945461) |
Data type | Item |
Domain | taxon (Q16521) |
Allowed values | taxonomic treatment (Q32945461) |
Example 1 | Mileewa triloba (Q105626659) → Mileewa digitata Yu, He & Yang, 2021 (Q105626708) |
Example 2 | Cerchysiella raddeii (Q106154577) → Cerchysiella raddeii Yang, 2013 (Q106155510) |
Example 3 | Aspidosiphon cutleri (Q106253474) → Aspidosiphon cutleri Silva-Morales & Gómez-Vásquez, 2021 (Q106254505) |
Example 4 | Niphargus yasujensis (Q106254624) → Niphargus yasujensis Bargrizaneh, Fišer & Esmaeili-Rineh, 2021 (Q106254542) |
Example 5 | Cotesia lasallei (Q96373037) → Cotesia lasallei Fagan-Jeffries & Austin, 2020 (Q96372803) |
Planned use | We plan to enrich Wikidata with data on taxonomic treatments as they are extracted from the literature, and through that provide an additional gateway to the scientific data and further cited items such as specimens or gene sequences about the taxon |
Expected completeness | always incomplete (Q21873886) |
Robot and gadget jobs | Plazi treatments are available through the Plazi API and are released under the CC0 license. A bot will be written to add statements linking taxon items to treatment items via this property. |
Wikidata project | WikiProject Taxonomy (Q8503033) |
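The *Robot and gadget jobs* row can be illustrated with a small Python sketch of the statements such a bot might write. This is not the planned Plazi bot. It rests on three assumptions. First, `P_TREATMENT` is a hypothetical placeholder, because the proposed property has no ID yet. Second, Plazi IDs are stored as 32 upper-case hex digits without hyphens, which is the form used for Plazi ID (P1992) values on this page. Third, the sketch only builds Wikibase JSON statement dictionaries and never calls the Wikidata API. Because the hyphenated and unhyphenated spellings of a Plazi ID denote the same UUID, both are normalized to one form before anything is written.

```python
import uuid

# Hypothetical placeholder: the proposed property does not have an ID yet.
PROPOSED_PROPERTY = "P_TREATMENT"
PLAZI_ID = "P1992"  # existing property: Plazi ID


def normalize_plazi_id(raw: str) -> str:
    """Return a Plazi ID as 32 upper-case hex digits without hyphens.

    uuid.UUID accepts both the hyphenated 8-4-4-4-12 form and the plain
    32-digit form, so both spellings map to the same canonical string.
    """
    return uuid.UUID(raw).hex.upper()


def item_statement(prop: str, qid: str) -> dict:
    """Build a Wikibase JSON statement whose value is another item."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
            },
        },
    }


def string_statement(prop: str, value: str) -> dict:
    """Build a Wikibase JSON statement whose value is a string (e.g. an external ID)."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"type": "string", "value": value},
        },
    }


def plan_edits(taxon_qid: str, treatment_qid: str, raw_plazi_id: str) -> dict:
    """Map each item ID to the statements a bot would add to it (nothing is written)."""
    return {
        # taxon item -> proposed property -> treatment item
        taxon_qid: [item_statement(PROPOSED_PROPERTY, treatment_qid)],
        # treatment item -> Plazi ID, in one canonical spelling
        treatment_qid: [string_statement(PLAZI_ID, normalize_plazi_id(raw_plazi_id))],
    }


if __name__ == "__main__":
    # The two spellings of one Plazi ID quoted in the discussion normalize identically.
    print(normalize_plazi_id("31F96F41-E3E0-02BD-8898-5A4F3A20E45A"))  # 31F96F41E3E002BD88985A4F3A20E45A
    print(normalize_plazi_id("31F96F41E3E002BD88985A4F3A20E45A"))  # 31F96F41E3E002BD88985A4F3A20E45A
```

Which spelling Wikidata should store is a community decision. The sketch simply picks one so that duplicates can be detected.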
Motivation
Taxonomic treatments contain the facts used to propose a new species (or a new taxon in general), to add further facts later, or to state why a name is being synonymized; they are thus the basis of the catalogue of life. More recent treatments include references to gene sequences and physical specimens. These are increasingly digitally accessible and thus part of the biodiversity knowledge graph, to which the data in treatments are a major contributor. Each taxon has at least one taxonomic treatment, and many have several. Treatments are therefore a major part of the 500M pages of biodiversity literature and an increasing contributor, especially for new species, to the Global Biodiversity Information Facility (Q1531570) (see e.g. https://www.gbif.org/species/180876843). An increasing number of journals publish in formats in which treatments are explicitly marked up (e.g. ZooKeys (Q219980)). --Andrawaag (talk) 14:46, 7 April 2021 (UTC)
Discussion
WikiProject Taxonomy has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
- Support Each new taxonomic name needs to have a taxonomic treatment in order to be available from a nomenclatural point of view. Furthermore, each treatment includes the facts used to describe a new species. These facts increasingly include links to digital specimens, figures and gene accession codes, and thus provide an inroad to scientific data about the taxon. --Myrmoteras (talk) 14:58, 7 April 2021 (UTC)
- Support Taxonomic treatment is by now a fairly well-accepted concept within the biodiversity community, most notably through its presence as a first-class citizen in Zenodo and GBIF. Taxonomic treatments allow biodiversity data to be linked to the most relevant citable piece of literature in taxonomic works (the actual treatment of a given taxon concept). They thereby enhance the discoverability and accessibility of information such as specimen codes, accession numbers, geographical coordinates, synonyms, etc. --Mguidoti (talk) 15:02, 7 April 2021 (UTC)
- Support This will make a useful link between taxa, their names and the taxonomic literature. --Qgroom (talk) 16:02, 7 April 2021 (UTC)
- Support Providing access to taxonomic treatments in machine-readable form is essential for biodiversity research. It is also essential for connecting taxa/species with specimens, museum collections, identifications, collectors, biodiversity hotspots, and in many other ways envisioned in concepts such as the biodiversity knowledge graph (https://doi.org/10.7717/peerj.6739), the extended specimen (https://doi.org/10.1093/biosci/biz140), or the digital specimen (https://www.dissco.eu/dissco-consultation-convergence-digital-extended-specimen/). Plazi treatments provide open access to these data, which are often published behind a paywall and/or in unstructured form. --Tdikow (talk) 00:55, 9 April 2021 (UTC)
- Question At the moment Plazi ID (P1992) is linked to items containing taxon name (P225), so something has to be changed. --Succu (talk) 17:38, 7 April 2021 (UTC)
- Can you elaborate on this? What needs to be changed? Does the range of taxon name (P225) need to be extended, or does the type specification using instance of (P31) need to be clarified? --Andrawaag (talk) 12:20, 8 April 2021 (UTC)
- Another one: 31F96F41E3E002BD88985A4F3A20E45A (DwCA) and 31F96F41-E3E0-02BD-8898-5A4F3A20E45A (persistent identifier) link to the same page. What's the preferred format? --Succu (talk) 18:18, 7 April 2021 (UTC)
- The two versions differ only in the hyphens: ZooBank writes UUIDs with “-”, whereas Plazi uses UUIDs without “-”. Both are therefore valid and interchangeable. --Mguidoti (talk) 12:15, 8 April 2021 (UTC)
- Should Voluta pertusa have a taxonomic treatment? The name is suppressed (Voluta pertusa Linnaeus, 1758; Voluta morio Linnaeus, 1767; Voluta ruffina Linnaeus, 1767; Bulla conoidea Linnaeus, 1767 (Mollusca, Gastropoda) : proposed suppression under the plenary powers (Q43377431)); see also AnimalBase. --Succu (talk) 18:37, 7 April 2021 (UTC)
- A taxonomic treatment is the published text that accompanies the publication of the respective name. It is therefore irrelevant whether the name is suppressed or not: the respective publication and its treatment exist. In a later treatment or list the suppression is made official, that is, the NAME itself is deprecated or suppressed. This act also has an associated treatment. This group of treatments tells the taxonomic history of the given taxon concept. --Mguidoti (talk) 12:11, 8 April 2021 (UTC)
- TaxPub has a definition for a taxon treatment (taxonomic treatment (Q32945461)). I cannot see it reflected here.
So for the moment Oppose. Please clarify this. --Succu (talk) 18:57, 7 April 2021 (UTC)
- We did indeed slim the definition down to its current form by modelling it on child (P40). Initially we were thinking of "objects are annotations/publications documenting the features or distribution of a related group of the subject". This came from the same definition you cite: "taxon treatment: A publication or section of a publication documenting the features or distribution of a related group of organisms (called a “taxon”, plural “taxa”) in ways adhering to highly formalized conventions.". However, I thought the definition read oddly when expressed in terms of Wikidata items, so we modelled it on child (P40). How would you describe this property using this definition? --Andrawaag (talk) 11:46, 8 April 2021 (UTC)
- I think there is a difference between nomenclatural act (Q56027914) (eg. ZooBank) and taxonomic treatment (Q32945461) (Plazi). --Succu (talk) 19:09, 7 April 2021 (UTC)
- Yes, these are two different things. A nomenclatural act is an “act” governed by the Codes of nomenclature. A taxonomic treatment is a section of a publication that can describe the facts behind a nomenclatural act, e.g. describing a new species, establishing an objective synonym, or designating a neotype. A treatment can also add data that is not linked to a nomenclatural act, such as additional descriptive or distribution data. --Myrmoteras (talk) 11:56, 8 April 2021 (UTC)
- @User:Myrmoteras: Maurilloides (Q106254996) should be referenced to Systematics and convergent evolution in three Australian genera of Pepsinae spider wasps (Hymenoptera: Pompilidae) (Q106255296) not to a "Plazi treatment". --Succu (talk) 19:42, 7 April 2021 (UTC)
- The reference for Maurilloides points to a treatment, not to the article itself. We link it to a Plazi treatment because the Plazi treatment is open access, whereas in many cases the article is not. The suggested article reference can be added as well. --Myrmoteras (talk) 12:04, 8 April 2021 (UTC)
- Is significant event (P793) = nomenclatural act (Q56027914) an alternative? --Succu (talk) 20:28, 7 April 2021 (UTC)
- Although I personally believe significant event (P793) = nomenclatural act (Q56027914) could work, @User:Myrmoteras stated above that taxonomic treatments and nomenclatural acts are not exactly the same thing. The latter is always explicit in an instance of the former. The former, however, can also provide information that extends the knowledge of a given taxon concept without proposing any nomenclatural act. --Mguidoti (talk) 13:27, 8 April 2021 (UTC)
- @Andrawaag: Could you please provide an example with multiple (>2) taxonomic treatments in different sources by different authors. --Succu (talk) 07:12, 8 April 2021 (UTC)
- @Andrawaag: Added one myself: small-spotted catshark (Q84822) formerly known as Squalus canicula (Q106420292). --Succu (talk) 17:28, 8 April 2021 (UTC)
- @Succu: Here is an example of taxon name Rhinolophus affinis (Q1765459) with four different treatments from different publications and authors: Burgin, 2019 (885887A2FFE68A03F84FEFCAF52ADA38), Suyanto & Struebig, 2007 (725D87ABFFF5FFBCFD7D51A7F024FAEB), Voon-Ching et al., 2017 (03F3F77FFF85FF9FFDB4D441BD28FEA8) and Thomas et al., 2013 (690487A5FFD3FF9F0EB0FF32FD84D2F2). --Mguidoti (talk) 19:34, 8 April 2021 (UTC)
- @Andrawaag: @Succu: @Myrmoteras: @Mguidoti: Is anyone bothered by small-spotted catshark (Q84822) having five values for Plazi ID (P1992)? I guess this raises the age-old question of "what is a taxon?". But it seems to me that treatments are really pairs of taxa and publications, which might argue for treating them as roles for references, as suggested below. --Rdmpage (talk) 19:37, 8 April 2021 (UTC)
- @Rdmpage: I wouldn't say bothered, but rather excited. These five IDs lead to taxonomic treatments, which are exactly what this property proposes to link. They are openly and freely available excerpts from publications authored by different people at different times. Each adds different pieces of information, carrying its authors' interpretation of the taxon concept. That taxon concept is represented here by the vernacular name small-spotted catshark (Q84822), whose taxon name is Scyliorhinus canicula. These excerpts, the taxonomic treatments, are parts of original publications that actually deal with the given taxon concept. As such, they should be cited and/or linked directly, properly and unambiguously. In my humble opinion, these five IDs make the case for taxonomic treatments as a property, not the opposite. --Mguidoti (talk) 20:07, 8 April 2021 (UTC)
- I share @Mguidoti:'s interpretation that unique identifiers for individual treatments of the same taxon capture the taxonomic concepts held at the time by a particular researcher. It is therefore necessary to have a unique identifier for each individual treatment, rather than lumping them together under a taxon ID. --Tdikow (talk) 00:55, 9 April 2021 (UTC)
- @Mguidoti: @Tdikow: I feel that I understand what treatments are, and I'm happy that we seem to agree that having multiple Plazi ID (P1992) as values for taxon identifiers is not ideal. My concern here is how best to link treatments and taxa. In particular, as someone who writes queries to retrieve data on publications, people, and taxa (see https://alec-demo.herokuapp.com ) my first reaction to new properties like this is "How do I include that in queries? Does it make my life as a developer easier or harder?". Hence my question below about whether treatments are best treated as references for taxon name (P225) rather than as another property of a taxon. --Rdmpage (talk) 05:16, 9 April 2021 (UTC)
- For the newly established monotypic genus Kalathomyrmex (Q6351608), Plazi has three different IDs: 012DFB545CD21DC0D5AC2D6043D273F6, 03FC87DAF514C57942B2FE14FDD6FBC0 and 03FC87DAF504C56842B2FE9FFB83FBEE. All refer to the treatment in Revision of the fungus-growing ant genera Mycetophylax Emery and Paramycetophylax Kusnezov rev. stat., and description of Kalathomyrmex n. gen. (Formicidae: Myrmicinae: Attini) (Q97497705), pp. 21–22 and 5. Furthermore, there are two more IDs for the single species Kalathomyrmex emeryi (Q13374210): 883BEA8BA6C71A6891ACFCA24FD6449B and 03FC87DAF515C57442B2FBBEFE1BF860. I don't think WD should have five (!) items for the same thing. --Succu (talk) 06:41, 9 April 2021 (UTC)
- @Mguidoti: @Myrmoteras: OK, now I am confused. As @Succu: notes, Plazi has two treatments for Kalathomyrmex emeryi (Q13374210). Their content seems to be the same, yet they have different values for "persistent identifier" ( http://treatment.plazi.org/id/883BEA8B-A6C7-1A68-91AC-FCA24FD6449B and http://treatment.plazi.org/id/03FC87DA-F515-C574-42B2-FBBEFE1BF860 ). So these seem to be duplicates. Their version history on Plazi tells us that they were contributed by different people. Are both to be included in Wikidata, or just the one linked to the Zenodo record for the parent publication?

For the genus Kalathomyrmex (Q6351608), things are also confusing. There are three treatments, two of which appear to be duplicates (again from two different sources). The other treatment is the same name in a different location in the article. The value for "persistent identifier" given on the Plazi site is the same for all three treatments(!), and it is the same as the ZooBank ID for the taxon. So it's a little unclear what "persistent identifier" means and what it refers to (a treatment, a taxon, what?). To make things more interesting, there are two DOIs for Revision of the fungus-growing ant genera Mycetophylax Emery and Paramycetophylax Kusnezov rev. stat., and description of Kalathomyrmex n. gen. (Formicidae: Myrmicinae: Attini) (Q97497705): one minted by Plazi, the other by the journal itself. I think it would be helpful to have a clear sense of which treatments from Plazi will be added.

In this particular case there are duplicates and multiple treatments for the same name in the same publication. Would it be fair to assume that the plan is to add only the treatments linked to the Zenodo identifier for this publication (see those listed on https://doi.org/10.5281/ZENODO.186623 )? And that those treatments would be linked to their parent Revision of the fungus-growing ant genera Mycetophylax Emery and Paramycetophylax Kusnezov rev. stat., and description of Kalathomyrmex n. gen. (Formicidae: Myrmicinae: Attini) (Q97497705) by the part of (P361) property? --Rdmpage (talk) 07:36, 9 April 2021 (UTC)
- At the moment, the goal is to add taxonomic treatments of newly described species to Wikidata as they are published, on a daily basis. In the longer term, we also plan to add treatments of already known taxa, including protologues. This scenario is foreseen precisely because Plazi runs quality control to avoid uploading duplicates and other artifacts. This is already in place for the upload of treatments to BLR, as seen in the much lower number of treatments in BLR than in TreatmentBank (currently 20%).
- One reason for multiple treatments of the same taxon in the same publication is that a few authors split their treatments into several sections of a publication, each covering a specific topic.
- @rdmpage: The goal is to use the original DOI of the article if available, add the article to Wikidata as scholarly article (Q13442814) if not already available, use those treatments that are linked to the Zenodo deposit of the article, and which passed the quality control mentioned above.
- @succu: thanks for pointing out the Kalathomyrmex (Q6351608) case above. We will fix it.
- @Myrmoteras: Great, this is what I supposed you'd be doing. Treatments in Zenodo added to Wikidata, and linked to parent publication in Wikidata. Nice. So, my only question is whether (a) it is best to have a distinct property for "taxon treatment" (as proposed here) or (b) add treatments as "references" for the taxon name and use the existing reference has role (P6184) property to flag that it is a treatment (or, if wanted, a more specific role for that treatment, such as first description, etc.), so that treatments are listed alongside the other references for a name. To be clear, my comments here are in no way against treatments, or adding them to Wikidata, I'd just like to make sure there's a good reason to add another property to an already messy model of taxonomy Wikidata_talk:WikiProject_Taxonomy/Archive/2020/06#Understanding_Wikidata_taxonomy. Put another way, it seems that we can add treatments and link them to taxa right now using existing properties and practice (although we'd need to deal with the constraint violations for Plazi ID (P1992)) --Rdmpage (talk) 10:21, 9 April 2021 (UTC)
- @Myrmoteras: did exactly this at Q106254996#P225. --Succu (talk) 20:09, 12 April 2021 (UTC)
- Question There's a discussion Rethinking "publication in which this taxon name was established" which seems relevant here. One approach is to have things like publications and treatments as properties of taxon (Q16521), the other approach is to connect publications (and treatments) to taxon name (P225) and use reference has role (P6184) to state what role the publication (or treatment) has. This seems to be what is sort of already done with Plazi ID (P1992) where it is used as a reference (e.g., Maurilloides (Q106254996)). So my concern here is that we try and be consistent. Either we are going to have publications linked to taxa as properties of taxa (in which case we're likely to end up with a bunch of properties depending on what the reference does, we already have P5326 (P5326)) or we have publications as references for taxon names and we qualify what role those play (e.g., first valid description (Q1361864), protologue (Q1928959), recombination (Q14594740), etc.). In other words, do we do this:
- taxon → property: taxonomic treatment → treatment QID
- or do we do this:
- taxon → property: taxon name → taxon name → reference → treatment QID, has role=treatment
- --Rdmpage (talk) 16:23, 8 April 2021 (UTC)
- I would argue for the first, wouldn't you? The second is invalid because in Wikidata it is not possible to add qualifiers to references. --Andrawaag (talk) 18:03, 10 April 2021 (UTC)
- Following on from this, I've added the treatment Mileewa digitata Yu, He & Yang, 2021 (Q105626708) as a reference for Mileewa triloba (Q105626659) as an example of the approach I'm suggesting might be a better way to link treatments to names. --Rdmpage (talk) 05:41, 9 April 2021 (UTC)
Property | Value |
---|---|
taxon name (P225) | Mileewa triloba |
reference 1: stated in (P248) | Mileewa digitata Yu, He & Yang, 2021 (Q105626708) |
reference 1: reference has role (P6184) | taxonomic treatment (Q32945461) |
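The reference-based layout in the table above can be read back programmatically. The following Python sketch walks an entity in the Wikibase JSON layout (claims → statements → references → snaks). It returns every item cited with stated in (P248) in a reference whose reference has role (P6184) is taxonomic treatment (Q32945461). The entity is a hand-written, abridged stand-in that mirrors the table. It is not data fetched from Wikidata. Real entity JSON (for example from Special:EntityData) carries more fields, such as hashes and snak order, which the sketch ignores.

```python
TREATMENT_ROLE = "Q32945461"  # taxonomic treatment


def item_snak(pid: str, qid: str) -> dict:
    """Abridged Wikibase value snak pointing at an item."""
    return {
        "snaktype": "value",
        "property": pid,
        "datavalue": {"type": "wikibase-entityid", "value": {"entity-type": "item", "id": qid}},
    }


# Hand-written stand-in for Mileewa triloba (Q105626659), mirroring the table above.
entity = {
    "id": "Q105626659",
    "claims": {
        "P225": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P225",
                    "datavalue": {"type": "string", "value": "Mileewa triloba"},
                },
                "references": [
                    {
                        "snaks": {
                            "P248": [item_snak("P248", "Q105626708")],
                            "P6184": [item_snak("P6184", TREATMENT_ROLE)],
                        }
                    }
                ],
            }
        ]
    },
}


def item_ids(snaks: list) -> list:
    """Item IDs from value snaks; 'novalue'/'somevalue' snaks are skipped."""
    return [s["datavalue"]["value"]["id"] for s in snaks if s.get("snaktype") == "value"]


def treatments_for_name(entity: dict) -> list:
    """Return (taxon name, treatment QID) pairs for references flagged as treatments."""
    pairs = []
    for statement in entity.get("claims", {}).get("P225", []):
        name = statement["mainsnak"].get("datavalue", {}).get("value")
        for ref in statement.get("references", []):
            snaks = ref.get("snaks", {})
            # Only references whose role (P6184) is "taxonomic treatment" count.
            if TREATMENT_ROLE in item_ids(snaks.get("P6184", [])):
                pairs.extend((name, qid) for qid in item_ids(snaks.get("P248", [])))
    return pairs


print(treatments_for_name(entity))  # [('Mileewa triloba', 'Q105626708')]
```

In this layout the role is a property of the reference itself, stored next to stated in (P248) within the same reference.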
- reference has role (P6184) and treatments as a property of taxon (Q16521) serve different purposes. The former states that there is a treatment with a particular role. The latter provides the treatment itself, which makes it possible to use the taxon-specific links to gene sequences, specimens, figures or bibliographic citations stated in the treatment. It also makes it possible to use citations of previous treatments, analogous to one publication citing another via a bibliographic reference. The two have different functions, and both are needed. A publication can cite one or many taxa, but a treatment and its content are specific to one taxon. This allows much more specific queries and is thus a great enhancement to the very notion of Wikidata. A taxonomic name refers to a taxonomic treatment in a publication. Under the taxonomic Codes, a taxonomic treatment (protologue (Q1928959)) is a prerequisite for making a taxonomic name available. The proposed taxonomic treatment property therefore allows direct linking. From a taxon's point of view, the relevant part of a publication is the treatment, not the entire publication. This is also reflected in the widespread practice in taxonomy of citing a taxonomic name with the actual page of its treatment, not just the article (see e.g. [1] Ampulex fasciata Jurine, 1807: 133.) --Myrmoteras (talk) 08:52, 9 April 2021 (UTC)
- @Myrmoteras: I don't quite follow this. I don't think the issue is "what is a treatment?". The issue is how we link a treatment to an item for a taxon (Q16521) in a way that (a) is easy to add and (b) is easy to query. In other words, do we have a specific property of a taxon (Q16521) (as proposed here), analogous to P5326 (P5326)? Or do we regard a treatment as a reference (in the Wikidata sense of referencing a claim) for a taxon name (P225)? I don't think either choice affects your desire for "taxon specific links to gene sequences, specimens, figures or bibliographic citations stated in the treatment". Each treatment would exist as a Wikidata item, so each could be linked to these other things (such as the parent publication, or publications or treatments cited by that treatment). To make this more concrete, here is a query https://w.wiki/3AoX that retrieves the treatment for Mileewa triloba (Q105626659) and links the treatment to the enclosing publication. This query is crude. I would also want to include other references that point directly to publications rather than to treatments, but that would be straightforward to do. So, to repeat, I don't think the issue is the value of treatments; it is how we connect them to taxa in Wikidata. --Rdmpage (talk) 09:36, 9 April 2021 (UTC)
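A query of the kind described in the comment above can be sketched as follows. The linked query (w.wiki/3AoX) is not reproduced here. This Python function is an independent illustration that builds a SPARQL string for the reference-based model. It assumes, as suggested earlier in the discussion, that each treatment item is linked to its enclosing article with part of (P361). The wd:, wdt:, p:, ps:, pr: and prov: prefixes are predefined on the Wikidata Query Service.

```python
TREATMENT_ROLE = "wd:Q32945461"  # taxonomic treatment


def treatment_query(taxon_qid: str) -> str:
    """SPARQL for treatments cited as references on a taxon's name (P225) statement."""
    return f"""SELECT ?name ?treatment ?publication WHERE {{
  wd:{taxon_qid} p:P225 ?statement .        # full statement node for the taxon name
  ?statement ps:P225 ?name ;                # the name string itself
             prov:wasDerivedFrom ?ref .     # each reference on that statement
  ?ref pr:P248 ?treatment ;                 # stated in: the treatment item
       pr:P6184 {TREATMENT_ROLE} .          # reference has role: taxonomic treatment
  OPTIONAL {{ ?treatment wdt:P361 ?publication . }}  # part of: the enclosing article
}}"""


if __name__ == "__main__":
    print(treatment_query("Q105626659"))  # Mileewa triloba
```

Removing the pr:P6184 line would also return references that point directly to articles rather than to treatments.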
Sigh @Myrmoteras: Do you have an idea how a scientific name is modeled here? Please recheck Aspidosiphon cutleri (Q106253474), Niphargus yasujensis (Q106254624) and Cerchysiella raddeii (Q106154577). --Succu (talk) 18:23, 9 April 2021 (UTC)
@Andrawaag: A bot account should be created for this specific task (or a task approved for an existing one). --Succu (talk) 18:23, 9 April 2021 (UTC)
- @Succu I am aware of that. It will be either an additional task for [[2]] or a new bot account for Plazi. I would like to wait a bit to see how this discussion evolves before deciding between the two. --Andrawaag (talk) 16:13, 10 April 2021 (UTC)
- Question @Myrmoteras, Mguidoti, Qgroom, Tdikow and Andrawaag: Every Plazi taxon treatment ID (Q20644485) is associated with an RDF representation. Why not create a triplestore (Q3539533) at Plazi and allow federated queries (federated query (Q105159989))? Wouldn't that really free the data? --Succu (talk) 18:52, 9 April 2021 (UTC)
- There are SPARQL endpoints for RDF from Plazi (e.g. LINDAS). However, to support federation, which I strongly favour, we need mappings between Wikidata and the original RDF of the treatments in Plazi. For this I believe we need the property proposed here. Also, a minimal description of treatments in Wikidata makes federation with non-Plazi RDF resources on treatments possible as well. --Andrawaag (talk) 17:17, 10 April 2021 (UTC)
- Why use taxonomic treatment (Q32945461) and not circumscription (Q5121761)? --Succu (talk) 19:25, 9 April 2021 (UTC)
- All examples above are about a species nova (Q27652812) with a single treatment. Tyrannosaurus rex (Q13098211) has 85 treatments (if I got the query right). Presumably all these treatments would need to be distinguishable by their item labels. Are there any plans for how to do this? --Succu (talk) 19:30, 10 April 2021 (UTC)
- Question @Andrawaag: The more I think about this, the more I think that having this proposed property and Plazi ID (P1992) as properties of taxon (Q16521) is not the best way forward. A treatment is a combination of a taxon (Q16521) and, say, a scholarly article (Q13442814). Plazi ID (P1992) and the treatment ID are therefore properties of that pair of entities. In this way they are similar to the Open Citation Identifier (Q56447794), which identifies a citation relationship between two publications (see https://opencitations.net/oci for details). OCIs are not properties of an individual publication; they are properties of the relationship. Likewise, treatment IDs are not properties of taxa or of publications, but of the relationship between the two. (If we were modelling this in a labelled property graph, treatment IDs would be labels on the edges connecting taxon names and publications.) Hence I'd argue that these identifiers should be used as qualifiers for references for taxon name (P225). With the treatment identifiers, we can express the relationship between publication and taxon name. These identifiers could be Plazi ID (P1992) and/or digital object identifier (Q25670) for treatments in Zenodo. --Rdmpage (talk) 09:36, 11 April 2021 (UTC)
- @Rdmpage: Except that on Wikidata, taxa and taxon names are conflated in the same items. Several other taxon properties (e.g. has basionym (P566)) should really be qualifiers on taxon name (P225). Aligning them as qualifiers on taxon name (P225) would be the way forward, except that the current Wikidata model does not allow qualifiers on references (or on other qualifiers). This is where the qualifier/reference model of Wikidata, in all its beauty, can be a burden. A simpler solution might be to store taxon names as items that are instance of (P31) scientific name (Q10753560), and to change the value type of taxon name (P225) from string to item. However, that discussion belongs in a bigger context (or a new property proposal, e.g. scientific name as item) rather than in this property proposal. For now, I would suggest using this proposed property the way we use has basionym (P566), i.e. as a property of the taxon. To allow for future schema changes in which the property is linked to scientific-name items, we could store the name as a qualifier on this property. --Andrawaag (talk) 10:41, 11 April 2021 (UTC)
- @Andrawaag: I agree that taxa and taxon names are conflated in Wikidata, but that's a near-universal feature of taxonomic databases. That doesn't make it a good thing, but it isn't unique to Wikidata. I'm puzzled by your statement that "in the current Wikidata model it is not possible to add qualifiers to references". Surely this is incorrect? You can add properties to references. For example, Maurilloides (Q106254996) has a reference that includes Plazi ID (P1992), digital object identifier (Q25670), and publication date (P577), which is pretty much exactly the model I'm suggesting. --Rdmpage (talk) 10:54, 11 April 2021 (UTC)
- You can add qualifiers to statements, but not to references. With some creativity, you could replicate your proposal, "taxon → property: taxon name → taxon name → reference → treatment QID, has role=treatment", as "taxon → property: taxon name → taxon name → qualifier: has role=treatment → references: treatment QID, DOI, etc". To a person with some taxonomic background, it may be clear that the has role qualifier applies to the treatment ID and not to the scientific name. However, that role is implicit and will not be picked up by machine-readable approaches. To be semantically correct, you would need to add the role as a qualifier to the referenced treatment. In Wikidata you cannot add such additional metadata to either qualifiers or references. --Andrawaag (talk) 12:50, 11 April 2021 (UTC)
- @Andrawaag: OK, now you've got me confused. For any individual reference for a property value I can add properties such as identifiers, role, date, DOI, etc. You've done this yourself for Cerchysiella raddeii (Q106154577). I can retrieve those in SPARQL https://w.wiki/3B9S as properties of the reference (via prov:wasDerivedFrom), therefore by definition this is machine readable. What am I missing? --Rdmpage (talk) 15:05, 11 April 2021 (UTC)
- @Rdmpage: We might have different definitions of qualifiers. For me, references are available through prov:wasDerivedFrom, while qualifiers are available through the prefix pq:. Expressed as a pseudo-EntitySchema, a statement with both qualifiers and references looks as follows:
    <item> p:XX {
      ps:XX . ;                # statement
      pq:YY . * ;              # qualifier: zero or more qualifiers
      prov:wasDerivedFrom {
        pr:ZZ .*               # zero or more references
      }
    }
- Many use cases in Wikidata ignore the p: prefix completely and use the wdt: prefix, which captures only the "truthy" statements. The qualifiers and references of valid statements are completely lost when using the truthy subgraph of the Wikidata knowledge graph. So yes, references and qualifiers are great for capturing metadata, but whenever more depth is needed, as in this example where we want to add a qualifier to the reference, we need something like <item> -> prov:wasDerivedFrom -> pq:hasRole treatment -> prov:wasDerivedFrom plazi etc. This recursive use of qualifiers is currently not supported in Wikidata, and I am not sure we should advocate for it. In that case, remodelling using direct properties is more straightforward. --Andrawaag (talk) 16:20, 11 April 2021 (UTC)
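To make the truthy-versus-full distinction concrete, here is a minimal in-memory sketch (not Wikidata's actual implementation): each statement carries its main value (ps:), qualifiers (pq:) and references (pr: values under prov:wasDerivedFrom), and the `truthy` helper mimics the wdt: projection by keeping only best-rank values (preferred if any exist, otherwise normal). The qualifier and reference values are placeholders; Q104453394 is borrowed from this discussion as an illustrative stated in (P248) value.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    value: str                                        # ps: main value
    rank: str = "normal"                              # "preferred", "normal" or "deprecated"
    qualifiers: dict = field(default_factory=dict)    # pq: property -> value
    references: list = field(default_factory=list)   # one dict of pr: values per prov:wasDerivedFrom node

def truthy(statements):
    """Mimic the wdt: projection: best-rank values only; qualifiers and references are dropped."""
    preferred = [s for s in statements if s.rank == "preferred"]
    best = preferred or [s for s in statements if s.rank == "normal"]
    return [s.value for s in best]

# Illustrative taxon name (P225) statement; qualifier and reference values are placeholders.
p225 = [
    Statement(
        value="Cerchysiella raddeii",
        qualifiers={"P3831": "<role placeholder>"},
        references=[{"P248": "Q104453394", "P1992": "<plazi-id-placeholder>"}],
    )
]
print(truthy(p225))  # → ['Cerchysiella raddeii']
```

Everything attached under the statement node (the role qualifier, the Plazi ID in the reference) is invisible to a query that only uses wdt:.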
- @Andrawaag: I guess I don't understand why you think recursion is needed here. For a Wikidata property value we can have one or more references (prov:wasDerivedFrom). Each reference can have one or more properties (with prefix pr:) with which we can associate things such as identifiers, dates, page ranges, roles, etc. for a given reference. No need for recursion. In the SPARQL query I gave we can retrieve the PlaziID (and other values) for each reference for a taxon name. You did essentially this for Cerchysiella raddeii (Q106154577), see https://www.wikidata.org/wiki/Q106154577#P225 . Why is that not enough for your purposes? (Having said that, personally I'd make A new species of Cerchysiella (Hymenoptera: Encyrtidae) parasitic in larva of chestnut trunk borer (Coleoptera: Cerambycidae) from China with notes on its biology (Q104453394) the value of stated in (P248) for this reference, because that's the source of the name; the treatment refers to the relevant part of the publication.) Sorry if I'm being dense here, but I think this is being made more complicated than it needs to be. I view treatments as just one example of the general notion that we can link a name to a publication and combine that with some notion of where in the publication that name occurs, such as a specific page, a set of pages, or a block of text (a treatment). Given that we can link publications to names already (as references) and we can add property-value pairs to that statement of provenance (accessible by querying for properties of the provenance statement with the prefix pr:), why do we need a specific property for treatments? And if you think we do, is it a property of a taxon or a publication? (I would argue it is neither.) --Rdmpage (talk) 18:33, 11 April 2021 (UTC)
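A sketch of the kind of query described here, built as a Python string that can be pasted into the Wikidata Query Service (which predefines the wd:, p:, ps:, pr: and prov: prefixes). It is not a copy of the linked short-URL query, whose text is not reproduced in this discussion; it simply follows p:P225 to each statement, then prov:wasDerivedFrom to each reference, and reads one pr: property (Plazi ID (P1992) by default).

```python
def reference_locator_query(taxon_qid: str, locator_pid: str = "P1992") -> str:
    """Build a WDQS query listing the taxon name (P225) of an item together with
    one property (default: Plazi ID, P1992) read from each of its references."""
    return f"""
SELECT ?name ?ref ?locator WHERE {{
  wd:{taxon_qid} p:P225 ?statement .
  ?statement ps:P225 ?name ;
             prov:wasDerivedFrom ?ref .
  ?ref pr:{locator_pid} ?locator .
}}
""".strip()

print(reference_locator_query("Q106154577"))  # Cerchysiella raddeii
```

Passing a different property ID (for example page(s), P304) reads that reference property instead, which is the "treatment as just another locator" view taken here.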
- @Andrawaag: Just to be clear, from my perspective I want to assign properties to the (taxon name, publication) pair. The role a publication plays is specific to a particular name, so being able to say that a given (taxon name, publication) pair (AKA a "reference" in Wikidata) has certain properties (e.g., is treatment, has an external identifier, etc.) is pretty much exactly what I want to say. Hence it seems to me that we have everything in place already to handle treatments (provided we remove the Plazi ID (P1992) constraint that requires it to be associated with a taxon name (P225)). --Rdmpage (talk) 15:26, 11 April 2021 (UTC)
- @Rdmpage: It is more to the point to assign properties to the (taxon name, taxonomic treatment) pair, because a taxon name refers primarily to a taxonomic treatment, and only secondarily to a publication, because the treatment happens to be a section of a publication. The taxonomic name Apis mellifera L. 1758 refers to this treatment directly (http://treatment.plazi.org/id/9082C709-6347-4768-A0DC-27DC44400CB2). The link to the publication is ambiguous: the article DOI (https://doi.org/10.5962/bhl.title.542) does not unambiguously identify what is referred to (in this case it covers 4819 treatments), nor does it point to it in a format that can easily be reused. At the same time, the link from a treatment to the respective article is provided. --Myrmoteras (talk) 07:43, 12 April 2021 (UTC)
- @Myrmoteras: I guess I'm confused because I think this is exactly what references in Wikidata can already do. To make this concrete, I've added a treatment to Apis mellifera (Q30034). It already had a reference to ITIS; I've added one to 10th edition of Systema Naturae (Q4547210) that includes properties stating the actual physical page, that page in BHL's scan, and the Zenodo DOI and PlaziID for the treatment. Here's a screenshot with an explanation. So my question is "why doesn't this do what you want?". We can store the relevant details about the location of the name in a publication, identifiers for that location (the treatment), and we can retrieve that information, see https://w.wiki/3BFX. --Rdmpage (talk) 09:25, 12 April 2021 (UTC)
- Question @Andrawaag: In Sitana sushili (Q106040954) (created 2021-03-19) you used a different model using described at URL (P973) and object of statement has role (P3831). Why was this approach rejected? --Succu (talk) 16:41, 15 April 2021 (UTC)
- @Succu: I understand this was addressed to User:Andrawaag, but I think it's a no-brainer: most likely because they were exploring alternatives before fully understanding the conceptual and practical implications of this endeavor, which culminated in this property proposal as the preferred and logical way to approach this. Probably forgot to undo the edit. --Mguidoti (talk) 19:12, 15 April 2021 (UTC)
- Support I only saw the initial arguments, but looks like a solid usage that improves the reach of Wikidata TiagoLubiana (talk) 16:50, 12 April 2021 (UTC)
- Comment I'm curious about the answer to User:Succu's question about Tyrannosaurus rex (Q13098211). Does this scale in Wikidata? --- Jura 18:09, 12 April 2021 (UTC)
- From Forty-seven new species of Sinopoda from Asia with a considerable extension of the distribution range to the South and description of a new species group (Sparassidae: Heteropodinae) (Q96679826): Sinopoda Jäger, 1999 (the genus) and Sinopoda, Jager, 1999 (= Sinopoda sp.). The latter is an unidentified species, but is internally handled as the genus Sinopoda (Q3485044). --Succu (talk) 19:25, 12 April 2021 (UTC)
- The latter is now changed to species Sinopoda sp.. Thanks for pointing this out. We will check similar cases.--Myrmoteras (talk) 06:22, 13 April 2021 (UTC)
- From The bees of the genus Hylaeus Fabricius 1793 of the Asian part of Russia, with a key to species (Hymenoptera: Apoidea: Colletidae) (Q97574284): Plazi has an entry for Hylaeus Fabricius 1793 (the genus Hylaeus (Q940917)) and a key Hylaeus to the "species of the Asian part of Russia" (identification key (Q218682)). --Succu (talk) 19:51, 12 April 2021 (UTC)
- Internally, we know that the latter includes a key. At the same time, these are two independent sections of a publication. From the point of view of retrieving keys, it is better to keep the two treatments separate. For Wikidata, this could also be changed. Open for discussion. --Myrmoteras (talk) 06:22, 13 April 2021 (UTC)
- @Myrmoteras: Why the hell is Acanthaceae a treatment of the family Acanthaceae (Q53475)? It treats the species Justicia adhatoda (Q61022811) and cites the lectotype provided by JARVIS, C. Order out of chaos. Linnaean plant names and their types. The Linnean Society of London in association with the Natural History Museum, London, London: 2007. Pp xii, 1,017; illustrated. Price £ 80.00. ISBN 978-0-9506207-7-0. (Q96173143). There is an endless amount of such kind of examples. --Succu (talk) 20:49, 13 April 2021 (UTC)
- @Succu. Again, thanks for pointing this out. It is fixed now (Acanthaceae), and I expect no more errors in this J section of the book. Why this is about Acanthaceae is explained in the treatments that Jarvis published. Please keep in mind that Jarvis' book is print-only, and getting the data out is a lot of work that we think is worth it, because this is a relevant work. Converting printed books into facts is not an easy task, but with your and others' help, we can improve this further by fixing errors, and learn from it to improve the algorithms. --Myrmoteras (talk) 20:57, 14 April 2021 (UTC)
- Sicyos (Sicyos (Q628933)) is claimed to be a treatment of the family (!) Violaceae (Q156060) according to Biodiversity inventories in high gear: DNA barcoding facilitates a rapid biotic survey of a temperate nature reserve. (Q30994040). The paper is correct: Family Cucurbitaceae / Sicyos angulatus Linnaeus / Family Violaceae / ... The treatment is not. --Succu (talk) 21:10, 13 April 2021 (UTC)
- @Succu please explain. Plazi (Q7203726) does not add any higher taxa to articles imported from Biodiversity Data Journal (Q19370769) and thus the error comes from the publisher.
- This has the potential to scale up with continued and increased funding, increased involvement of the community, and increasing number of journals using semantic markup. As mentioned above, the intention right now is to add treatments for new species and in a later phase add additional treatments, which includes a defined set of criteria when they can be uploaded similar to what is in place for the upload of treatments to BLR.--Myrmoteras (talk) 06:22, 13 April 2021 (UTC)
- The question is if the proposed way of modeling scales within Wikidata (properties with a large number of statements on the same item and large number of items with similar labels aren't ideal), not how much funding you can get or not. --- Jura 08:47, 13 April 2021 (UTC)
- An alternative could be to add this directly to publications instead. I suppose this would avoid having to answer the question about Tyrannosaurus rex (Q13098211) --- Jura 09:26, 14 April 2021 (UTC)
Finally Oppose: The source is unreliable. A last example: This Plazi query is about the genus Serratia (Q134980) in the kingdom Bacteria. The treatment refers to The Luciolinae of S. E. Asia and the Australopacific region: a revisionary checklist (Coleoptera: Lampyridae) including description of three new genera and 13 new species (Q86977050), hence to the genus Serratia (Q71426030) in the animal kingdom. Note: I removed more than 2000 IDs I created in 2015 which no longer exist. --Succu (talk) 18:51, 14 April 2021 (UTC)
- @Succu:Thanks for pointing out this and some other issues (above) with the automated data extraction process. It's important to understand, however, a couple of things here:
- 1) the automatic data mining process deals with a great deal of variation in data presentation, caused by different journal layouts and sometimes graphic editors' mistakes, which can hamper the output;
- 2) it's expected to have such issues, that's why Plazi has implemented in the last year a "data transit control" system which blocks the treatment from going to certain destinations (e.g., Zenodo, GBIF, and luckily, Wikidata) if certain conditions are not met, but NOT treatmentBank, which is what you're checking;
- 3) higher ranks are not always available on the publication and Plazi currently rely on outside sources to find the information - Serratia is a homonym, both an Animal and a Bacteria genus, for instance: https://en.wikipedia.org/wiki/Serratia - this particular treatment is indeed wrongly associated with this kingdom, but that's something Plazi can easily fix as well,
- 4) this kind of data issue is not only fixable, but a non-issue for the linking of the treatment as a property to the taxon item, which is what has been proposed and not any kind of data rollout based on Plazi's attributes and finally,
- 5) in order to properly participate in any discussion, on any subject, it's important to completely understand the underlying processes and the conceptual meaning of their outputs for the given field; otherwise we risk focusing on issues unrelated to what has been proposed.
- Additionally: it has been said here that the intention is to push only the treatments that describe new species, that meet specific quality criteria, and that are published from now on. Adding older treatments would be handled in a second step and would have to go through the same quality criteria. Adding any treatment (e.g. treatments that only extend the knowledge on a particular taxon name or concept) is out of bounds, from what I understood.
- Therefore, I don't see these examples you are citing as a contribution to this important discussion, as they are unrelated to what has been de facto proposed here. However, I do appreciate you pointing these out, as it gives us the opportunity to fix them, but I think we would have a more fruitful discussion if we could focus on what has been said and proposed after really understanding Plazi's processes and data workflows. For having treatments that propose new taxa/names associated with the taxon as a property (as they actually, conceptually and theoretically are, because a name can't exist or be valid without a published treatment according to, say, the nomenclatural code for zoology: https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/), that are published from now on and that go through a set of rules in order to block the transit of bad data, do you have any concerns? --Mguidoti (talk) 19:27, 14 April 2021 (UTC)
- @Succu thanks for pointing this error out. This article includes 127 genera, of which the new genus Serratia was assigned the wrong higher categories. All the other 125 taxa are fine, as seen in this Plazi query, now including the edited higher taxa for Serratia and the new species. The article was processed on 2019-10-18. --Myrmoteras (talk) 20:27, 14 April 2021 (UTC)
- @Succu All our data is accessible, so I hope you have a record of the 2000 deleted records, so we can fix them. There are increasingly more controls in the processing workflow. Being aware of the data issues, we propose here a more careful step: to add only recently described new species until we are confident that a high level of accuracy is provided. We are also responsive to errors that can be reported here. Together we can clean this resource up. In return, this resource provides access to hundreds of thousands of taxonomic treatments and related figures, many of which are not accessible because they are in closed-access publications, such as a great number of the 10,000 new species treatments published since 2020 and available --Myrmoteras (talk) 20:27, 14 April 2021 (UTC)
This proposal is not about the source of data, but about introducing the property taxonomic treatment. As already stated in the introduction by Andrawaag, this is an element that has been an integral part of taxonomy (Q8269924) and the charting of the diversity of the world since the beginning of modern binomial nomenclature (Q36642) in taxonomy (Q8269924), by Linnaeus 1753 Species Plantarum. 1st Edition (Q21856050) for plants, and Linnaeus 1758 Systema Naturae. 10th edition, Volume 1 (Q21608408) in zoology. Each Latin binomen in both the plant and the animal volume is followed by a taxonomic treatment (Q32945461) that is clearly delimited and about the respective taxon. In these treatments Linnaeus already provided concise data about the species he describes, such as the flowers Apis mellifera (Q30034) visits, or its distribution (see the original treatment). Today these treatments (e.g. Stephostethus yuanfengensis sp. nov.) can include much richer data, such as material citations that are references to the specimens used in the study, and thus allow reproducibility (Q1425625) of the scientific result, a common practice in open science (Q309823). Published treatments are also reused in Global Biodiversity Information Facility (Q1531570). Thus having a taxonomic treatment property is pertinent for understanding of and access to biodiversity data, which is not provided by citing the publication or the page where the treatment is. – The preceding unsigned comment was added by Myrmoteras (talk • contribs).
- @Myrmoteras: Since summer 2013 all taxa of Species Plantarum. 1st Edition (Q21856050) are linked to Linné's original description (query). The treatments at Plazi are incomplete. You missed Species Plantarum. 1st Edition, Volume 2 (Q21856107). BTW: I wrote the featured German articles about Carl Linnaeus (Q1043) and Systema Naturae (Q29270). Regards --Succu (talk) 17:43, 15 April 2021 (UTC)
- @Succu You ask a question about data and Plazi, which is not relevant to the question of whether we should have a taxonomic treatment property. However, in one aspect it is relevant. Yes, you added a citation, but not the actual treatment. When you look at the first taxon, Amomum cardamomum (Q21870176), in the response to your (query), you don’t have access to the treatment per se, but to page 1, which includes six taxa, not to the treatment only. The only way to get to the treatment is to use the GBIF taxon ID (P846), where there is the original treatment, provided by Plazi. A basionym is about a taxon, not a page where there are taxa. A basionym refers to a very clear, well-defined section of a text: the taxonomic treatment. On the respective page 1 there is no way to misinterpret the boundaries of what Linnaeus provided as taxonomic treatments for each name.
- @Succu Outside the taxonomic treatment property proposal. If you provide a transcript in latin font of Linnaeus Volume 2, we will process. We searched for a respective copy, but it does not exist as far as we know. We do not have the resources to pay for it ourselves. So, here is our offer.
- Question @Succu: I think you missed my question addressed to you in the middle of this large discussion that clearly deviated from the proposal in my humble opinion, so I'm re-posting it here: For having treatments that propose new taxa/names associated with the taxon as property (as they actually conceptually and theoretically are, because a name can't exist or be valid without a published treatment, according, say, to the nomenclatural code for zoology: https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/), that are published from now on and that goes through a set of rules in order to block the transit of bad data, do you have any concerns? Thanks in advance --Mguidoti
- @Mguidoti: Sorry, but I don't get your point. Could you please rephrase your question? An example would be helpful. --Succu (talk) 20:36, 15 April 2021 (UTC)
- Sure, @Succu:, of course, and I apologize in advance for this lengthy answer but I feel like I should summarize the points after this huge back-and-forth discussion here, in an attempt to answer the remaining questions from you and @ Jura1:, if you don't mind. So, here are the main points:
- 1) This property proposal is focused on adding taxonomic treatments that define taxon names/concepts from now on only. These treatments currently go through - as they have been for the past few months - a specific quality control that holds data transits in case of broken data. That's why you can find treatments on TreatmentBank that are not on GBIF nor Zenodo, for instance. The same control would apply here. That's the current state-of-the-art. This also means that we wouldn't add 80ish treatments for T. rex as you were concerned at some point (@ Jura1: too I guess). One example of defining treatment that passed the control mechanisms recently: http://tb.plazi.org/GgServer/html/003B87FAFF88FFC5BCCBF42427C3FD8A.
- 2) The data issues pointed out by you are fixable - Plazi has in place a workflow to fix reported issues alongside the quality control system for newly extracted ones. There is even a specific GitHub repository where these can be reported and answered in a timely manner. They aren't, however, in my humble opinion, relevant here, as the goal at this point is to add only the defining treatments published from now on (see above), and not to use any kind of data rollout to bring treatments' attributes into the Wikidata taxon item.
- 3) There were in the past a few attempts to add Plazi IDs in Wikidata, and because these attempts do not reflect the actual relationship between treatments and taxa/names (see below), we do not believe they are the right way forward conceptually speaking (again, see below) - but the proposal is the right way for these defining treatments. I'm not sure of the extent of the problem of having the same ID referenced in different places (if there is any); maybe @ Andrawaag: can weigh in here - and I'm not sure where @Myrmoteras: stands on these, but I think they could be removed/fixed too, provided this proposal is accepted, assuming that this is a problem.
- 4) Another important point that I made regards what the defining treatment represents for a taxon/taxon name. The latter is defined by the former, not merely cited in it. The original source (the publication where the treatment is included) is a reference where the taxon/taxon name was cited, but the treatment (the exact excerpt from the original source) is its true definition, according to nomenclatural codes, and hence a property of the entity (its actual, formal, valid and accepted definition). This is the way the field works, and it's indirectly stated in nomenclatural codes at this given time. That's another reason why we believe these defining treatments must be a property, and any other representation is a hack that doesn't do justice to the undeniable relationship between taxa/names and defining treatments.
With that being said, I asked you if there are any concerns that are actually applicable to the actual proposal here. I think the issues raised by you are extremely important and I'm glad you raised them, but it doesn't seem to be part of this property proposal discussion. Thus, my question: considering what I just summarized here and the actual property proposal, are there any concerns? --Mguidoti (talk) 14:46, 16 April 2021 (UTC)
- Support. YULdigitalpreservation (talk) 14:33, 16 April 2021 (UTC)
- Weak support My concerns about the multiplicity of ways we can connect taxa to references remain, but I'm aware that making acceptance of a new property contingent on resolving that issue may be unreasonable. I don't think that adding treatments to Wikidata requires this property, but based on lengthy discussions here and offline I know that the proposers of this property feel otherwise. If this property is accepted I would really like to see Plazi ID (P1992) cleaned up so that it is constrained to only be assigned to Wikidata items that are treatments (i.e., https://www.wikidata.org/wiki/Property:P1992#P2302 is changed to requiring instance of (P31) to be taxonomic treatment (Q32945461)), and any occurrence of Plazi ID (P1992) that isn't attached to a Wikidata item that is a taxonomic treatment (Q32945461) should be deleted. --Rdmpage (talk) 18:05, 16 April 2021 (UTC)
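The clean-up proposed here amounts to a simple check. A minimal sketch, assuming item claims have already been fetched into a dict mapping QIDs to property → value lists (a hypothetical, simplified layout); the item snapshot and the Plazi ID strings are illustrative placeholders, not current Wikidata data.

```python
TAXONOMIC_TREATMENT = "Q32945461"  # taxonomic treatment

def misplaced_plazi_ids(items: dict) -> list:
    """Return, sorted, the QIDs that carry a Plazi ID (P1992) but are not
    instance of (P31) taxonomic treatment (Q32945461)."""
    return sorted(
        qid for qid, claims in items.items()
        if claims.get("P1992") and TAXONOMIC_TREATMENT not in claims.get("P31", [])
    )

# Illustrative snapshot only; Plazi ID values are placeholders.
items = {
    "Q106154577": {"P31": ["Q16521"], "P1992": ["<plazi-id-placeholder>"]},              # taxon item
    "Q106155510": {"P31": [TAXONOMIC_TREATMENT], "P1992": ["<plazi-id-placeholder>"]},  # treatment item
    "Q30034": {"P31": ["Q16521"]},                                                       # taxon, no Plazi ID
}
print(misplaced_plazi_ids(items))  # → ['Q106154577']
```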
- Agreed if we create items for Plazi taxon treatment ID (Q20644485) they should be removed from the taxon items. --Succu (talk) 19:50, 16 April 2021 (UTC)
- This proposed rearrangement of the scope of Plazi ID (P1992) from taxon items to treatment items makes sense to me. --Daniel Mietchen (talk) 20:01, 16 April 2021 (UTC)
- Oppose given the open questions.
Oppose This appears to be an unneeded inverse property that can lead to countless statements for the property on a single item. The items suggested as values are hard to differentiate by non-bots (i.e. Wikidata volunteer contributors) making the inverse even more problematic.
In addition, the planned automated population is based on a reference, which may lead to countless updates and possibly overwrites on a fairly important part of our database.
This is not an opinion on the merits of including the information in Wikidata as such (which is already done on specific items) nor on the merits of the planned data source as such, but one on the planned way of including the information, based on experience with what can or could work in Wikidata in general. --- Jura 19:12, 16 April 2021 (UTC)
- @Jura1 What questions, exactly? I think they were addressed in the summary I presented (and attempted to ping you, sorry if I failed) two or three points above. Have you seen it? I have a feeling that you haven't, because in your second opposition you said "This appears to be an unneeded inverse property that can lead to countless statements for the property on a single item", which I expressly said isn't the goal (just defining treatments of new species from now on, meaning one per item). But if you did read my summary/last comment, could you please enumerate the remaining open questions in your opinion? Thanks in advance! --Mguidoti (talk) 19:28, 16 April 2021 (UTC)
- (edit conflict)
- I am for trying to separate the taxon and taxon name statements in Wikidata, as per Andrawaag’s comment from April 11, and I agree with him that the conversation about that is bigger than the current one on whether or not there should be a dedicated property for “taxon treatment”. On that basis, the next question is whether the current property proposal discussion should wait for that larger discussion to be finished, and here, my answer would be no, as it might well be that the current conflation of the concepts for taxon and taxon name will remain in Wikidata for quite some time. Apart from the property proposal itself, there is the screenshot that Rdmpage has shared about storing treatment-related information as part of the reference metadata. The problem I see with this approach is that it conflates information related to a publication (page(s) (P304) and BHL page ID (P687)) with information related to the treatment (DOI (P356) - note that this is the DOI of the treatment, not of the publication - and Plazi ID (P1992)), when it is known that there is not a 1:1 relationship between publications and treatments, e.g. a publication (or even a page) might contain multiple treatments, or a treatment might span multiple pages. So I took a close look at the Apis mellifera example and created a separate item for its treatment, which I then annotated as best I could - see the attached screenshot. What I ended up with is essentially the way Mileewa digitata Yu, He & Yang, 2021 (Q105626708), which is linked from the property proposal, was modeled (the difference is that the main subject (P921) property there links to the taxon item, while I linked it to a new item for the taxon name, which reflects the large-scale reorganization discussion Andra alluded to). So with the treatment modeled this way, I think the best way of making the link between taxon and treatment is indeed a dedicated property as proposed here, which I thus Support.
--Daniel Mietchen (talk) 19:57, 16 April 2021 (UTC)
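With the treatment modelled as its own item, treatments for a subject can already be found through instance of (P31) taxonomic treatment (Q32945461) plus main subject (P921). A sketch of such a query as a Python string for the Wikidata Query Service; Q105626659 (Mileewa triloba) is used only as an illustrative subject, and whether a given P921 points at a taxon item or a taxon-name item depends on which of the two modelling variants described above was used. The forward link from the taxon item would need the proposed property, whose ID is not assigned yet, so it is not shown.

```python
def treatments_about(subject_qid: str) -> str:
    """WDQS query: items that are instance of (P31) taxonomic treatment (Q32945461)
    and whose main subject (P921) is the given item, with English labels."""
    return f"""
SELECT ?treatment ?treatmentLabel WHERE {{
  ?treatment wdt:P31 wd:Q32945461 ;
             wdt:P921 wd:{subject_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()

print(treatments_about("Q105626659"))
```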
- @Daniel: Order out of chaos: Apis mellifera (Q30034), Apis mellifera (Q106519505), Apis mellifera Linnaeus, 1758 (Q106519469) and Apis mellifera (Plazi). --Succu (talk) 20:18, 16 April 2021 (UTC)
- Again, @Succu: I think there is an underlying misconception about what treatments are, about what we are proposing here and now, and about how Plazi's search works. Your query is returning any treatments that mention the queried terms (in this case, Apis mellifera). If you use the treatment API, you would have 17 treatments returned to you, not 60. However, even if there were 60 treatments available for this species, what we are proposing here is to include the defining treatments, mentioned and explained more than once above: that means one treatment, not 17 (or 60). Please notice that only one has the label "sp. nov." on the provided link. That's the defining treatment. A taxon concept or name can have multiple treatments - that's normal. We are proposing the addition of the defining ones here. --Mguidoti (talk) 20:38, 16 April 2021 (UTC)
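The "defining treatment" selection described here can be sketched as a filter. This is a hypothetical illustration, not Plazi's actual logic: the record layout, the marker set (`sp. nov.`, `gen. nov.`) and the treatment IDs are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class TreatmentRecord:
    treatment_id: str   # hypothetical identifier
    taxon_name: str
    status: str         # nomenclatural status as printed, "" if none

# Hypothetical markers for treatments that establish a new taxon.
NEW_TAXON_MARKERS = {"sp. nov.", "gen. nov."}

def defining_treatments(records, name):
    """Keep only the treatments of `name` whose status marks a newly established taxon."""
    return [r for r in records if r.taxon_name == name and r.status in NEW_TAXON_MARKERS]

records = [
    TreatmentRecord("treatment-1", "Apis mellifera", "sp. nov."),
    TreatmentRecord("treatment-2", "Apis mellifera", ""),
    TreatmentRecord("treatment-3", "Apis mellifera", ""),
]
print([r.treatment_id for r in defining_treatments(records, "Apis mellifera")])  # → ['treatment-1']
```

Under this assumption each taxon item would receive a single value for the proposed property, matching the one-defining-treatment intent stated here.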
- @Mguidoti: The proposal is a broad one and not restricted to the establishment of a new scientific name (tax. nov.) by a nomenclatural act (Q56027914) (see above). This is an assumption about how your property is used in the future. The question how to label and describe instance of (P31)=taxonomic treatment (Q32945461) to make it clearly distinguishable from instance of (P31)=taxon (Q16521) is unanswered till now. --Succu (talk) 19:37, 17 April 2021 (UTC)
- @Daniel Mietchen: I don't think there's any conflation (although I don't doubt that conflation can happen). The example I gave uses stated in (P248) to say that this name is in 10th edition of Systema Naturae (Q4547210) and then uses qualifiers to say "it's on this page, which also has this BHL Page ID, which corresponds to a treatment with this DOI and Plazi ID". These qualifiers are all about the location of information within the publication; they are qualifiers of the reference field, not properties of 10th edition of Systema Naturae (Q4547210). The name is on page 576, but 10th edition of Systema Naturae (Q4547210) doesn't have 576 pages. I regard treatments as just another way of referring to a location within a larger document, just like page numbers or other locators (such as selectors for annotations, see https://www.w3.org/TR/annotation-model/#selectors ). This is why I argued that treatments aren't "special" in the sense of requiring their own property. The duration of the discussion on this proposal suggests that referencing taxonomic names is messy and non-obvious (in that what seems obvious to one person is opaque to another), which is why I'm attempting to disengage (evidently with limited success). --Rdmpage (talk) 23:11, 16 April 2021 (UTC)
- Support Having had a look at the various aspects discussed, it's hard to get an overview of everything, but most of the points are about a different property or seemingly about "we could also mimic it with...". I note a question about the difference between treatment and circumscription; I do not know the difference, but I also could not find evidence that they are the same. I conclude from this that I prefer to support people who want to work on the richness of Wikidata, and I therefore support the property. --Egon Willighagen (talk) 15:11, 12 June 2021 (UTC)
It would be nice to have an enWP article about taxonomic treatment (Q32945461). A potential source is XML schemas and mark-up practices of taxonomic literature (Q22679571). --Succu (talk) 20:41, 16 April 2021 (UTC)
- @Succu here is the taxonomic treatment article in enWP. The main source is TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions. (Q106628627). --Myrmoteras (talk) 07:08, 27 April 2021 (UTC)
- Comment Just a neutral comment: having a record for one specific name in one specific publication is somewhat how Zoobank works. Most of the Zoobank database is hidden from the public, so we cannot see it, but I'm aware of it because I sometimes talk with an administrator of Zoobank. The records for the new names are available to the public, and a few others too, such as the parent taxa of the new names, e.g. [3] or [4]; note the field "Original Usage:" just below the name. Regarding Plazi, note that a Plazi treatment may have its own GBIF ID, e.g. Plazi → GBIF, which in Zoobank would be "Ophiura sarsii Lütken, 1855 sec. Thuy & Stöhr, 2011". Regards, Christian Ferrer (talk) 06:46, 1 August 2021 (UTC)
- Currently, when I come across a publication, I add a reference to the taxon name, e.g., with "What links here" you can see all the names in that publication, which is a bit more than the 19 treatments available in Plazi; another view of that Wikidata item (and of the corresponding names) can be seen in that cool toy. Christian Ferrer (talk) 07:50, 1 August 2021 (UTC)
- @Andrawaag, Myrmoteras, Mguidoti, Qgroom, Tdikow, Succu: @Rdmpage, TiagoLubiana, Christian Ferrer, Egon Willighagen, Daniel Mietchen, Jura1: @YULdigitalpreservation: WikiProject Taxonomy has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. Done as taxonomic treatment (P10594).