
Reviews at rOpenSci #1

@cromanpa94


Response to reviewers - treedata.table

About

In this document, we address all the comments raised during the review phase of treedata.table in rOpenSci. When possible, we use a single commit to answer each of the comments.

Where are changes implemented?

All the changes listed below are implemented in the following branch of our GitHub repository:

https://github.com/uyedaj/treedata.table/tree/cristian

Acknowledgements

We thank the reviewers @Bisaloo and @karinorman and the editor @jooolia for their comments!


First reviewer (@Bisaloo):

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

44cceda

  • It's recommended practice to add an ORCiD for the authors if they have one.

abfb2cb
b76f4a8

  • Please add a contributing guide.

a9a7bde
16b6fbe

  • As far as I know, the Type field in the DESCRIPTION is not standard and in all cases unnecessary

07746cf

  • The package Title is not actually using title case. I don't care much personally but in my experience, this can cause issues when you submit to CRAN. You can verify this with the tools::toTitleCase() function.

ff610ad

  • Please avoid as much as possible to use Depends when you can use Imports. Here is a relevant quote from Hadley Wickham and Jenny Bryan's book on R package development: Depends: Prior to the rollout of namespaces in R 2.14.0, Depends was the only way to “depend” on another package. Now, despite the name, you should almost always use Imports, not Depends. You’ll learn why, and when you should still use Depends, in namespaces.

3d43122

  • According to the CRAN guidelines for MIT licenses, your previous version of LICENSE was correct (before uyedaj/treedata.table@7764ef2). You just need to update the name of the copyright holder(s).

8b6250e

  • The README does not contain enough information about the package, how to install and use it. You may find this relevant chapter of rOpenSci's devguide useful.

9cd075a

  • Since this package is presented as addressing a performance need, it would be very nice to see some benchmarks to back this up. You can use the microbenchmark package for this.

3bd6574

  • (optional) it can be nice to add more badges to your README to advertise the fact that you follow the current best practices in package development. In particular, I like badges with code coverage. Here are the instructions on how to do this with codecov.

b38d88a

  • Please avoid putting multiple instructions on one line (separated by ;). It makes the code harder to read. This will also help with the long lines warning reported by goodpractice in @jooolia's comment.

59872e5
748ddde

  • When you perform tests on objects that are already booleans (TRUE/FALSE), you don't need to write == TRUE.

9afe310
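For illustration, a minimal base R sketch of the point above. The helper `has_missing()` is hypothetical (not part of treedata.table); it already returns a single logical, so it can go straight into `if ()`. `isTRUE()` is shown as the safer option when a value might be `NA`.

```r
# Hypothetical helper, for illustration only
has_missing <- function(x) {
  anyNA(x)  # already TRUE/FALSE, so no need for `== TRUE`
}

x <- c(1.2, NA, 3.4)

# Redundant form: if (has_missing(x) == TRUE) { ... }
if (has_missing(x)) {
  message("x contains missing values")
}

# isTRUE() guards against NA (or longer) results inside if ()
flag <- NA
if (isTRUE(flag)) message("never reached") else message("flag is not TRUE")
```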

  • Class testing should be done with inherits() (for S3 objects, which is the case for ape objects), or is() (for S4 objects). Please see this blog post by Martin Maechler for more info.

```r
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, d)
# You would expect to get identical(td, td2) but this line actually throws an error
```

as.treedata.table() expects a data.frame as input, so the pulled data must be converted back with as.data.frame() before the round trip works:

```r
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, as.data.frame(d))
identical(td, td2)
#> [1] TRUE
```

1b5c188
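On the inherits() point, a short base R sketch. The object `tree` is a hand-built stand-in with two S3 classes, an assumption for illustration (objects built by ape normally carry the single class "phylo"). Comparing `class()` with `==` returns one logical per class, whereas `inherits()` always returns a single TRUE/FALSE.

```r
# Hand-built stand-in with two S3 classes (illustrative, not a real ape tree)
tree <- structure(list(tip.label = c("a", "b")), class = c("myphylo", "phylo"))

class(tree) == "phylo"                    # one logical per class: FALSE TRUE
inherits(tree, "phylo")                   # single logical: TRUE
inherits(tree, c("multiPhylo", "phylo"))  # TRUE if any listed class matches
```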

  • Regarding your replacement of 1:length() by seq_len() in uyedaj/treedata.table@7764ef2, the correct syntax is seq_len(n), without a 1: prefix.

700e0ed
19a777c
9779592
96f84da
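The sketch below, using only base R, compares the idioms on an empty vector, the case where `1:length()` goes wrong while `seq_len()` and `seq_along()` behave correctly.

```r
x <- character(0)   # e.g. a data set with no rows

1:length(x)         # c(1, 0): iterates twice over an empty vector
seq_len(length(x))  # integer(0): the loop body never runs
seq_along(x)        # integer(0): same result, shorter to write

n_iter <- 0
for (i in seq_along(x)) n_iter <- n_iter + 1
n_iter              # 0
```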

  • Communicate information to users via the message() function instead of cat(). Using cat() will almost certainly cause issues upon CRAN submission if not corrected. See this detailed Stack Overflow answer for more info about the differences between message() and cat().

b99ea3c
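A minimal sketch of why message() is preferred, assuming a hypothetical helper `report_dropped()`: messages are conditions sent to stderr that callers can silence with `suppressMessages()`, while `cat()` writes to stdout and is not affected by it.

```r
report_dropped <- function(n) {
  # Hypothetical helper: reports how many taxa were dropped
  message(n, " taxa were dropped from the ORIGINAL treedata.table object")
  invisible(n)
}

report_dropped(2)                    # shown on stderr as a message
suppressMessages(report_dropped(2))  # silenced by the caller

# cat() writes to stdout, so suppressMessages() does not silence it
out <- capture.output(suppressMessages(cat("still printed\n")))
out                                  # "still printed"
```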

  • On a related note, you don't need to use paste0() in message() calls: message() automatically concatenates the arguments you give it.

c7ebb3b
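To show the concatenation behaviour, the sketch below captures the condition raised by `message()` and compares its text with the equivalent `paste0()` string. Note that `message()` appends a trailing newline by default (`appendLF = TRUE`).

```r
col <- "X"

# message() pastes its arguments together with no separator,
# so wrapping them in paste0() is redundant.
msg <- tryCatch(
  message("Tip labels detected in column: ", col),
  message = function(m) conditionMessage(m)
)
msg  # "Tip labels detected in column: X\n"

identical(msg, paste0("Tip labels detected in column: ", col, "\n"))  # TRUE
```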

  • There is a problem in as.treedata.table() with the name_column argument not working. No matter what the user enters in name_column or what the auto-detection code finds, you always use the first column for tip.labels.

cb04cf5

  • This behaviour should be covered by tests.

e00fef9

  • (optional) it would be nice to have a message indicating which column was auto-detected as containing the tip.labels when name_column = "detect" in as.treedata.table()

5b1951b
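A possible shape for the name_column fix, its test, and the detection message, written as a hypothetical helper `choose_name_column()` rather than the package's actual code. The detection rule used here (pick the column sharing the most values with the tree's tip labels) is an assumption for illustration.

```r
# Hypothetical sketch, not treedata.table's implementation:
# honour a user-supplied name_column, otherwise detect it.
choose_name_column <- function(dat, tip_labels, name_column = "detect") {
  if (!identical(name_column, "detect")) {
    if (!name_column %in% names(dat)) {
      stop("Column '", name_column, "' not found in the data")
    }
    return(name_column)
  }
  # Assumed rule: the column whose values overlap most with the tip labels
  overlap <- vapply(dat, function(col) sum(as.character(col) %in% tip_labels),
                    integer(1))
  if (all(overlap == 0)) stop("No column matches the tree's tip labels")
  detected <- names(dat)[which.max(overlap)]
  message("Tip labels detected in column: ", detected)
  detected
}

dat <- data.frame(SVL = c(4.1, 3.9), species = c("sp1", "sp2"))
choose_name_column(dat, tip_labels = c("sp1", "sp2"))             # "species"
choose_name_column(dat, c("sp1", "sp2"), name_column = "species")  # "species"
```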

  • (optional) make the indentation in your code consistent. You sometimes use tabs and sometimes spaces. This can be fixed by running styler::style_pkg() in the root of your RStudio project.

d860070
d4cffa5
cd0c945
016bc06
bfb2572

  • As far as I can tell, the repeatsAsDiscrete argument is not used in detectCharacterType() or detectAllCharacters().

79cfcb4

  • This line will cause wrong identifications since the conversion of a data.frame to a matrix forces all elements to the same type.

8d4b9e1
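A small base R example of the coercion problem. Because `as.matrix()` must give every cell the same type, a single character column turns numeric traits into strings; testing each column of the data.frame with `vapply()` keeps the original types.

```r
dat <- data.frame(
  species = c("a", "b", "c"),
  SVL     = c(4.1, 3.9, 4.4),
  island  = factor(c("Cuba", "Cuba", "Jamaica"))
)

# as.matrix() forces a single type: every column becomes character here
m <- as.matrix(dat)
typeof(m)               # "character"
is.numeric(m[, "SVL"])  # FALSE: the numeric trait now looks discrete

# Checking each column of the data.frame keeps the original types
is_num <- vapply(dat, is.numeric, logical(1))
is_num                  # species FALSE, SVL TRUE, island FALSE
```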

  • Regarding my later point about tests, this would make a good test: ensuring that you find the correct number of discrete/continuous characters in the anolis dataset.

3cd17b9
4752e44

  • I think you missed some 1: instances from @jooolia's comment. E.g.,

19a777c
9779592
96f84da

  • It may be my unfamiliarity with data.table but I don't understand the warning/prompt in droptreedata.table(). As far as I can tell, the original data is NOT modified. Using your example:

```r
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
td_old <- td
td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))

identical(td, td_old)
#> TRUE

identical(td, td_new)
#> FALSE
```

droptreedata.table() is used to remove species from a treedata.table object. In your example, 2 species (chamaeleonides and eugenegrahami) are being excluded from the treedata.table object:

```
> data(anolis)
> td <- as.treedata.table(anolis$phy, anolis$dat)
Tip labels detected in column: X
Phylo object detected

No tips were dropped from the original tree/dataset
> td_old <- td
> td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
Please confirm that you would like to make changes to the ORIGINAL data?
Type: (1) YES, (2) NO: 1
> nrow(td_new$dat)
[1] 98
> nrow(td_old$dat)
[1] 100
```

The prompt was meant to inform the user that n taxa are being dropped from the dataset; we have removed it from the revised version. Regarding your checks: (1) identical(td, td_old) must be TRUE because the two objects are duplicates (td_old <- td), while (2) identical(td, td_new) must be FALSE because two taxa were dropped from the latter (note the difference in row numbers between the two objects).

a6d0b60
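The reviewer's identical(td, td_old) result follows from R's copy-on-modify semantics. The sketch below uses a plain list as a hypothetical stand-in for a treedata.table object and a hypothetical `drop_rows()` helper: the function changes only its local copy, so the caller's object is untouched.

```r
# Hypothetical helper: drops taxa from the data part of a list-based object
drop_rows <- function(obj, taxa) {
  obj$dat <- obj$dat[!obj$dat$tip.label %in% taxa, , drop = FALSE]
  obj  # modified local copy; the caller's object is not changed
}

td <- list(dat = data.frame(tip.label = c("a", "b", "c"), SVL = 1:3))
td_old <- td
td_new <- drop_rows(td, taxa = "b")

identical(td, td_old)  # TRUE: td was not modified in place
nrow(td_new$dat)       # 2
nrow(td$dat)           # 3
```

One caveat worth knowing: data.table columns modified with `:=` or the `set*()` functions change by reference, so code that uses them on shared data needs an explicit `data.table::copy()`.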

  • The dots handling in extractVector() could probably be simplified with lazyeval, since you already have it as a dependency.

63a2935

  • It would be useful to add a match.arg(type, c("dat", "phy")) in pull.treedata.table()

9408ac3
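A minimal sketch of the suggested validation, using a hypothetical `pull_part()` stand-in and a plain list in place of a treedata.table object. With the choices given as the default, `match.arg()` picks the first one when `type` is omitted, accepts partial matches, and raises an error for anything else.

```r
# Hypothetical stand-in for pull.treedata.table(), for illustration only
pull_part <- function(x, type = c("dat", "phy")) {
  type <- match.arg(type)  # default: first choice ("dat"); partial matches allowed
  x[[type]]
}

td <- list(dat = data.frame(SVL = 1:2), phy = "tree placeholder")
pull_part(td)                      # the data part
pull_part(td, type = "ph")         # partial match to "phy"
try(pull_part(td, type = "tree"))  # error raised by match.arg()
```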

  • In tdt(), do you really need ...? Wouldn't a FUN argument be sufficient?

We agree with the reviewer, but since both approaches are essentially equivalent, we decided to keep the ellipsis.

  • Since you provide a head() method for treedata.table, it would be nice to have a tail() method as well

8fbaaa6
9b62811
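A minimal sketch of paired head()/tail() methods for a hypothetical list-based class `td_demo` (not the package's implementation). Both methods forward `...` to the underlying data.frame method, so arguments such as `n` are honoured; inside a package, the generics also need to be imported, e.g. with `#' @importFrom utils head tail`.

```r
library(utils)  # provides the head() and tail() generics (attached by default)

# Hypothetical constructor for a minimal list-based class
new_td_demo <- function(dat, phy) {
  structure(list(dat = dat, phy = phy), class = "td_demo")
}

# Forward `...` (e.g. n) to the data.frame methods
head.td_demo <- function(x, ...) head(x$dat, ...)
tail.td_demo <- function(x, ...) tail(x$dat, ...)

td <- new_td_demo(data.frame(tip.label = letters[1:5], SVL = 1:5), phy = NULL)
head(td, 2)  # first two rows of the data
tail(td, 2)  # last two rows of the data
```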

  • paste()s are unnecessary here since cat() already concatenates its arguments.

c7ebb3b

  • The output of summary.treedata.table() is slightly confusing in my opinion. On the example you provide (with anolis), where no taxa are dropped from the tree or the data, you get:

We changed the message to “Taxa dropped from the tree/data”:

```
> data(anolis)
> td <- as.treedata.table(anolis$phy, anolis$dat)
Tip labels detected in column: X
Phylo object detected
No tips were dropped from the original tree/dataset
> summary(td)
A treedata.table object
The dataset contains 11 traits
Continuous traits: tip.label, SVL, PCI_limbs, PCII_head, PCIII_padwidth_vs_tail, PCIV_lamella_num, awesomeness, hostility, attitude
Discrete traits: ecomorph, island
The following traits have missing values: 0
Taxa dropped from the tree:  0
Taxa dropped from the data:  0
```

We also modified the droptreedata.table() function to keep track of the dropped species in the new object:

```
> td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
2 taxa were dropped from the ORIGINAL treedata.table object
> summary(td_new)
A treedata.table object
The dataset contains 11 traits
Continuous traits: tip.label, SVL, PCI_limbs, PCII_head, PCIII_padwidth_vs_tail, PCIV_lamella_num, awesomeness, hostility, attitude
Discrete traits: ecomorph, island
The following traits have missing values: 0
Taxa dropped from the tree: chamaeleonides, eugenegrahami
Taxa dropped from the data: chamaeleonides, eugenegrahami
```

292295c#diff-262ada0e7c14967a2a41b70149c367fb

  • When I run examples from [.treedata.table(), I get the following warning:

```
Warning message:
In 1:seq_along(x$dat) :
  numerical expression has 11 elements: only the first used
```

We cannot replicate this warning in the latest version of the package. It was probably fixed in a previous commit!
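For reference, the warning is easy to reproduce in base R: `seq_along()` already returns the full index `1, 2, ..., n`, so prefixing it with `1:` uses only its first element. The three-column data.frame below is a stand-in for `x$dat`.

```r
dat <- data.frame(a = 1, b = 2, c = 3)  # stand-in for x$dat

# 1:seq_along(dat) evaluates 1:c(1, 2, 3), which keeps only the first element
w <- tryCatch(1:seq_along(dat), warning = function(w) conditionMessage(w))
w               # "numerical expression has 3 elements: only the first used"

seq_along(dat)  # 1 2 3: the intended column index
```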

  • (optional) your tests are not compatible with the upcoming v3 of the testthat package. In particular, you could replace the (soon to be) deprecated expect_is() function with expect_s3_class(). Except for these two lines

We will keep the current functions for now but we thank the reviewer for this comment.

  • I think the test coverage needs to be increased a lot, as shown by the bugs uncovered during this review. I sometimes noted what would be good candidates for tests. As a good starting point, you can also look at which lines are not covered by your tests on codecov: https://codecov.io/github/uyedaj/treedata.table?branch=master. I see no technical limitations that would prevent you from reaching 100% coverage for this package, but for now, I think you should at least try to reach 80% coverage.

The current version has >80% test coverage.

ab53e26

  • (optional) you may want to use markdown syntax in your Roxygen comments. This produces more readable documentation in the source file in my opinion. Automatic conversion of your current Roxygen comments to markdown can be done with the roxygen2md package.

  • Please add a sentence for the name_column argument to explain that "detect" (the default) will auto-detect this column.

b5f19cb

  • In as.treedata.table(), you declare but this is unnecessary since you don't use setDT() here and as.data.table() is prefixed with the data.table:: namespace

92bf3c8

  • I'm not sure what the first word means in your @return roxygen comments. I think it will be less confusing if you remove these(?)

10354e2

  • (optional) you can use the @inheritParams detectCharacterType roxygen comment in the documentation of detectAllCharacters() to avoid duplicating the documentation of the function arguments. This is useful because it ensures the documentation of these functions will always stay in sync in the future. No risk of forgetting to update one of the two!

We chose to keep the full parameter documentation in both functions. We thank the reviewer for his comment.

  • (optional) I find it useful to explicitly state which is default for all arguments (when it exists) in the documentation. E.g., I would change it to something like #' @param returnType Either discrete (the default) or continuous

fa8fd8f

  • Please expand the documentation of the hasNames()/forceNames() functions by providing a short explanation of the function purpose and/or a @return roxygen comment.

2b62d02
3acb5cc

  • The documentation of extractVector() should be updated to indicate that multiple column names can be passed to ..., in which case you get a list of named vectors rather than the single named vector the documentation currently describes.

3acb5cc
0f31ded
42491d2

  • I think the function name pull.treedata.table is slightly confusing as it looks like an S3 method and it's not. Something along the lines of pull_treedata.table() or pulltreedata.table() (you already have a droptreedata.table() function, so this would make sense) or ??? would probably be better.

cd8fc93

  • Shouldn't this just be "If negative, all but the n last rows of x" (i.e., remove "first"):

We have completely modified our previous head() function.

500648a

  • This doesn't seem correct. The dots are ignored in head.treedata.table() as far as I can tell

We have completely modified our previous head() function.

500648a

  • To fix the NOTE in R CMD check, you need to importFrom(utils,head) before you try to define another method for this generic. This can be done by adding a #' @importFrom utils head roxygen comment in the roxygen chunk of head.treedata.table() for example.

500648a

  • Please update the list of authors in the vignette

07d2178

  • (optional) it may be useful for users to mention the differences between [[.treedata.table() and extractVector(), namely that [[.treedata.table() has an extra exact argument to enable partial matching, while extractVector() can extract multiple columns and accepts non-standard evaluation.

07d2178
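For readers unfamiliar with the `exact` argument, a base R illustration on a plain list. It assumes `[[.treedata.table()` follows the same `exact` semantics as the base `[[` operator.

```r
traits <- list(SVL = c(4.1, 3.9), PCI_limbs = c(0.2, -0.1))

traits[["SV"]]                 # NULL: exact matching is the default
traits[["SV", exact = FALSE]]  # c(4.1, 3.9): partial match on "SVL"
traits[["SV", exact = NA]]     # same value, plus a partial-match warning
```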

  • If you remove the "[1] FALSE FALSE" (which is actually output, not R code) from this chunk, you will be able to remove the eval = FALSE. It's always better when all chunks run and can be copied/pasted directly into the console.

07d2178


Second reviewer (@karinorman):

  • The warnings for co-indexing calls (e.g. td[, head(.SD, 1), by = "ecomorph"]) from the vignette don't seem particularly informative; I would suppress them unless necessary.

We cannot replicate this warning in the latest version of the package. It seems to be fixed now!

  • I think having the user confirm the changes for droptreedata.table is unnecessary. For me at least it made me assume that the function was modifying an object in place, which I realized it wasn't after some exploration.

a6d0b60

  • Consider changing the naming convention to *.td.table rather than *.treedata.table for the sake of brevity. You could also add a td to other exported functions (e.g. td.extractVector).

We thank the reviewer very much but we have decided to keep the current names.

  • The detectCharacter functions, filterMatrix, forceNames, and hasNames seem like they could be helper functions that facilitate the other main functions, but maybe don't need to be exported. If it makes sense I would not export them. If not, a little more explanation of how they fit into a workflow in the vignette would be helpful.

We have added more details to these functions in the vignette:

9988f78

  • forceNames and hasNames could use a more detailed description and/or justification; I'm not sure of their functionality even after running the example. data(anolis) and forceNames(anolis$dat, "row") return seemingly the same object.

2b62d02
3acb5cc
9988f78

  • It could be useful to make dropping tips or dataframe entries optional when matching trees/dataframes instead of automatically dropping from either to match, so that the user has the option to preserve all data.

We acknowledge that this could be a reasonable feature of our package, but it would involve major changes to the main idea behind the package: always matching the dataset and the tree(s).

  • filterMatrix would be cleaner if charType wasn't required as an argument but instead calculated within the function. I have a hard time imagining a scenario in which you would want a vector of character types that didn't match the matrix you were already giving the function.

3253251
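A sketch of the suggested design, with a hypothetical `filter_by_type()` that derives the character types itself instead of taking a `charType` argument. The rule used here (numeric columns are continuous, everything else is discrete) is an assumption and not necessarily the package's definition.

```r
# Hypothetical sketch: types are computed inside the function.
# Assumed rule: numeric = continuous, anything else = discrete.
filter_by_type <- function(dat, returnType = c("discrete", "continuous")) {
  returnType <- match.arg(returnType)
  is_cont <- vapply(dat, is.numeric, logical(1))
  keep <- if (returnType == "continuous") is_cont else !is_cont
  dat[, keep, drop = FALSE]
}

dat <- data.frame(SVL = c(4.1, 3.9), ecomorph = c("TG", "CG"),
                  island = c("Cuba", "Jamaica"))
names(filter_by_type(dat))                # "ecomorph" "island"
names(filter_by_type(dat, "continuous"))  # "SVL"
```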

  • I'm curious about the pull.treedata.table function. Is there some utility in having a function that mimics the $ operator? Or maybe I'm missing some of the applications of this function?

We acknowledge this function may be redundant to the $ operator. However, it provides an explicit way to extract either of the objects. The vignette provides additional details on its functionality.

  • Please explain the definition of continuous or discrete in the descriptions of detectCharacterType, detectCharacterChanges, and filterMatrix.

ca94642

  • I would change the language around "character", which is a specific object type, whereas this function appears to operate on multiple vector types. detectVectorType or detectColumnType may be more intuitive.

'Character' is tricky here, as it is both R terminology, but also referring to the biological concept of a character (i.e., a quantification of some aspect of organismal form). We preferred to keep it as is to capture the biological meaning of this term.

  • There is strange behavior in the examples for these functions. For example detectCharacterType(anolis$dat[,1]) returns "discrete", but detectAllCharacters(anolis$dat[,1:3]) returns three "continuous" entries. From my understanding of how the functions work I would expect the first entry to be "discrete" to match detectCharacterType(anolis$dat[,1]).

We cannot replicate this behavior in the latest version of the package. This was probably addressed in a previous commit:

```
> detectCharacterType(anolis$dat[,1])
[1] "discrete"
> detectAllCharacters(anolis$dat[,1:3])
[1] "discrete"   "continuous" "continuous"
```
