-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Response to reviewers - treedata.table
About
In this document, we address all the comments raised during the review phase of treedata.table in rOpenSci. When possible, we use a single commit to answer each of the comments.
Where are changes implemented?
All the changes listed below are implemented in the following branch of our GitHub repository:
https://github.com/uyedaj/treedata.table/tree/cristian
Acknowledgements
We thank two @Bisaloo (reviewer), @karinorman (reviewer), @jooolia (editor) for their comments!
First reviewer (@Bisaloo):
- Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("
rev" role) in the packageDESCRIPTIONfile.
- It's recommended practice to add an
ORCiDfor the authors if they have one.
- Please add a contributing guide.
- As far as I know, the Type field in the
DESCRIPTIONis not standard and in all cases unnecessary
- The package
Titleis not actually using title case. I don't care much personally but in my experience, this can cause issues when you submit to CRAN. You can verify this with thetools::toTitleCase()function.
- Please avoid as much as possible to use
Dependswhen you can useImports. Here is a relevant quote from Hadley Wickham and Jenny Bryan's book on R package development: Depends: Prior to the rollout of namespaces in R 2.14.0, Depends was the only way to “depend” on another package. Now, despite the name, you should almost always use Imports, not Depends. You’ll learn why, and when you should still use Depends, in namespaces.
- According to the CRAN guidelines for MIT licenses, your previous version of
LICENSEwas correct (beforeuyedaj/treedata.table@7764ef2). You just need to update to name of the copyright holder(s).
- The
READMEdoes not contain enough information about the package, how to install and use it. You may find this relevant chapter of rOpenSci's devguide useful.
- Since this package is presented as addressing a performance need, it would be very nice to see some benchmarks to back this up. You can use the microbenchmark package for this.
- (optional) it can be nice to add more badges to your
READMEto advertise the fact that you follow the current best practices in package development. In particular, I like badges with code coverage. Here are the instructions on how to do this withcodecov.
- Please avoid multiline instructions (separated by
;). It makes the code harder the read. This will also help with the long lines warning reported by goodpratice in @jooolia's comment:
- When you perform tests on objects that are already booleans (
TRUE/FALSE), you don't need towrite == TRUE.
- Class testing should be done with
inherits()(forS3objects, which is the case for ape objects), oris()(forS4objects). Please see this blog post by Martin Maechler for more info.
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, d)
# You would expect to get identical(td, td2) but this line actually throws an error
as.treedata.table uses a data.frame as input:
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
p <- pull.treedata.table(td, type = "phy")
d <- pull.treedata.table(td, type = "dat")
td2 <- as.treedata.table(p, as.data.frame(d))
> identical(td,td2)
[1] TRUE
- Regarding your replacement of
1:length()byseq_len()inuyedaj/treedata.table@7764ef2, the correct syntax ofseq_len()doesn't use1:.
700e0ed
19a777c
9779592
96f84da
- Communicate information to users via the
message()function instead ofcat(). This will always surely cause issues upon CRAN submission if not corrected. See this detailed SO answer for more info about the differences betweenmessage()andcat().
- On a related note, you don't need to use
paste0()inmessage()calls. Themessage()will automatically concatenate the arguments you give it.
- There is a problem in
as.treedata.table()with the name_column argument not working. No matter what the user enters in name_column or what the auto-detection code finds, you always use the first column fortip.labels.
- This behaviour should be tested in tests.
- (optional) it would be nice to have a message indicating which column was auto-detected as containing the tip.labels when
name_column = "detect"inas.treedata.table()
- (optional) uniformise the indents in your code. You sometimes use tabs and sometimes spaces. This can be fixed by running
styler::style_pkg()in the root of your RStudio project.
d860070
d4cffa5
cd0c945
016bc06
bfb2572
- As far as I can tell, the repeatsAsDiscrete is not used in
detectCharacterType()ordetectAllCharacters()
- This line will cause wrong identifications since the conversion of a data.frame to a matrix forces all elements to the same type.
- Regarding my later point about tests, this would make a good test: ensuring that you find the correct number of
discrete/continuouscharacters in the anolis dataset.
- I think you missed some
1:instances from @jooolia's comment. E.g.,
- It may be my unfamiliarity with data.table but I don't understand the warning/prompt in
droptreedata.table(). As far as I can tell, the original data is NOT modified. Using your example:
data(anolis)
td <- as.treedata.table(anolis$phy, anolis$dat)
td_old <- td
td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
identical(td, td_old)
#> TRUE
identical(td, td_new)
#> FALSE
droptreedata.table() is used to remove species from a treedata.table object. In your example, 2 species (chamaeleonides and eugenegrahami) are being excluded from the treedata.table object:
> data(anolis)
> td <- as.treedata.table(anolis$phy, anolis$dat)
Tip labels detected in column: X
Phylo object detected
No tips were dropped from the original tree/dataset
> td_old <- td
> td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
Please confirm that you would like to make changes to the ORIGINAL data?
Type: (1) YES, (2) NO: 1
> nrow(td_new$dat)
[1] 98
> nrow(td_old$dat)
[1] 100
The message was used to inform the user that n taxa are being dropped from the dataset. We have removed this prompt from the revised version. Regarding your tests, (1)identical(td, td_old) must be TRUE because these two are duplicates (td_old <- td). However, (2) identical(td, td_new) must be FALSE given two taxa were dropped from the latter object (please see the difference in row numbers between both objects)
- The dots handling in
extractVector()may probably be simplified withlazyevalsince you already have it as a dependency
- It would be useful to add a
match.arg(type, c("dat", "phy"))in pull.treedata.table()
- In
tdt, do you really need...? Wouldn't anFUNarg be sufficient?
Although we fully agree with the reviewer, both approaches are essentially doing the same. We decided to keep the ellipsis.
- Since you provide a head() method for
treedata.table, it would be nice to have atail()method as well
-
paste()s are unnecessary here sincecat()
- The output of
summary.treedata.table()is slightly confusing in my opinion. On the example you provide (with anolis), where no taxa are dropped from the tree or the data, you get:
We changed the message to “Taxa dropped from the tree/data”:
> data(anolis)
> td <- as.treedata.table(anolis$phy, anolis$dat)
Tip labels detected in column: X
Phylo object detected
No tips were dropped from the original tree/dataset
> summary(td)
A treedata.table object
The dataset contains 11 traits
Continuous traits: tip.label, SVL, PCI_limbs, PCII_head, PCIII_padwidth_vs_tail, PCIV_lamella_num, awesomeness, hostility, attitude
Discrete traits: ecomorph, island
The following traits have missing values: 0
Taxa dropped from the tree: 0
Taxa dropped from the data: 0
We also modified the droptreedata.table() function to keep track of the dropped species in the new object:
> td_new <- droptreedata.table(tdObject = td, taxa = c("chamaeleonides" ,"eugenegrahami"))
2 taxa were dropped from the ORIGINAL treedata.table object
> summary(td_new)
A treedata.table object
The dataset contains 11 traits
Continuous traits: tip.label, SVL, PCI_limbs, PCII_head, PCIII_padwidth_vs_tail, PCIV_lamella_num, awesomeness, hostility, attitude
Discrete traits: ecomorph, island
The following traits have missing values: 0
Taxa dropped from the tree: chamaeleonides, eugenegrahami
Taxa dropped from the data: chamaeleonides, eugenegrahami
292295c#diff-262ada0e7c14967a2a41b70149c367fb
- When I run examples from
[.treedata.table(), I get the following warning:
Warning message:
In 1:seq_along(x$dat) :
numerical expression has 11 elements: only the first used
We cannot replicate this warning in the latest version of the package. It was probably fixed in a previous commit!
- (optional) your tests are not compatible with the upcoming v3 of the testthat package. In particular, you could replace the (soon to be) deprecated
expect_is()function byexpect_s3_class(). Excepted these two lines
We will keep the current functions for now but we thank the reviewer for this comment.
- I think the test coverage needs to be increased a lot, as shown by the bugs uncovered during this review. I sometimes noted what would be good candidates for tests. As a good starting point, you can also look at which lines are not covered by your tests on
codecov: https://codecov.io/github/uyedaj/treedata.table?branch=master. I see no technical limitations that would prevent you to reach 100% coverage for this package but for now, I think you should at least try to reach 80% coverage.
The current version has >80% of coverage
- (optional) you may want to use markdown syntax in your
Roxygencomments. This produces more readable documentation in the source file in my opinion. Automatic conversion of your currentRoxygencomments to markdown can be done with theroxygen2mdpackage.
Please add a sentence for the name_column argument to explain that "detect" (the default) will auto-detect this column:
- In
as.treedata.table(), you declare but this is unnecessary since you don't usesetDT()here andas.data.table()is prefixed with thedata.table::namespace
- I'm not sure what the first word means in your @return
roxygencomments. I think it will be less confusing if you remove these(?)
- (optional) you can use the @inheritParams
detectCharacterTyperoxygencomment in the documentation ofdetectAllCharacters()to avoid duplicating the documentation of the function arguments. This is useful because it ensures the documentation of these functions will always stay in sync in the future. No risk of forgetting to update one of the two!
We chose to leave all the parameters in both of the functions. We thank the reviewer for his comment.
- (optional) I find it useful to explicitly state which is default for all arguments (when it exists) in the documentation. E.g., I would change it to something like
#'@param returnType Either discrete (the default) or continuous
- Please expand the documentation of the
hasNames()/forceNames()functions by providing a short explanation of the function purpose and/or a @returnroxygencomment.
- The documentation fo
extractVector()should be updated to indicate that multiple column names can be passed to..., which means you don't necessarily get a named vector but a list of named vectors, as opposed to what the documentation says.
- I think the function name
pull.treedata.tableis slightly confusing as it sounds like aS3and it's not. Something along the lines ofpull_treedata.table()orpulldata.data.table()(you already have adropdatatree.table()function so this would make sense) or???would probably be better.
- Shouldn't this just be "
If negative, all but the n last rows of x" (i.e., remove "first"):
We have completely modified our previous head() function.
- This doesn't seem correct. The dots are ignored in
head.treedata.table()as far as I can tell
We have completely modified our previous head() function.
- To fix the
NOTEin R CMD check, you need toimportFrom(utils,head)before you try to define another method for this generic. This can be done by adding a #' @importFrom utils headroxygencomment in theroxygenchunk ofhead.treedata.table()for example.
- Please update the list of authors in the vignette
- (optional) it may useful for users to mention the differences between
[[.treedata.table()andextractVector(), namely that[[.treedata.table()has an extra exact argument to enable partial match while extractVector() can extract multiple columns and accepts non-standard evaluation
- If you remove the "
[1] FALSE FALSE" (which are actually output, and not R code) from this chunk, you will be able to remove theeval = FALSE. It's awlays better when all chunks run and can be copied/pasted in the console directly
Second Reviewer (@karinorman):
- The warnings for co-indexing calls (e.g.
td[, head(.SD, 1), by = "ecomorph"]from the vignette don't seem particularly informative, I would suppress unless necessary.
We cannot replicate this warning in the latest version of the package. It seems to be fixed now!
- I think having the user confirm the changes for
droptreedata.tableis unnecessary. For me at least it made me assume that the function was modifying an object in place, which I realized it wasn't after some exploration.
- Consider changing the naming convention to *.td.table rather than *.
treedata.tablefor the sake of brevity. You could also add a td to other exported functions (e.g. td.extractVector).
We thank the reviewer very much but we have decided to keep the current names.
- The
detectCharacterfunctions,filterMatrix,forceNames, andhasNamesseem like they could be helper functions that facilitate the other main functions, but maybe don't need to be exported. If it makes sense I would not export them. If not, a little more explanation of how they fit into a workflow in the vignette would be helpful.
We have added more details to these functions in the vignette:
-
forceNamesandhasNamescould use more detailed description and/or justification, I'm not sure their functionality even after running the example.data(anolis)andforceNames(anolis$dat, "row")return seemingly the same object.
- It could be useful to make dropping tips or dataframe entries optional when matching trees/dataframes instead of automatically dropping from either to match, so that the user has the option to preserve all data.
We acknowledge that this could be a reasonable functionality of our package but this would involve major changes to the main idea behind of the package – to always match the the dataset and tree(s).
-
filterMatrixwould be cleaner ifcharTypewasn't required as an argument but instead calculated within the function. I have a hard time imagining a scenario in which you would want a vector of character types that didn't match the matrix you were already giving the function.
- I'm curious about the
pull.treedata.tablefunction. Is there some utitlity in having a function that mimics the$operator? Or maybe I'm missing some of the applications of this function?
We acknowledge this function may be redundant to the $ operator. However, it provides an explicit way to extract either of the objects. The vignette provides additional details on its functionality.
- Please explain the definition of continuous or discrete in the descriptions of
detectCharacterType,detectCharacterChanges, andfilterMatrix.
- I would change the language around "
character" which is a specific object type, whereas this function appears to perform on multiple vectortypes.detectVectorTypeordetectColumnTypemay be more intuitive.
'Character' is tricky here, as it is both R terminology, but also referring to the biological concept of a character (i.e., a quantification of some aspect of organismal form). We preferred to keep it as is to capture the biological meaning of this term.
- There is strange behavior in the examples for these functions. For example
detectCharacterType(anolis$dat[,1])returns "discrete", butdetectAllCharacters(anolis$dat[,1:3])returns three "continuous" entries. From my understanding of how the functions work I would expect the first entry to be "discrete" to matchdetectCharacterType(anolis$dat[,1]).
We cannot replicate this behavior in the latest version of the package. This was probably addressed in a previous commit:
> detectCharacterType(anolis$dat[,1])
[1] "discrete"
> detectAllCharacters(anolis$dat[,1:3])
[1] "discrete" "continuous" "continuous"