eider is an R package for extracting machine learning features from tabular data, in particular health records, in a declarative manner.
Features are specified as JSON objects which contain all the information
required to perform a given calculation. For example, the following
feature counts the total number of rows per patient ID in the table
labelled `ae2` (details on how to specify this table are in the
function documentation):
```json
{
  "source_table": "ae2",
  "transformation_type": "COUNT",
  "grouping_column": "id",
  "absent_default_value": 0,
  "output_feature_name": "total_ae_attendances"
}
```

The output of this is a column named `total_ae_attendances`, containing
the number of rows per patient, with a value of 0 for any patients who
do not appear in the `ae2` table.
This declarative approach provides an alternative to traditional,
imperative-style dplyr pipelines, which can be more difficult to
reason about, especially when a series of features is being extracted
and merged together. Because features are specified independently of
any particular programming language or paradigm, this approach also
encourages code that is concise, easy to read, and maintainable.
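For comparison, here is a rough sketch of how the `total_ae_attendances`
feature above might be computed with an imperative dplyr pipeline. The
`ae2` data and the `all_ids` table listing every patient ID are
hypothetical stand-ins, and the sketch assumes the set of patients who
should receive the default value is known up front. It is not how eider
computes the feature internally.

```r
library(dplyr)

# Hypothetical A&E attendances table: one row per attendance
ae2 <- data.frame(id = c(1L, 1L, 2L, 3L, 3L, 3L))

# Hypothetical table of every patient ID in the cohort; patient 4 never
# appears in ae2 and should therefore receive the default value of 0
all_ids <- data.frame(id = 1:4)

total_ae_attendances <- ae2 |>
  # Corresponds to "transformation_type": "COUNT" grouped by "id"
  count(id, name = "total_ae_attendances") |>
  # Keep every patient, including those absent from ae2 (filled with NA)
  right_join(all_ids, by = "id") |>
  # Corresponds to "absent_default_value": 0
  mutate(total_ae_attendances = coalesce(total_ae_attendances, 0L)) |>
  arrange(id)

print(total_ae_attendances)
```

Even for a single feature, the counting, joining, and default-filling
steps are spread across several calls; repeating this for many features
and then merging the results is where the declarative JSON form helps.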
eider is a collaboration between The Alan Turing Institute, Public
Health Scotland, and the Universities of Edinburgh and Durham. It grew
out of a desire to generalise the feature extraction process used for
health data in the SPARRA (Scottish Patients At Risk of Readmission and
Admission) project (GitHub repo), and to allow similar analyses to be
carried out in other contexts.
Install via CRAN:

```r
install.packages("eider")
```

Alternatively, install eider from its source code on GitHub using:

```r
install.packages("devtools")
devtools::install_github("alan-turing-institute/eider", build_vignettes = TRUE)
```

The package documentation is available online. In particular, the
package articles contain a series of vignettes which provide detailed
guidance on the package and its features.
If you are making changes to the library itself, first clone the repository:

```bash
git clone git@github.com:alan-turing-institute/eider.git
```

You will need to install the lintr, pkgdown, and devtools R packages
to build documentation, run tests, and lint. Then, from the repository
root, you can use the following commands:

- `make doc` generates all function documentation, and also generates the `README.md` file from `README.rmd`
- `make lint` lints the project directory
- `make test` runs all tests
You can also use pre-commit to run all of these checks before
committing, to ensure that you do not commit code that fails them.
First, install pre-commit according to the instructions on its
website. Then run `pre-commit install`.
What about vignettes? Well, building vignettes is slightly more
complicated. You can perform a one-time build from the R console using
`pkgdown::build_site()`, but running this every time you edit a file
gets tiring quickly. To automate this, first install the package with
`make install`, and install a working version of Python as well as
entr (the latter is available on Homebrew via `brew install entr`).
Then run `make vig`: this will monitor your vignette RMarkdown files,
rebuild the vignettes whenever they change, and launch an HTTP server
on port 8000 to view the files. If you change any library code, you
will have to run `make install` again before rerunning `make vig`.
To release a new version of the package, first update the version
number by running the following in the R console. It will prompt you
for a new version number (major, minor, or patch), which you should
choose according to semantic versioning:

```r
usethis::use_version()
```

Make sure that CRAN checks pass with:

```r
devtools::check(remote = TRUE, manual = TRUE)
```

It's also a good idea to make sure that CI is passing on GitHub. When
you're ready, commit and push to GitHub. Then run:

```r
devtools::submit_cran()
```