Data Wrangling in R
Ben Best
MBON Pole to Pole Brazil Workshop
2018-08-07
slides: bit.ly/r-wrangle-for-p2p
Motivation
● MBON Pole to Pole: develop a “Community of Practice”
○ Best practices
○ Common tools
● Manipulate (i.e., “wrangle”) data to:
○ Check quality
○ Analyze
○ Visualize
○ Publish
■ e.g., OBIS Darwin Core
How many of you have used these?
1. R
scientific programming language
2. RStudio
integrated development environment (IDE)
3. dplyr
R package for grammar of data manipulation
4. rmarkdown
authoring framework for data science
5. git
version control system
6. GitHub
web hosting service for collaborating with git
For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights
nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Fall into the Pit of Success
2. Compose simple functions with the pipe
3. Embrace functional programming
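The two principles above can be illustrated with a minimal sketch. It assumes the magrittr and purrr packages are installed; the `counts` data are hypothetical and only serve to show the pattern.

```r
library(magrittr)  # provides the pipe operator %>%
library(purrr)     # provides map_dbl() for functional programming

# Hypothetical counts of organisms per quadrat at three sites
counts <- list(
  site_a = c(3, 5, 8),
  site_b = c(2, 2, 6),
  site_c = c(10, 1, 4)
)

# Nested calls must be read from the inside out
nested <- round(mean(sqrt(counts$site_a)), 2)

# The pipe composes the same simple functions left to right
piped <- counts$site_a %>% sqrt() %>% mean() %>% round(2)

# Functional programming: apply one function to every site without a loop
site_means <- map_dbl(counts, mean)
```

Both `nested` and `piped` hold the same value; the piped form simply makes the order of operations explicit.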
Tidyverse process & packages (unofficial in italics)
● Import: readr, readxl, DBI, jsonlite, httr, googledrive
● Tidy: tibble, dplyr, tidyr
● Transform: dplyr, forcats, lubridate, hms, stringr
● Visualize: ggplot2, ggraph, ggmap, htmlwidgets (plotly, leaflet, dygraphs)
● Model: broom, modelr
● Communicate: knitr, rmarkdown, DT, shiny
● Program: purrr, glue, rlang, fs, devtools, usethis
● Spatial: rgdal, sp -> sf, raster
Which package do I use and how?
CRAN Task Views
cran.r-project.org/web/views
Cheat Sheets
rstudio.com/resources/cheatsheets
● RStudio IDE
● R Markdown
● Data Import (readr, tidyr)
● Data Transformation (dplyr)
Data Import cheat sheet
Tidy data organizes tabular data in a consistent long format for use across R packages: each variable in its own column, each observation in its own row.
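As a minimal sketch of reshaping a wide table into the long, tidy form described above: the example assumes tidyr 1.0.0 or later for `pivot_longer()` (the older `gather()` does the same job), and the `wide` table with its site and species names is hypothetical.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one row per site, one column per species
wide <- tribble(
  ~site, ~limpet, ~mussel,
  "A",   12,      40,
  "B",    7,      55
)

# Long (tidy) form: one row per site-species observation
long <- pivot_longer(
  wide,
  cols      = c(limpet, mussel),
  names_to  = "species",
  values_to = "count"
)

# Reshape back to wide to confirm nothing was lost
wide_again <- pivot_wider(long, names_from = species, values_from = count)
```

In the long form, adding a new species means adding rows rather than columns, so downstream code in other packages does not need to change.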
Data Transformation cheat sheet
dplyr is a grammar of data manipulation, providing a consistent set of verbs that
help you solve the most common data manipulation challenges. It also works with databases.
Examples:
● bbest.github.io/ioos-bio-tidyr
● marinebon.github.io/info-intertidal
● marinebon.github.io/sbc-datasets
● bbest.github.io/bbest/p2p-demo-1
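Alongside the linked examples, here is a minimal sketch of how the dplyr verbs chain together. The `surveys` table, its column names, and the 0.25 m² quadrat area are all assumptions made up for illustration.

```r
library(dplyr)

# Hypothetical long-format survey counts, including one missing value
surveys <- tibble(
  site    = c("A", "A", "B", "B", "B"),
  species = c("limpet", "mussel", "limpet", "mussel", "mussel"),
  count   = c(12, 40, 7, 55, NA)
)

summary_tbl <- surveys %>%
  filter(!is.na(count)) %>%            # quality check: drop missing counts
  mutate(density = count / 0.25) %>%   # derived column; assumes 0.25 m^2 quadrats
  group_by(site) %>%                   # one group per site
  summarize(
    total        = sum(count),         # total individuals per site
    mean_density = mean(density)       # mean individuals per m^2 per site
  ) %>%
  arrange(desc(total))                 # largest totals first
```

Each verb does one thing (`filter()` rows, `mutate()` columns, `summarize()` groups, `arrange()` order), and the pipe reads as a sequence of steps.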