Data_Science_With_R_Workflow
The Data Science With R Workflow is available in the book
R For Data Science. If you want to learn R and this
workflow for business analysis, take the R For Business
Analysis (DS4B 101-R) course through Business Science
University. Click the links for documentation.
ggplot2 (CS)
readr (CS)
readxl / writexl
odbc / DBI
rvest
tibble (CS)
tidyr (CS)
RMarkdown (CS)
Shiny (CS)
RStudio IDE (CS)
fs (file system)
Model
recipes
rsample
parsnip
dials
yardstick
broom
CS = Cheat Sheet
Important Resources
R For Data Science Book: http://r4ds.had.co.nz/
Rmarkdown Book: https://bookdown.org/yihui/rmarkdown/
Data Visualization Book: https://rkabacoff.github.io/datavis/
More Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
tidyverse packages: https://www.tidyverse.org/
Connecting to databases: https://db.rstudio.com/
RMarkdown website: https://rmarkdown.rstudio.com/
Shiny web applications website: http://shiny.rstudio.com/
Jenny Bryan's purrr tutorial: https://jennybryan.org/
Business Science University
"Data Science Education for the Enterprise" university.businessscience.io
version: 1.0
Data Science With R: Special Topics

Text Analysis & NLP
Text Mining with R (Book): tidytext
NLP:
H2O word2vec: Word embeddings
text2vec: fast vectorization, topic modeling
udpipe: UDPipe C++ lib in R

Time Series Analysis
Time-aware tibbles: tibbletime & tsibble
Convert between classes: timetk & tsbox
Time Series Index Summary: timetk
Generating Future Series: timetk

Forecasting
ARIMA, ETS, etc: forecast & fable
Tidy, glance, augment for forecast models: sweep
Converting forecast prediction to tibble: sweep

Anomaly Detection
Identify anomalies: anomalize

Financial Analysis

Machine Learning
Multi-Threaded/Scalable/Production ML: H2O (CS)
Extreme Gradient Boosting: xgboost
R + Spark: sparklyr (CS)
Sparkling Water (Spark + H2O): rsparkling
ML (Tidy): parsnip
ML: caret (CS)

Deep Learning
R Interface to TensorFlow Homepage:
Keras (CS)
TF Estimators
TensorFlow (Core)

Network Analysis
Network Data Transformations (Tidy): tidygraph
Network Data Transformations: igraph

Network Viz
Static:
ggraph - Graph plotting utilities for ggplot2
Interactive (JavaScript):
networkD3 - D3 Networks in R
plotly - plotly.js (network graphs) in R

Geospatial Analysis
Geocoding (getting lat/long, bboxes, & sf's):
ggmap - Google API (requires key)
osmdata - OpenStreetMap Overpass API
tmaptools - OpenStreetMap Nominatim API

Speed & Scale
Fastest Single-Node Speed: data.table (CS)
Distributed Cluster (Spark): sparklyr (CS)

Interoperability
Python: reticulate
C++: Rcpp
Java: rJava