
R Studio Reference Sheets Compilation

Uploaded by Ahmed Abouraia
Copyright © All Rights Reserved

R and R Studio Reference Guides and "Cheat Sheet" Compilation

The following is a set of reference guides and "cheat sheets" that have been gathered from various resources on the Internet. All of these sheets are used for academic purposes and should not be reproduced, distributed, or copied without permission from the sheet designer/author(s). They are intended as shortcuts and aids for the use of R and RStudio, along with R Script, R Markdown, and some of the most popular R packages.

• R Markdown Reference Guide | rmarkdown.rstudio.com | updated 2014-10
• Base R Cheat Sheet | rstudio.com | updated 2015-03
• Data Import with tidyverse Cheat Sheet | readr.tidyverse.org | updated 2021-08
• rmarkdown Cheat Sheet | rmarkdown.rstudio.com | updated 2021-08
• RStudio IDE Cheat Sheet | rstudio.com | updated 2021-07
• Data transformation with dplyr Cheat Sheet | dplyr.tidyverse.org | updated 2021-07
• Data tidying with tidyr Cheat Sheet | tidyr.tidyverse.org | updated 2021-07
• Data visualization with ggplot2 Cheat Sheet | ggplot2.tidyverse.org | updated 2021-08
• How Big is Your Graph | https://www.rstudio.com/resources/cheatsheets/ | 2017-07
• R Syntax Comparison Cheat Sheet | science.smith.edu/~amcnamara | updated 2018-01
• Tabular reporting with flextable Cheat Sheet | ardata-fr.github.io/flextable-book/ | 2021-03

R Markdown Reference Guide
Learn more about R Markdown at rmarkdown.rstudio.com
Learn more about Interactive Docs at shiny.rstudio.com/articles
Contents:
1. Markdown Syntax
2. Knitr chunk options
3. Pandoc options

Syntax
Plain text

End a line with two spaces
to start a new paragraph.

*italics* and _italics_

**bold** and __bold__

superscript^2^

~~strikethrough~~

[link](www.rstudio.com)

# Header 1

## Header 2

### Header 3

#### Header 4

##### Header 5

###### Header 6

endash: --

emdash: ---

ellipsis: ...

inline equation: $A = \pi*r^{2}$

image: ![](path/to/smallorb.png)

horizontal rule (or slide break):

***

> block quote

* unordered list
* item 2
    + sub-item 1
    + sub-item 2

1. ordered list
2. item 2
    + sub-item 1
    + sub-item 2

Table Header | Second Header
------------- | -------------
Table Cell | Cell 2
Cell 3 | Cell 4

Updated 10/30/2014 © 2014 RStudio, Inc. CC BY RStudio.

Syntax
Make a code chunk with three backticks followed by an r in braces. End the chunk with three backticks:

```{r}
paste("Hello", "World!")
```

Place code inline with single backticks. The first backtick must be followed by an r, like this `r paste("Hello", "World!")`.

Add chunk options within braces. For example, `echo=FALSE` will prevent source code from being displayed:

```{r eval=TRUE, echo=FALSE}
paste("Hello", "World!")
```
Learn more about chunk options at http://yihui.name/knitr/options

Chunk options
option default value description
Code evaluation
child NULL A character vector of filenames. Knitr will knit the files and place them into the main document.
code NULL Set to R code. Knitr will replace the code in the chunk with the code in the code option.
engine 'R' Knitr will evaluate the chunk in the named language, e.g. engine = 'python'. Run names(knitr::knit_engines$get()) to see supported languages.
eval TRUE If FALSE, knitr will not run the code in the code chunk.
include TRUE If FALSE, knitr will run the chunk but not include the chunk in the final document.
purl TRUE If FALSE, knitr will not include the chunk when running purl() to extract the source code.
Results
collapse FALSE If TRUE, knitr will collapse all the source and output blocks created by the chunk into a single block.
echo TRUE If FALSE, knitr will not display the code in the code chunk above its results in the final document.
results 'markup' If 'hide', knitr will not display the code's results in the final document. If 'hold', knitr will delay displaying all output pieces until the end of the chunk. If 'asis', knitr will pass through results without reformatting them (useful if results return raw HTML, etc.)
error TRUE If TRUE, knitr will keep evaluating after an error and display the error message in the document. If FALSE, knitr will stop on errors.
message TRUE If FALSE, knitr will not display any messages generated by the code.
warning TRUE If FALSE, knitr will not display any warning messages generated by the code.
Code Decoration
comment '##' A character string. Knitr will add the string to the start of each line of results in the final document.
highlight TRUE If TRUE, knitr will highlight the source code in the final output.
prompt FALSE If TRUE, knitr will add > to the start of each line of code displayed in the final document.
strip.white TRUE If TRUE, knitr will remove white spaces that appear at the beginning or end of a code chunk.
tidy FALSE If TRUE, knitr will tidy code chunks for display with the tidy_source() function in the formatR package.
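Chunk options can also be set once for every chunk in a document instead of being repeated in each chunk header. A minimal sketch, assuming the knitr package is installed; the option values below are illustrative choices, not recommendations from this guide. Options written in an individual chunk's braces still take precedence over these document-wide defaults.

```r
# Run once in a setup chunk near the top of the .Rmd file, for example a chunk
# whose header is {r setup, include=FALSE}, so this call itself is not shown.
knitr::opts_chunk$set(
  echo    = FALSE,  # show results but not the source code
  message = FALSE,  # hide messages generated by the code
  warning = FALSE,  # hide warning messages
  comment = ""      # no '##' prefix on printed results
)
```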

Chunk options (Continued)
option default value description
Chunks
opts.label NULL The label of options set in knitr::opts_template() to use with the chunk.
R.options NULL Local R options to use with the chunk. Options are set with options() at start of chunk. Defaults are restored at end.
ref.label NULL A character vector of labels of the chunks from which the code of the current chunk is inherited.
Cache
autodep FALSE If TRUE, knitr will attempt to figure out dependencies between chunks automatically by analyzing object names.
cache FALSE If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered.
cache.comments NULL If FALSE, knitr will not rerun the chunk if only a code comment has changed.
cache.lazy TRUE If TRUE, knitr will use lazyload() to load objects in chunk. If FALSE, knitr will use load() to load objects in chunk.
cache.path 'cache/' A file path to the directory to store cached results in. Path should begin in the directory that the .Rmd file is saved in.
cache.vars NULL A character vector of object names to cache if you do not wish to cache each object in the chunk.
dependson NULL A character vector of chunk labels to specify which other chunks a chunk depends on. Knitr will update a cached chunk if its dependencies change.
Animation
aniopts 'controls,loop' Extra options for animations (see the animate package).
interval 1 The number of seconds to pause between animation frames.
Plots
dev 'png' The R function name that will be used as a graphical device to record plots, e.g. dev='CairoPDF'.
dev.args NULL Arguments to be passed to the device, e.g. dev.args=list(bg='yellow', pointsize=10).
dpi 72 A number for knitr to use as the dots per inch (dpi) in graphics (when applicable).
external TRUE If TRUE, knitr will externalize tikz graphics to save LaTeX compilation time (only for the tikzDevice::tikz() device).
fig.align 'default' How to align graphics in the final document. One of 'left', 'right', or 'center'.
fig.cap NULL A character string to be used as a figure caption in LaTeX.
fig.env 'figure' The LaTeX environment for figures.
fig.ext NULL The file extension for figure output, e.g. fig.ext='png'.
fig.height, fig.width 7 The height and width to use in R for plots created by the chunk (in inches).
fig.keep 'high' If 'high', knitr will merge low-level changes into high level plots. If 'all', knitr will keep all plots (low-level changes may produce new plots). If 'first', knitr will keep the first plot only. If 'last', knitr will keep the last plot only. If 'none', knitr will discard all plots.
fig.lp 'fig:' A prefix to be used for figure labels in LaTeX.
fig.path 'figure/' A file path to the directory where knitr should store the graphics files created by the chunk.
fig.pos '' A character string to be used as the figure position arrangement in LaTeX.
fig.process NULL A function to post-process a figure file. Should take a filename and return a filename of a new figure source.
fig.retina 1 Dpi multiplier for displaying HTML output on retina screens.
fig.scap NULL A character string to be used as a short figure caption.
fig.subcap NULL A character string to be used as captions in sub-figures in LaTeX.
fig.show 'asis' If 'hide', knitr will generate the plots created in the chunk, but not include them in the final document. If 'hold', knitr will delay displaying the plots created by the chunk until the end of the chunk. If 'animate', knitr will combine all of the plots created by the chunk into an animation.
fig.showtext NULL If TRUE, knitr will call showtext::showtext.begin() before drawing plots.
out.extra NULL A character string of extra options for figures to be passed to LaTeX or HTML.
out.height, out.width NULL The height and width to scale plots to in the final output. Can be in units recognized by output, e.g. 8\\linewidth, 50px
resize.height, resize.width NULL The height and width to resize tikz graphics in LaTeX, passed to \resizebox{}{}.
sanitize FALSE If TRUE, knitr will sanitize tikz graphics for LaTeX.
Templates
html_document
pdf_document
word_document
md_document
ioslides_presentation
slidy_presentation
beamer_presentation

Basic YAML
---
title: "A Web Doc"
author: "John Doe"
date: "May 1, 2015"
output: md_document
---

Template options
---
title: "Chapters"
output:
  html_document:
    toc: true
    toc_depth: 2
---

LaTeX options
---
title: "My PDF"
output: pdf_document
fontsize: 11pt
geometry: margin=1in
---

Interactive Docs
---
title: "Slides"
output:
  slidy_presentation:
    incremental: true
runtime: shiny
---

Syntax for slide formats (ioslides, slidy, beamer)

# Dividing slides 1

Pandoc will start a new slide at each first level header

## Header 2

… as well as each second level header

***

You can start a new slide with a horizontal rule `***` if you do not want a header.

## Bullets

Render bullets with

- a dash
- another dash

## Incremental bullets

>- Use this format
>- to have bullets appear
>- one at a time (incrementally)

Slide display modes

Press a key below during presentation to enter display mode. Press esc to exit display mode.

ioslides
f - enable fullscreen mode
w - toggle widescreen mode
o - enable overview mode
h - enable code highlight mode
p - show presenter notes

slidy
C - show table of contents
F - toggle display of the footer
A - toggle display of current vs all slides
S - make fonts smaller
B - make fonts bigger

Top level options to customize LaTeX (pdf) output

option description
lang Document language code
fontsize Font size (e.g. 10pt, 11pt, 12pt)
documentclass LaTeX document class (e.g. article)
classoption Option for document class (e.g. oneside); may be repeated
geometry Options for the geometry package (e.g. margin=1in); may be repeated
mainfont, sansfont, monofont, mathfont Document fonts (works only with xelatex and lualatex, see the latex_engine option)
linkcolor, urlcolor, citecolor Color for internal, external, and citation links (red, green, magenta, cyan, blue, black)


Pandoc options by output format (beamer, ioslides, word, slidy, html, pdf, md); each X marks a format that supports the option.

option description
colortheme X Beamer color theme to use (e.g., colortheme: "dolphin").
css X X X Filepath to CSS style to use to style document (e.g., css: styles.css).
duration X Add a countdown timer (in minutes) to footer of slides (e.g., duration: 45).
fig_caption X X X X X X Should figures be rendered with captions?
fig_crop X X Should pdfcrop utility be automatically applied to figures (when available)?
fig_height X X X X X X X Default figure height (in inches) for document.
fig_retina X X X X Scaling to perform for retina displays (e.g., fig_retina: 2).
fig_width X X X X X X X Default figure width (in inches) for document.
font_adjustment X Increase or decrease font size for entire presentation (e.g., font_adjustment: -1).
fonttheme X Beamer font theme to use (e.g., fonttheme: "structurebold").
footer X Text to add to footer of each slide (e.g., footer: "Copyright (c) 2014 RStudio").
highlight X X X X Syntax highlighting style (e.g. "tango", "pygments", "kate", "zenburn", and "textmate").
includes X X X X X X See below
-in_header X X X X X File of content to place in document header (e.g., in_header: header.html).
-before_body X X X X X File of content to place before document body (e.g., before_body: doc_prefix.html).
-after_body X X X X X File of content to place after document body (e.g., after_body: doc_suffix.html).
incremental X X X Should bullets appear one at a time (on presenter mouse clicks)?
keep_md X X X Save a copy of .md file that contains knitr output (in addition to the .Rmd and HTML files)?
keep_tex X X Save a copy of .tex file that contains knitr output (in addition to the .Rmd and PDF files)?
latex_engine X Engine to render latex. Should be one of "pdflatex", "xelatex", and "lualatex".
lib_dir X X X Directory of dependency files to use (Bootstrap, MathJax, etc.) (e.g., lib_dir: libs).
logo X File path to a logo (at least 128 x 128) to add to presentation (e.g., logo: logo.png).
mathjax X X X Set to local or a URL to use a local/URL version of MathJax to render equations.
number_sections X X Add section numbering to headers (e.g., number_sections: true).
pandoc_args X X X X X X X Arguments to pass to Pandoc (e.g., pandoc_args: ["--title-prefix", "Foo"]).
preserve_yaml X Preserve YAML front matter in final document?
reference_docx X A .docx file whose styles should be copied to use (e.g., reference_docx: mystyles.docx).
self_contained X X X Embed dependencies into the doc? Set to false to keep dependencies in external files.
slide_level X The lowest heading level that defines individual slides (e.g., slide_level: 2).
smaller X Use the smaller font size in the presentation?
smart X X X Convert straight quotes to curly, dashes to em-dashes, … to ellipses, and so on?
template X X X X Pandoc template to use when rendering file (e.g., template: quarterly_report.html).
theme X X Bootswatch or Beamer theme to use for page. Valid bootswatch themes include "cerulean", "journal", "flatly", "readable", "spacelab", "united", and "cosmo".
toc X X X X Add a table of contents at start of document? (e.g., toc: true).
toc_depth X X X The lowest level of headings to add to table of contents (e.g., toc_depth: 2).
transition X Speed of slide transitions should be "slower", "faster" or a number in seconds.
variant X The flavor of markdown to use; one of "markdown", "markdown_strict", "markdown_github", "markdown_mmd", and "markdown_phpextra".
widescreen X Display presentation in widescreen format?

Base R Cheat Sheet

Getting Help
Accessing the help files
?mean - Get help of a particular function.
help.search('weighted mean') - Search the help files for a word or phrase.
help(package = 'dplyr') - Find help for a package.
More about an object
str(iris) - Get a summary of an object's structure.
class(iris) - Find the class an object belongs to.

Using Packages
install.packages('dplyr') - Download and install a package from CRAN.
library(dplyr) - Load the package into the session, making all its functions available to use.
dplyr::select - Use a particular function from a package.
data(iris) - Load a built-in dataset into the environment.

Working Directory
getwd() - Find the current working directory (where inputs are found and outputs are sent).
setwd('C://file/path') - Change the current working directory.
Use projects in RStudio to set the working directory to the folder you are working in.

Vectors
Creating Vectors
c(2, 4, 6) | 2 4 6 | Join elements into a vector.
2:6 | 2 3 4 5 6 | An integer sequence.
seq(2, 3, by=0.5) | 2.0 2.5 3.0 | A complex sequence.
rep(1:2, times=3) | 1 2 1 2 1 2 | Repeat a vector.
rep(1:2, each=3) | 1 1 1 2 2 2 | Repeat elements of a vector.
Vector Functions
sort(x) - Return x sorted.
rev(x) - Return x reversed.
table(x) - See counts of values.
unique(x) - See unique values.
Selecting Vector Elements
By Position
x[4] - The fourth element.
x[-4] - All but the fourth.
x[2:4] - Elements two to four.
x[-(2:4)] - All elements except two to four.
x[c(1, 5)] - Elements one and five.
By Value
x[x == 10] - Elements which are equal to 10.
x[x < 0] - All elements less than zero.
x[x %in% c(1, 2, 5)] - Elements in the set 1, 2, 5.
Named Vectors
x['apple'] - Element with name 'apple'.

Programming
For Loop
for (variable in sequence){
  Do something
}
Example
for (i in 1:4){
  j <- i + 10
  print(j)
}
While Loop
while (condition){
  Do something
}
Example
while (i < 5){
  print(i)
  i <- i + 1
}
If Statements
if (condition){
  Do something
} else {
  Do something different
}
Example
if (i > 3){
  print('Yes')
} else {
  print('No')
}
Functions
function_name <- function(var){
  Do something
  return(new_variable)
}
Example
square <- function(x){
  squared <- x*x
  return(squared)
}

Reading and Writing Data
Also see the readr package.
Input | Output | Description
df <- read.table('file.txt') | write.table(df, 'file.txt') | Read and write a delimited text file.
df <- read.csv('file.csv') | write.csv(df, 'file.csv') | Read and write a comma separated value file. This is a special case of read.table/write.table.
load('file.RData') | save(df, file = 'file.RData') | Read and write an R data file, a file type special for R.

Conditions
a == b - Are equal
a != b - Not equal
a > b - Greater than
a < b - Less than
a >= b - Greater than or equal to
a <= b - Less than or equal to
is.na(a) - Is missing
is.null(a) - Is null
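The vector rules above can be checked on a small example. A minimal sketch using only base R; the vector x and the named vector fruit are hypothetical values chosen so every selection rule returns something visible.

```r
# Creating vectors: each result matches the example output in the sheet
evens   <- c(2, 4, 6)              # 2 4 6
ints    <- 2:6                     # 2 3 4 5 6 (an integer sequence)
halves  <- seq(2, 3, by = 0.5)     # 2.0 2.5 3.0
rep_vec <- rep(1:2, times = 3)     # 1 2 1 2 1 2
rep_el  <- rep(1:2, each = 3)      # 1 1 1 2 2 2

# Hypothetical example vector used to try out the selection rules
x <- c(-3, 10, 1, 10, 5, -1)

# Vector functions
sorted   <- sort(x)                # -3 -1 1 5 10 10
reversed <- rev(x)                 # -1 5 10 1 10 -3
uniq     <- unique(x)              # -3 10 1 5 -1
counts   <- table(x)               # how often each value occurs

# Selecting by position
fourth      <- x[4]                # 10
all_but_4   <- x[-4]               # -3 10 1 5 -1
two_to_four <- x[2:4]              # 10 1 10
not_2_to_4  <- x[-(2:4)]           # -3 5 -1
one_and_5   <- x[c(1, 5)]          # -3 5

# Selecting by value
tens      <- x[x == 10]            # 10 10
negatives <- x[x < 0]              # -3 -1
in_set    <- x[x %in% c(1, 2, 5)]  # 1 5

# Selecting from a named vector (hypothetical names)
fruit  <- c(apple = 3, banana = 7)
apples <- fruit["apple"]           # length-1 vector named "apple" holding 3
```

Negative indices drop elements rather than counting from the end, which is why x[-4] and x[-(2:4)] remove positions instead of selecting them.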

RStudio® is a trademark of RStudio, Inc. • CC BY Mhairi McNeill • Learn more at web page or vignette • package version • Updated: 3/15
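The Programming templates on the first page can be run almost as written. A minimal runnable sketch of the same four constructs; unlike the sheet's examples, the loops store their results in vectors (js and seen are hypothetical names) so the results can be checked, and the while loop initialises i first, which the sheet's example leaves implicit. The helper yes_no() is also hypothetical; it only wraps the sheet's if/else example in a function.

```r
# For loop: same body as the sheet's example, but each result is stored
js <- numeric(0)
for (i in 1:4) {
  j <- i + 10
  js <- c(js, j)
}
# js is now 11 12 13 14

# While loop: i must exist before the loop starts
i <- 1
seen <- numeric(0)
while (i < 5) {
  seen <- c(seen, i)
  i <- i + 1
}
# seen is now 1 2 3 4, and i is 5

# If statement, wrapped in a hypothetical helper so it can be reused
yes_no <- function(i) {
  if (i > 3) {
    "Yes"
  } else {
    "No"
  }
}

# The sheet's function example
square <- function(x) {
  squared <- x * x
  return(squared)
}
```

Growing a vector with c() inside a loop is fine for a handful of elements; for long loops, preallocating (for example numeric(n)) or a vectorised expression such as (1:4) + 10 is usually preferable.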
Types
Converting between common data types in R. Can always go from a higher value in the table to a lower value.
as.logical | TRUE, FALSE, TRUE | Boolean values (TRUE or FALSE).
as.numeric | 1, 0, 1 | Integers or floating point numbers.
as.character | '1', '0', '1' | Character strings. Generally preferred to factors.
as.factor | '1', '0', '1', levels: '0', '1' | Character strings with preset levels. Needed for some statistical models.

Matrices
m <- matrix(x, nrow = 3, ncol = 3) - Create a matrix from x.
m[2, ] - Select a row.
m[ , 1] - Select a column.
m[2, 3] - Select an element.
t(m) - Transpose.
m %*% n - Matrix multiplication.
solve(m, n) - Find x in: m %*% x = n.

Strings
Also see the stringr package.
paste(x, y, sep = ' ') - Join multiple vectors together.
paste(x, collapse = ' ') - Join elements of a vector together.
grep(pattern, x) - Find regular expression matches in x.
gsub(pattern, replace, x) - Replace matches in x with a string.
toupper(x) - Convert to uppercase.
tolower(x) - Convert to lowercase.
nchar(x) - Number of characters in a string.
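A short base-R sketch of the Types, Matrices, and Strings entries above. The matrix a, the vector b, and the strings in fruits are hypothetical values; a is chosen to be invertible so that solve() has a unique answer.

```r
# Type conversion, following the Types table
lgl <- c(TRUE, FALSE, TRUE)
num <- as.numeric(lgl)      # 1 0 1
chr <- as.character(num)    # "1" "0" "1"
fac <- as.factor(chr)       # values "1" "0" "1"; levels are sorted: "0" "1"

# Matrices: matrix() fills column by column by default
m    <- matrix(1:9, nrow = 3, ncol = 3)
row2 <- m[2, ]              # 2 5 8
col1 <- m[, 1]              # 1 2 3
elem <- m[2, 3]             # 8
tm   <- t(m)                # rows become columns

# Solving a %*% x = b for a hypothetical invertible matrix
a     <- matrix(c(2, 1, 1, 3), nrow = 2)  # rows: (2, 1) and (1, 3)
b     <- c(1, 2)
sol   <- solve(a, b)                      # 0.2 0.6
check <- a %*% sol                        # reproduces b as a 2 x 1 matrix

# Strings
fruits   <- c("apple", "Banana", "cherry")
joined   <- paste("a", "b", sep = "-")     # "a-b"
sentence <- paste(fruits, collapse = " ")  # "apple Banana cherry"
hits     <- grep("an", fruits)             # 2: only "Banana" contains "an"
replaced <- gsub("a", "A", "banana")       # "bAnAnA"
upper    <- toupper("apple")               # "APPLE"
lower    <- tolower("Banana")              # "banana"
n_chars  <- nchar(fruits)                  # 5 6 6
```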
Lists
l <- list(x = 1:5, y = c('a', 'b')) - A list is a collection of elements which can be of different types.
l[[2]] - Second element of l.
l[1] - New list with only the first element.
l$x - Element named x.
l['y'] - New list with only element named y.

Factors
factor(x) - Turn a vector into a factor. Can set the levels of the factor and the order.
cut(x, breaks = 4) - Turn a numeric vector into a factor by 'cutting' into sections.

Maths Functions
log(x) - Natural log.
exp(x) - Exponential.
max(x) - Largest element.
min(x) - Smallest element.
round(x, n) - Round to n decimal places.
signif(x, n) - Round to n significant figures.
cor(x, y) - Correlation.
sum(x) - Sum.
mean(x) - Mean.
median(x) - Median.
quantile(x) - Percentage quantiles.
rank(x) - Rank of elements.
var(x) - The variance.
sd(x) - The standard deviation.

Data Frames
Also see the dplyr package.
df <- data.frame(x = 1:3, y = c('a', 'b', 'c')) - A special case of a list where all elements are the same length.
  x y
1 1 a
2 2 b
3 3 c
List subsetting: df$x, df[[2]]
Matrix subsetting: df[ , 2], df[2, ], df[2, 2]
Understanding a data frame
View(df) - See the full data frame.
head(df) - See the first 6 rows.
nrow(df) - Number of rows.
ncol(df) - Number of columns.
dim(df) - Number of rows and columns.
cbind - Bind columns.
rbind - Bind rows.

Statistics
lm(y ~ x, data=df) - Linear model.
glm(y ~ x, data=df) - Generalised linear model.
summary - Get more detailed information out of a model.
t.test(x, y) - Perform a t-test for difference between means.
pairwise.t.test - Perform pairwise t-tests between group levels, with correction for multiple testing.
prop.test - Test for a difference between proportions.
aov - Analysis of variance.

Distributions
Distribution | Random Variates | Density Function | Cumulative Distribution | Quantile
Normal | rnorm | dnorm | pnorm | qnorm
Poisson | rpois | dpois | ppois | qpois
Binomial | rbinom | dbinom | pbinom | qbinom
Uniform | runif | dunif | punif | qunif

Variable Assignment
> a <- 'apple'
> a
[1] "apple"

The Environment
ls() - List all variables in the environment.
rm(x) - Remove x from the environment.
rm(list = ls()) - Remove all variables from the environment.
You can use the environment panel in RStudio to browse variables in your environment.

Plotting
Also see the ggplot2 package.
plot(x) - Values of x in order.
plot(x, y) - Values of x against y.
hist(x) - Histogram of x.

Dates
See the lubridate package.
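A base-R sketch covering the Lists, Factors, Data Frames, Statistics, and Distributions entries above. All data here is hypothetical: the simulated regression uses a true intercept of 3 and slope of 2 with a little noise, and set.seed() only makes the random draws reproducible.

```r
# Lists can hold elements of different types
l <- list(x = 1:5, y = c("a", "b"))
second  <- l[[2]]     # the element itself: "a" "b"
first_l <- l[1]       # a new list containing only x
x_elem  <- l$x        # 1 2 3 4 5
y_list  <- l["y"]     # a new list containing only y

# Factors: explicit level order, and cut() to bin numbers
sizes <- factor(c("low", "high", "low"), levels = c("low", "high"))
bins  <- cut(c(1, 5, 9), breaks = 2)   # two intervals; 1 and 5 fall in the first

# Data frames: a list whose columns all have the same length
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
col_by_name  <- df$x       # list subsetting
col_by_index <- df[[2]]
row_two      <- df[2, ]    # matrix subsetting: one row
cell         <- df[2, 2]   # "b"
more_rows <- rbind(df, data.frame(x = 4, y = "d"))  # 4 rows, 2 columns
more_cols <- cbind(df, z = c(TRUE, FALSE, TRUE))    # 3 rows, 3 columns

# Statistics on simulated data
set.seed(42)
sim   <- data.frame(x = 1:20)
sim$y <- 3 + 2 * sim$x + rnorm(20, sd = 0.5)
fit <- lm(y ~ x, data = sim)
est <- coef(fit)                 # intercept close to 3, slope close to 2
r2  <- summary(fit)$r.squared    # summary() gives more detail about the model
tt  <- t.test(rnorm(30, mean = 0), rnorm(30, mean = 1))  # two-sample t-test

# Distributions: r = random draws, d = density, p = cumulative, q = quantile
dens  <- dnorm(0)                         # 1 / sqrt(2 * pi), about 0.399
cum   <- pnorm(1.96)                      # about 0.975
quant <- qnorm(0.975)                     # about 1.96
b2    <- dbinom(2, size = 4, prob = 0.5)  # 6 / 16 = 0.375
p0    <- dpois(0, lambda = 2)             # exp(-2)
u     <- runif(5)                         # five draws between 0 and 1
```

pnorm() and qnorm() are inverses of each other, so pnorm(qnorm(p)) returns p up to rounding error; the same p/q pairing holds for the other distributions in the table.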

Data import with the tidyverse :: CHEAT SHEET

One of the first steps of a project is to import outside data into R. Data is often stored in tabular formats, like csv files or spreadsheets. The front page of this sheet shows how to import and save text files into R using readr. The back page shows how to import spreadsheet data from Excel files using readxl or Google Sheets using googlesheets4.

OTHER TYPES OF DATA
Try one of the following packages to import other types of files:
• haven - SPSS, Stata, and SAS files
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)
• readr::read_lines() - text data

Read Tabular Data with readr
read_*(file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale, n_max = Inf, skip = 0, na = c("", "NA"), guess_max = min(1000, n_max), show_col_types = TRUE) See ?read_delim

read_delim("file.txt", delim = "|") - Read files with any delimiter. If no delimiter is specified, it will automatically guess.
To make file.txt, run: write_file("A|B|C\n1|2|3\n4|5|NA", file = "file.txt")

read_csv("file.csv") - Read a comma delimited file with period decimal marks.
write_file("A,B,C\n1,2,3\n4,5,NA", file = "file.csv")

read_csv2("file2.csv") - Read semicolon delimited files with comma decimal marks.
write_file("A;B;C\n1,5;2;3\n4,5;5;NA", file = "file2.csv")

read_tsv("file.tsv") - Read a tab delimited file. Also read_table().
read_fwf("file.tsv", fwf_widths(c(2, 2, NA))) - Read a fixed width file.
write_file("A\tB\tC\n1\t2\t3\n4\t5\tNA\n", file = "file.tsv")

USEFUL READ ARGUMENTS
read_csv("file.csv", col_names = FALSE) - No header.
read_csv("file.csv", col_names = c("x", "y", "z")) - Provide header.
read_csv("file.csv", skip = 1) - Skip lines.
read_csv("file.csv", n_max = 1) - Read a subset of lines.
read_csv("file.csv", na = c("1")) - Read values as missing.
read_csv(c("f1.csv", "f2.csv", "f3.csv"), id = "origin_file") - Read multiple files into a single table.
read_delim("file2.csv", locale = locale(decimal_mark = ",")) - Specify decimal marks.

Save Data with readr
write_*(x, file, na = "NA", append, col_names, quote, escape, eol, num_threads, progress)
write_delim(x, file, delim = " ") - Write files with any delimiter.
write_csv(x, file) - Write a comma delimited file.
write_csv2(x, file) - Write a semicolon delimited file.
write_tsv(x, file) - Write a tab delimited file.

Column Specification with readr
Column specifications define what data type each column of a file will be imported as. By default readr will generate a column spec when a file is read and output a summary.
spec(x) - Extract the full column specification for the given imported data frame.
spec(x)
# cols(
#   age = col_integer(),
#   sex = col_character(),
#   earn = col_double()
# )
Here age is an integer, sex is a character, and earn is a double (numeric).

COLUMN TYPES
Each column type has a function and corresponding string abbreviation.
• col_logical() - "l"
• col_integer() - "i"
• col_double() - "d"
• col_number() - "n"
• col_character() - "c"
• col_factor(levels, ordered = FALSE) - "f"
• col_datetime(format = "") - "T"
• col_date(format = "") - "D"
• col_time(format = "") - "t"
• col_skip() - "-", "_"
• col_guess() - "?"

USEFUL COLUMN ARGUMENTS
read_*(file, show_col_types = FALSE) - Hide col spec message.
read_*(file, col_select = c(age, earn)) - Select columns to import. Use names, position, or selection helpers.
read_*(file, guess_max = Inf) - Guess column types. To guess a column type, read_*() looks at the first 1000 rows of data. Increase with guess_max.

DEFINE COLUMN SPECIFICATION
Set a default type
read_csv(
  file,
  col_types = list(.default = col_double())
)
Use column type or string abbreviation
read_csv(
  file,
  col_types = list(x = col_double(), y = "l", z = "_")
)
Use a single string of abbreviations
# col types: skip, guess, integer, logical, character
read_csv(
  file,
  col_types = "_?ilc"
)
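readr is not part of base R. For comparison, a rough base-R counterpart of the read_delim() and write_csv() examples above, sketched with read.table(), read.csv() and write.csv() and a temporary file standing in for file.txt. This is not readr's API: base R returns a plain data frame, prints no column specification, and needs the separator and header row stated explicitly.

```r
# A temporary file stands in for "file.txt" from the readr example
path <- tempfile(fileext = ".txt")
writeLines(c("A|B|C", "1|2|3", "4|5|NA"), path)

# Header row plus an explicit "|" separator; the string "NA" becomes a missing value
df <- read.table(path, sep = "|", header = TRUE)

# Roughly like readr's na argument: na.strings lists every string treated as missing
df_na <- read.table(path, sep = "|", header = TRUE, na.strings = c("1", "NA"))

# Write as CSV without row names, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
round_trip <- read.csv(csv_path)
```

Setting row.names = FALSE keeps write.csv() from adding an unnamed index column, so the file read back has the same three columns as the original.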

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at readr.tidyverse.org • readr 2.0.0 • readxl 1.3.1 • googlesheets4 1.0.0 • Updated: 2021-08
Import Spreadsheets

with readxl

READ EXCEL FILES
read_excel(path, sheet = NULL, range = NULL) - Read a .xls or .xlsx file based on the file extension. See front page for more read arguments. Also read_xls() and read_xlsx().
read_excel("excel_file.xlsx")

READ SHEETS
read_excel(path, sheet = NULL) - Specify which sheet to read by position or name.
read_excel(path, sheet = 1)
read_excel(path, sheet = "s1")
excel_sheets(path) - Get a vector of sheet names.
excel_sheets("excel_file.xlsx")

To read multiple sheets:
1. Get a vector of sheet names from the file path.
2. Set the vector names to be the sheet names.
3. Use purrr::map_dfr() to read multiple sheets into one data frame.
path <- "your_file_path.xlsx"
path %>% excel_sheets() %>%
  set_names() %>%
  map_dfr(read_excel, path = path)

READXL COLUMN SPECIFICATION
Column specifications define what data type each column of a file will be imported as. Use the col_types argument of read_excel() to set the column specification.
Guess column types
To guess a column type, read_excel() looks at the first 1000 rows of data. Increase with the guess_max argument.
read_excel(path, guess_max = Inf)
Set all columns to same type, e.g. character
read_excel(path, col_types = "text")
Set each column individually
read_excel(
  path,
  col_types = c("text", "guess", "guess", "numeric")
)
COLUMN TYPES
• skip • guess • logical • numeric • date • list • text
Use list for columns that include multiple data types. See tidyr and purrr for list-column data.

OTHER USEFUL EXCEL PACKAGES
For functions to write data to Excel files, see:
• openxlsx
• writexl
For working with non-tabular Excel data, see:
• tidyxl

with googlesheets4

READ SHEETS
read_sheet(ss, sheet = NULL, range = NULL) - Read a sheet from a URL, a Sheet ID, or a dribble from the googledrive package. See front page for more read arguments. Same as range_read().

SHEETS METADATA
URLs are in the form:
https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit#gid=SHEET_ID
gs4_get(ss) - Get spreadsheet meta data.
gs4_find(...) - Get data on all spreadsheet files.
sheet_properties(ss) - Get a tibble of properties for each worksheet. Also sheet_names().

WRITE SHEETS
write_sheet(data, ss = NULL, sheet = NULL) - Write a data frame into a new or existing Sheet.
gs4_create(name, ..., sheets = NULL) - Create a new Sheet with a vector of names, a data frame, or a (named) list of data frames.
sheet_append(ss, data, sheet = 1) - Add rows to the end of a worksheet.

GOOGLESHEETS4 COLUMN SPECIFICATION
Column specifications define what data type each column of a file will be imported as. Use the col_types argument of read_sheet()/range_read() to set the column specification.
Guess column types
To guess a column type, read_sheet()/range_read() looks at the first 1000 rows of data. Increase with guess_max.
read_sheet(ss, guess_max = Inf)
Set all columns to same type, e.g. character
read_sheet(ss, col_types = "c")
Set each column individually
# col types: skip, guess, integer, logical, character
read_sheet(ss, col_types = "_?ilc")
COLUMN TYPES
• skip - "_" or "-"
• guess - "?"
• logical - "l"
• integer - "i"
• double - "d"
• numeric - "n"
• date - "D"
• datetime - "T"
• character - "c"
• list-column - "L"
• cell - "C" Returns list of raw cell data.
Use list for columns that include multiple data types. See tidyr and purrr for list-column data.

CELL SPECIFICATION FOR READXL AND GOOGLESHEETS4
Use the range argument of readxl::read_excel() or googlesheets4::read_sheet() to read a subset of cells from a sheet.
read_excel(path, range = "Sheet1!B1:D2")
read_sheet(ss, range = "B1:D2")
Also use the range argument with cell specification functions cell_limits(), cell_rows(), cell_cols(), and anchored().

FILE LEVEL OPERATIONS
googlesheets4 also offers ways to modify other aspects of Sheets (e.g. freeze rows, set column width, manage (work)sheets). Go to googlesheets4.tidyverse.org to read more.
For whole-file operations (e.g. renaming, sharing, placing within a folder), see the tidyverse package googledrive at googledrive.tidyverse.org.

rmarkdown :: CHEAT SHEET

What is rmarkdown?
.Rmd files · Develop your code and ideas side-by-side in a single document. Run code as individual chunks or as an entire document.
Dynamic Documents · Knit together plots, tables, and results with narrative text. Render to a variety of formats like HTML, PDF, MS Word, or MS Powerpoint.
Reproducible Research · Upload, link to, or attach your report to share. Anyone can read or run your code to reproduce your work.

SOURCE EDITOR and RENDERED OUTPUT: 1. New File · 2. Embed Code · 3. Write Text · 4. Set Output Format(s) and Options · 5. Save and Render · 6. Share (publish to rpubs.com, shinyapps.io, RStudio Connect). Editor controls: find in document, set preview location, insert code chunk, go to code chunk, run code chunk(s), show outline, reload document, modify chunk options, run all previous chunks, run current chunk, file path to output document.

Write with Markdown
The syntax on the left renders as the output on the right.
Plain text. | Plain text.
End a line with two spaces to start a new paragraph. | End a line with two spaces to start a new paragraph.
Also end with a backslash\ to make a new line. | Also end with a backslash to make a new line.
*italics* and **bold** | italics and bold
superscript^2^/subscript~2~ | superscript2/subscript2
~~strikethrough~~ | strikethrough
escaped: \* \_ \\ | escaped: * _ \
endash: --, emdash: --- | endash: –, emdash: —
# Header 1 | Header 1
## Header 2 | Header 2
... | ...
###### Header 6 | Header 6
<link url> | http://www.rstudio.com/

- unordered list
- item 2
    - item 2a (indent 1 tab)
    - item 2b
renders as • unordered list, • item 2, • item 2a (indent 1 tab), • item 2b

1. ordered list
2. item 2
    - item 2a (indent 1 tab)
    - item 2b
renders as 1. ordered list, 2. item 2, • item 2a (indent 1 tab), • item 2b

VISUAL EDITOR: insert citations, style options, Insert Citations

Workflow
1. Open a new .Rmd file in the RStudio IDE by going to File > New File > R Markdown.
2. Embed code in chunks. Run code by line, by chunk, or all at once.
3. Write text and add tables, figures, images, and citations. Format with Markdown syntax or the RStudio Visual Markdown Editor.
4. Set output format(s) and options in the YAML
header. Customize themes or add parameters Create citations from a bibliography file, a Zotero library,
to execute or add interactivity with Shiny. or from DOI references. [This is a link.](link url) This is a link.
[This is another link][id]. This is another link.
55. Save and render the whole document. Knit BUILD YOUR BIBLIOGRAPHY
periodically to preview your work as you write. At the end of the document:
add/edit
• Add BibTeX or CSL bibliographies to the YAML header. [id]: link url
66. Share your work! attributes
---
![Caption](image.png)
title: "My Document"
bibliography: references.bib or ![Caption][id2]
link-citations: TRUE At the end of the document:

Embed Code with knitr


--- [id2]: image.png Caption.
• If Zotero is installed locally, your main library will `verbatim code` verbatim code
automatically be available. ```
CODE CHUNKS OPTION DEFAULT EFFECTS
multiple lines multiple lines
Surround code chunks with ```{r} and ``` or use echo TRUE display code in output document • Add citations by DOI by searching "from DOI" in the
of verbatim code of verbatim code
the Insert Code Chunk button. Add a chunk label TRUE (display error messages in doc) Insert Citation dialog. ```
error FALSE FALSE (stop render when error occurs)
and/or chunk options inside the curly braces after r. > block quotes block quotes
eval TRUE run code in chunk INSERT CITATIONS
```{r chunk-label, include=FALSE} include TRUE include chunk in doc after running • Access the Insert Citations dialog in the Visual Editor equation: $e^{i \pi} + 1 = 0$ equation: e iπ + 1 = 0
summary(mtcars) message TRUE display code messages in document by clicking the @ symbol in the toolbar or by clicking
Insert > Citation. equation block: equation block:
``` warning TRUE display code warnings in document
"asis" (passthrough results) $$E = mc^{2}$$ E = m c2
• Add citations with markdown syntax by typing [@cite]
SET GLOBAL OPTIONS results "markup" "hide" (don't display results) or @cite. horizontal rule: horizontal rule:
"hold" (put all results below all code) ---
Set options for the entire document in the first chunk.
Insert Tables
fig.align "default" "left", "right", or "center"
fig.alt NULL alt text for a figure | Right | Left | Default | Center | Right Left Default Center
```{r include=FALSE} |-------:|:------|-----------|:---------:|
knitr::opts_chunk$set(message = FALSE) fig.cap NULL figure caption as a character string 12 12 12 12
Output data frames as tables using | 12 | 12 | 12 | 12 |
``` fig.path "figure/" prefix for generating figure file paths kable(data, caption). 123 123 123 123
| 123 | 123 | 123 | 123 |
fig.width & 1 1 1 1
fig.height 7 plot dimensions in inches |1|1|1|1|
INLINE CODE out.width rescales output width, e.g. "75%", "300px" ```{r}
Insert `r <code>` into text sections. Code is evaluated collapse FALSE collapse all sources & output into a single block data <- faithful[1:4, ] HTML Tabsets
at render and results appear as text.
comment "##" prefix for each line of results knitr::kable(data, # Results {.tabset} Results
## Plots text
"Built with `r getRversion()`" --> "Built with 4.1.0" child NULL files(s) to knit and then include caption = "Table with kable") text Plots Tables
include or exclude a code chunk when ```
purl TRUE extracting source code with knitr::purl() ## Tables text
See more options and defaults by running str(knitr::opts_chunk$get()) Other table packages include flextable, gt, and kableExtra. more text
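Inline code and chunk options can be checked without creating an .Rmd file, because knitr::knit() also accepts R Markdown text through its text argument. A minimal sketch, assuming knitr is installed; the two snippets are made up for illustration.

```r
# Sketch: knitting R Markdown text held in strings, assuming knitr is installed.
library(knitr)

# Inline code: `r ...` is evaluated and replaced by its printed result
inline_md <- knit(text = "Two plus two is `r 2 + 2`.", quiet = TRUE)

# A chunk with echo = FALSE: the output is kept, the source code is hidden
chunk_md <- knit(
  text = c("```{r, echo = FALSE}", "sum(1:10)", "```"),
  quiet = TRUE
)

cat(inline_md, "\n")
cat(chunk_md, "\n")
```

Changing echo = FALSE to echo = TRUE in the chunk header brings the line sum(1:10) back into the output, which is a quick way to explore the other options in the table.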

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at rmarkdown.rstudio.com • rmarkdown 2.9.4 • Updated: 2021-08
11
Set Output Formats and their Options in YAML
Use the document's YAML header to set an output format and customize it with output options. Indent the format 2 characters and its options 4 characters.
---
title: "My Document"
author: "Author Name"
output:
  html_document:
    toc: TRUE
---

OUTPUT FORMAT: CREATES
html_document: .html
pdf_document*: .pdf
word_document: Microsoft Word (.docx)
powerpoint_presentation: Microsoft Powerpoint (.pptx)
odt_document: OpenDocument Text
rtf_document: Rich Text Format
md_document: Markdown
github_document: Markdown for Github
ioslides_presentation: ioslides HTML slides
slidy_presentation: Slidy HTML slides
beamer_presentation*: Beamer slides
* Requires LaTeX, use tinytex::install_tinytex()
Also see flexdashboard, bookdown, distill, and blogdown.

IMPORTANT OPTIONS: DESCRIPTION [formats that support the option: HTML, PDF, MS Word, MS PPT]
anchor_sections: Show section anchors on mouse hover (TRUE or FALSE) [HTML]
citation_package: The LaTeX package to process citations ("default", "natbib", "biblatex") [PDF]
code_download: Give readers an option to download the .Rmd source code (TRUE or FALSE) [HTML]
code_folding: Let readers toggle the display of R code ("none", "hide", or "show") [HTML]
css: CSS or SCSS file to use to style document (e.g. "style.css") [HTML]
dev: Graphics device to use for figure output (e.g. "png", "pdf") [HTML, PDF]
df_print: Method for printing data frames ("default", "kable", "tibble", "paged") [HTML, PDF, MS Word, MS PPT]
fig_caption: Should figures be rendered with captions (TRUE or FALSE) [HTML, PDF, MS Word, MS PPT]
highlight: Syntax highlighting ("tango", "pygments", "kate", "zenburn", "textmate") [HTML, PDF, MS Word]
includes: File of content to place in doc ("in_header", "before_body", "after_body") [HTML, PDF]
keep_md: Keep the Markdown .md file generated by knitting (TRUE or FALSE) [HTML, PDF, MS Word, MS PPT]
keep_tex: Keep the intermediate TEX file used to convert to PDF (TRUE or FALSE) [PDF]
latex_engine: LaTeX engine for producing PDF output ("pdflatex", "xelatex", or "lualatex") [PDF]
reference_docx/_doc: docx/pptx file containing styles to copy in the output (e.g. "file.docx", "file.pptx") [MS Word, MS PPT]
theme: Theme options (see Bootswatch and Custom Themes below) [HTML]
toc: Add a table of contents at start of document (TRUE or FALSE) [HTML, PDF, MS Word, MS PPT]
toc_depth: The lowest level of headings to add to table of contents (e.g. 2, 3) [HTML, PDF, MS Word, MS PPT]
toc_float: Float the table of contents to the left of the main document content (TRUE or FALSE) [HTML]
Use ?<output format> to see all of a format's options, e.g. ?html_document

Render
When you render a document, rmarkdown:
1. Runs the code and embeds results and text into an .md file with knitr.
2. Converts the .md file into the output format with Pandoc.
.Rmd → knitr → .md → pandoc → HTML, PDF, DOC
Save, then Knit to preview the document output. The resulting HTML/PDF/MS Word/etc. document will be created and saved in the same directory as the .Rmd file.
Use rmarkdown::render() to render/knit in the R console. See ?render for available options.

Share
Publish on RStudio Connect to share R Markdown documents securely, schedule automatic updates, and interact with parameters in real time.
rstudio.com/products/connect/

More Header Options


PARAMETERS
Parameterize your documents to reuse with new inputs (e.g., data, values, etc.).
1. Add parameters in the header as sub-values of params.
2. Call parameters in code using params$<name>.
3. Set parameters with Knit with Parameters or the params argument of render().
---
params:
  state: "hawaii"
---
```{r}
data <- df[, params$state]
summary(data)
```

REUSABLE TEMPLATES
1. Create a new package with an inst/rmarkdown/templates directory.
2. Add a folder containing template.yaml (below) and skeleton.Rmd (template contents).
---
name: "My Template"
---
3. Install the package to access the template by going to File > New R Markdown > From Template.

BOOTSWATCH THEMES
Customize HTML documents with Bootswatch themes from the bslib package using the theme output option. Use bslib::bootswatch_themes() to list available themes.
---
title: "Document Title"
author: "Author Name"
output:
  html_document:
    theme:
      bootswatch: solar
---

CUSTOM THEMES
Customize individual HTML elements using bslib variables. Use ?bs_theme to see more variables.
---
output:
  html_document:
    theme:
      bg: "#121212"
      fg: "#E4E4E4"
      base_font:
        google: "Prompt"
---
More on bslib at pkgs.rstudio.com/bslib/.

STYLING WITH CSS AND SCSS
Add CSS and SCSS to your document by adding a path to a file with the css option in the YAML header.
---
title: "My Document"
author: "Author Name"
output:
  html_document:
    css: "style.css"
---
Apply CSS styling by writing HTML tags directly or:
• Use markdown to apply style attributes inline.
Bracketed Span
A [green]{.my-color} word. → A green word.
Fenced Div
::: {.my-color}
All of these words
are green.
:::
→ All of these words are green.
• Use the Visual Editor. Go to Format > Div/Span and add CSS styling directly with Edit Attributes.
[Screenshot: a div tagged .my-css-tag containing "This is a div with some text in it."]

INTERACTIVITY
Turn your report into an interactive Shiny document in 4 steps:
1. Add runtime: shiny to the YAML header.
2. Call Shiny input functions to embed input objects.
3. Call Shiny render functions to embed reactive output.
4. Render with rmarkdown::run() or click Run Document in RStudio IDE.
---
output: html_document
runtime: shiny
---
```{r, echo = FALSE}
numericInput("n",
  "How many cars?", 5)

renderTable({
  head(cars, input$n)
})
```
Also see Shiny Prerendered for better performance. rmarkdown.rstudio.com/authoring_shiny_prerendered
Embed a complete app into your document with shiny::shinyAppDir(). More at bookdown.org/yihui/rmarkdown/shiny-embedded.html.
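To see how a params header is read, a throwaway .Rmd can be written and its YAML parsed with rmarkdown::yaml_front_matter(). A sketch under the assumption that rmarkdown is installed; the file is a temporary example, and the commented render() call (which needs Pandoc) uses a hypothetical override value.

```r
# Sketch: reading a params header, assuming rmarkdown is installed.
library(rmarkdown)

# Write a temporary .Rmd whose YAML header declares one parameter
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "My Document"',
  "params:",
  '  state: "hawaii"',
  "---",
  "",
  "The state is `r params$state`."
), rmd)

# Parse only the YAML front matter; no knitting or Pandoc needed
header <- yaml_front_matter(rmd)
print(header$params$state)

# Rendering with a different value would need Pandoc; "alaska" is hypothetical:
# render(rmd, params = list(state = "alaska"))
```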

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at rmarkdown.rstudio.com • rmarkdown 2.9.4 • Updated: 2021-08
12
RStudio IDE :: CHEAT SHEET

Documents and Apps
Open Shiny, R Markdown, knitr, Sweave, LaTeX, .Rd files and more in Source Pane.
Source Editor toolbar: navigate backwards/forwards, open in new window, save, find and replace, compile as notebook, run selected code, re-run previous code, source with or w/out Echo or as a Local Job, show file outline.
R Markdown toolbar: check spelling, render output, choose output format, configure render options, insert code chunk, publish to server, jump to previous chunk, jump to next chunk, run code, show file outline, Visual Editor (see reverse side for more on Visual Editor).
Chunk controls: set knitr chunk options, run this and all previous code chunks, run this code chunk, jump to section or chunk.
Access markdown guide at Help > Markdown Quick Reference.
• Multiple cursors/column selection with Alt + mouse drag.
• Code diagnostics that appear in the margin. Hover over diagnostic symbols for details.
• Syntax highlighting based on your file's extension.
• Tab completion to finish function names, file paths, arguments, and more.
• Multi-language code snippets to quickly use common blocks of code.
• Jump to function in file. Change file type.
RStudio recognizes that files named app.R, server.R, ui.R, and global.R belong to a shiny app: run app, choose location to view app, publish to shinyapps.io or server, manage publish accounts.

Tab Panes
Environment and History: import data with wizard, history of past commands to run/copy, manage external databases, view memory usage, R tutorials, load workspace, save workspace, clear R workspace, search inside environment, choose environment to display from list of parent environments, display objects as list or grid. Displays saved objects by type with short description; view in data viewer; view function source code.
Console: working directory; run scripts in separate sessions; maximize, minimize panes; drag pane boundaries; Ctrl/Cmd + ↑ to see history; R Markdown Build Log.
Files: a file browser keyed to your working directory. Click on file or directory name to open. Create folder, delete file, rename file, change directory, path to displayed directory, more file options.

Version Control
Turn on at Tools > Project Options > Git/SVN.
A• Added  M• Modified  D• Deleted  R• Renamed  ?• Untracked
Stage files, show file diff to view file differences, commit staged files, push/pull to remote, view history, current branch. Open shell to type commands.

Package Development
Create a new package with File > New Project > New Directory > R Package.
Enable roxygen documentation with Tools > Project Options > Build Tools.
Roxygen guide at Help > Roxygen Quick Reference.
See package information in the Build Tab: install package and restart R; run devtools::load_all() and reload changes; clear output and rebuild; run R CMD check; customize package build options; run package tests.
GUI Package manager lists every installed package: install packages, update packages, browse package site. Click to load package with library(). Unclick to detach package with detach(). Package version installed. Delete from library.

Plots, Help, and Viewer
RStudio opens plots in a dedicated Plots pane: navigate recent plots, open in window, export plot, delete plot, delete all plots.
RStudio opens documentation in a dedicated Help pane: home page of helpful links, search within help file, search for help file.
Viewer pane displays HTML content, such as Shiny apps, RMarkdown reports, and interactive visualizations: stop Shiny app; publish to shinyapps.io, rpubs, RSConnect, …; refresh.
View(<data>) opens a spreadsheet-like view of data set: filter rows by value or value range, sort by values, search for value.

Debug Mode
Use debug(), browser(), or a breakpoint and execute your code to open the debugger mode.
• Launch debugger mode from origin of error.
• Open traceback to examine the functions that R called before the error occurred.
• Click next to line number to add/remove a breakpoint.
• Highlighted line shows where execution has paused.
• Run commands in environment where execution has paused.
• Examine variables in executing environment.
• Select function in traceback to debug.
• Step through code one line at a time; step into and out of functions to run; resume execution; quit debug mode.
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at rstudio.com • Font Awesome 5.15.3 • RStudio IDE 1.4.1717 • Updated: 2021-07
13
Keyboard Shortcuts
ACTION · Windows/Linux · Mac
RUN CODE
Search command history · Ctrl+↑ · Cmd+↑
Interrupt current command · Esc · Esc
Clear console · Ctrl+L · Ctrl+L
NAVIGATE CODE
Go to File/Function · Ctrl+. · Ctrl+.
WRITE CODE
Attempt completion · Tab or Ctrl+Space · Tab or Ctrl+Space
Insert <- (assignment operator) · Alt+- · Option+-
Insert %>% (pipe operator) · Ctrl+Shift+M · Cmd+Shift+M
(Un)Comment selection · Ctrl+Shift+C · Cmd+Shift+C
MAKE PACKAGES
Load All (devtools) · Ctrl+Shift+L · Cmd+Shift+L
Test Package (Desktop) · Ctrl+Shift+T · Cmd+Shift+T
Document Package · Ctrl+Shift+D · Cmd+Shift+D
DOCUMENTS AND APPS
Knit Document (knitr) · Ctrl+Shift+K · Cmd+Shift+K
Insert chunk (Sweave & Knitr) · Ctrl+Alt+I · Cmd+Option+I
Run from start to current line · Ctrl+Alt+B · Cmd+Option+B
MORE KEYBOARD SHORTCUTS
Keyboard Shortcuts Help · Alt+Shift+K · Option+Shift+K
Show Command Palette · Ctrl+Shift+P · Cmd+Shift+P
View the Keyboard Shortcut Quick Reference with Tools > Keyboard Shortcuts or Alt/Option + Shift + K.
Search for keyboard shortcuts with Tools > Show Command Palette or Ctrl/Cmd + Shift + P.

Visual Editor
[Screenshot of the Visual Editor, labeled: check spelling, render output, choose output format, choose output location, insert code chunk, jump to previous chunk, jump to next chunk, run selected lines, publish to server, show file outline, back to Source Editor (front page), block format, lists and block quotes, links, citations, images, more formatting, clear formatting, insert blocks, citations, equations, and special characters, insert and edit tables, insert verbatim code, add/edit attributes, file outline, set knitr chunk options, run this and all previous code chunks, run this code chunk, jump to chunk or header.]

RStudio Workbench
WHY RSTUDIO WORKBENCH?
Extend the open source server with a commercial license, support, and more:
• open and run multiple R sessions at once
• tune your resources to improve performance
• administrative tools for managing user sessions
• collaborate real-time with others in shared projects
• switch easily from one version of R to a different version
• integrate with your authentication, authorization, and audit practices
• work in the RStudio IDE, JupyterLab, Jupyter Notebooks, or VS Code
Download a free 45-day evaluation at www.rstudio.com/products/workbench/evaluation/

Share Projects
File > New Project
RStudio saves the call history, workspace, and working directory associated with a project. It reloads each when you re-open a project.
[Screenshot labeled: start new R session in current project, close R session in project, active shared collaborators, name of current project, share project with collaborators, select R version.]

Run Remote Jobs
Run R on remote clusters (Kubernetes/Slurm) via the Job Launcher.
[Screenshot labeled: monitor launcher jobs, launch a job, run launcher jobs remotely.]

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at rstudio.com • Font Awesome 5.15.3 • RStudio IDE 1.4.1717 • Updated: 2021-07
14
Data transformation with dplyr :: CHEAT SHEET
dplyr functions work with pipes and expect tidy data. In tidy data:
• Each variable is in its own column
• Each observation, or case, is in its own row
pipes: x %>% f(y) becomes f(x, y)

Summarise Cases
Apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).
summarise(.data, …) Compute table of summaries.
summarise(mtcars, avg = mean(mpg))
count(.data, …, wt = NULL, sort = FALSE, name = NULL) Count number of rows in each group defined by the variables in … Also tally().
count(mtcars, cyl)

Group Cases
Use group_by(.data, …, .add = FALSE, .drop = TRUE) to create a "grouped" copy of a table grouped by columns in ... dplyr functions will manipulate each "group" separately and combine the results.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))
Use rowwise(.data, …) to group data into individual rows. dplyr functions will compute results for each row. Also apply functions to list-columns. See tidyr cheat sheet for list-column workflow.
starwars %>%
  rowwise() %>%
  mutate(film_count = length(films))
ungroup(x, …) Returns ungrouped copy of table.
ungroup(g_mtcars)

Manipulate Cases
EXTRACT CASES
Row functions return a subset of rows as a new table.
filter(.data, …, .preserve = FALSE) Extract rows that meet logical criteria.
filter(mtcars, mpg > 20)
distinct(.data, …, .keep_all = FALSE) Remove rows with duplicate values.
distinct(mtcars, gear)
slice(.data, …, .preserve = FALSE) Select rows by position.
slice(mtcars, 10:15)
slice_sample(.data, …, n, prop, weight_by = NULL, replace = FALSE) Randomly select rows. Use n to select a number of rows and prop to select a fraction of rows.
slice_sample(mtcars, n = 5, replace = TRUE)
slice_min(.data, order_by, …, n, prop, with_ties = TRUE) and slice_max() Select rows with the lowest and highest values.
slice_min(mtcars, mpg, prop = 0.25)
slice_head(.data, …, n, prop) and slice_tail() Select the first or last rows.
slice_head(mtcars, n = 5)
Logical and boolean operators to use with filter():
==  <  <=  is.na()  %in%  |  xor()
!=  >  >=  !is.na()  !  &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES
arrange(.data, …, .by_group = FALSE) Order rows by values of a column or columns (low to high), use with desc() to order from high to low.
arrange(mtcars, mpg)
arrange(mtcars, desc(mpg))

ADD CASES
add_row(.data, …, .before = NULL, .after = NULL) Add one or more rows to a table.
add_row(cars, speed = 1, dist = 1)

Manipulate Variables
EXTRACT VARIABLES
Column functions return a set of columns as a new vector or table.
pull(.data, var = -1, name = NULL, …) Extract column values as a vector, by name or index.
pull(mtcars, wt)
select(.data, …) Extract columns as a table.
select(mtcars, mpg, wt)
relocate(.data, …, .before = NULL, .after = NULL) Move columns to new position.
relocate(mtcars, mpg, cyl, .after = last_col())
Use these helpers with select() and across(), e.g. select(mtcars, mpg:cyl)
contains(match)  num_range(prefix, range)  :, e.g. mpg:cyl
ends_with(match)  all_of(x)/any_of(x, …, vars)  -, e.g. -gear
starts_with(match)  matches(match)  everything()

MANIPULATE MULTIPLE VARIABLES AT ONCE
across(.cols, .fns, …, .names = NULL) Summarise or mutate multiple columns in the same way.
summarise(mtcars, across(everything(), mean))
c_across(.cols) Compute across columns in row-wise data.
transmute(rowwise(mtcars), total = sum(c_across(1:2)))

MAKE NEW VARIABLES
Apply vectorized functions to columns. Vectorized functions take vectors as input and return vectors of the same length as output (see back).
mutate(.data, …, .keep = "all", .before = NULL, .after = NULL) Compute new column(s). Also add_column(), add_count(), and add_tally().
mutate(mtcars, gpm = 1 / mpg)
transmute(.data, …) Compute new column(s), drop others.
transmute(mtcars, gpm = 1 / mpg)
rename(.data, …) Rename columns. Use rename_with() to rename with a function.
rename(cars, distance = dist)
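Several of the verbs above chain naturally with %>%. A minimal sketch, assuming dplyr is installed, using the built-in mtcars data; the object names are arbitrary.

```r
# Sketch: common dplyr verbs chained with %>%, assuming dplyr is installed.
library(dplyr)

# Group Cases + Summarise Cases: one row per cylinder count
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg), n = n())

# Extract Cases, Make New Variables, Extract Variables, Arrange Cases
efficient <- mtcars %>%
  filter(mpg > 20) %>%          # keep rows that meet a logical criterion
  mutate(gpm = 1 / mpg) %>%     # add gallons per mile
  select(mpg, gpm, cyl) %>%     # keep three columns
  arrange(desc(mpg))            # highest mpg first

# Manipulate multiple variables at once
means <- summarise(mtcars, across(c(mpg, wt), mean))

print(by_cyl)
print(head(efficient, 3))
print(means)
```

Because summarise() drops the last grouping level, by_cyl comes back ungrouped here; with two grouping columns the result would still be grouped by the first one and may need ungroup().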

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at dplyr.tidyverse.org • dplyr 1.0.7 • Updated: 2021-07
15
Vectorized Functions
TO USE WITH MUTATE()
mutate() and transmute() apply vectorized functions to columns to create new columns. Vectorized functions take vectors as input and return vectors of the same length as output.

OFFSET
dplyr::lag() - offset elements by 1
dplyr::lead() - offset elements by -1

CUMULATIVE AGGREGATE
dplyr::cumall() - cumulative all()
dplyr::cumany() - cumulative any()
cummax() - cumulative max()
dplyr::cummean() - cumulative mean()
cummin() - cumulative min()
cumprod() - cumulative prod()
cumsum() - cumulative sum()

RANKING
dplyr::cume_dist() - proportion of all values <=
dplyr::dense_rank() - rank with ties = min, no gaps
dplyr::min_rank() - rank with ties = min
dplyr::ntile() - bins into n bins
dplyr::percent_rank() - min_rank scaled to [0,1]
dplyr::row_number() - rank with ties = "first"

MATH
+, -, *, /, ^, %/%, %% - arithmetic ops
log(), log2(), log10() - logs
<, <=, >, >=, !=, == - logical comparisons
dplyr::between() - x >= left & x <= right
dplyr::near() - safe == for floating point numbers

MISCELLANEOUS
dplyr::case_when() - multi-case if_else()
starwars %>%
  mutate(type = case_when(
    height > 200 | mass > 200 ~ "large",
    species == "Droid" ~ "robot",
    TRUE ~ "other")
  )
dplyr::coalesce() - first non-NA values by element across a set of vectors
dplyr::if_else() - element-wise if() + else()
dplyr::na_if() - replace specific values with NA
pmax() - element-wise max()
pmin() - element-wise min()

Summary Functions
TO USE WITH SUMMARISE()
summarise() applies summary functions to columns to create a new table. Summary functions take vectors as input and return single values as output.

COUNT
dplyr::n() - number of values/rows
dplyr::n_distinct() - # of uniques
sum(!is.na()) - # of non-NA’s

POSITION
mean() - mean, also mean(!is.na())
median() - median

LOGICAL
mean() - proportion of TRUE’s
sum() - # of TRUE’s

ORDER
dplyr::first() - first value
dplyr::last() - last value
dplyr::nth() - value in nth location of vector

RANK
quantile() - nth quantile
min() - minimum value
max() - maximum value

SPREAD
IQR() - Inter-Quartile Range
mad() - median absolute deviation
sd() - standard deviation
var() - variance

Row Names
Tidy data does not use rownames, which store a variable outside of the columns. To work with the rownames, first move them into a column.
tibble::rownames_to_column() Move row names into col.
a <- rownames_to_column(mtcars, var = "C")
tibble::column_to_rownames() Move col into row names.
column_to_rownames(a, var = "C")
Also tibble::has_rownames() and tibble::remove_rownames().

Combine Tables
[The join figures use x (A B C: a t 1, b u 2, c v 3) and y (A B D: a t 3, b u 2, d w 1); the bind_rows() and set-operation figures use a y with columns A B C: c v 3, d w 4.]

COMBINE VARIABLES
bind_cols(…, .name_repair) Returns tables placed side by side as a single table. Column lengths must be equal. Columns will NOT be matched by id (to do that look at Relational Data below), so be sure to check that both tables are ordered the way you want before binding.

COMBINE CASES
bind_rows(…, .id = NULL) Returns tables one on top of the other as a single table. Set .id to a column name to add a column of the original table names (as pictured).

RELATIONAL DATA
Use a "Mutating Join" to join one table to columns from another, matching values with the rows that they correspond to. Each join retains a different combination of values from the tables.
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …, keep = FALSE, na_matches = "na") Join matching values from y to x.
right_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …, keep = FALSE, na_matches = "na") Join matching values from x to y.
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …, keep = FALSE, na_matches = "na") Join data. Retain only rows with matches.
full_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), …, keep = FALSE, na_matches = "na") Join data. Retain all values, all rows.
Use a "Filtering Join" to filter one table against the rows of another.
semi_join(x, y, by = NULL, copy = FALSE, …, na_matches = "na") Return rows of x that have a match in y. Use to see what will be included in a join.
anti_join(x, y, by = NULL, copy = FALSE, …, na_matches = "na") Return rows of x that do not have a match in y. Use to see what will not be included in a join.
Use a "Nest Join" to inner join one table to another into a nested data frame.
nest_join(x, y, by = NULL, copy = FALSE, keep = FALSE, name = NULL, …) Join data, nesting matches from y in a single new data frame column.

COLUMN MATCHING FOR JOINS
Use by = c("col1", "col2", …) to specify one or more common columns to match on.
left_join(x, y, by = "A")
Use a named vector, by = c("col1" = "col2"), to match on columns that have different names in each table.
left_join(x, y, by = c("C" = "D"))
Use suffix to specify the suffix to give to unmatched columns that have the same name in both tables.
left_join(x, y, by = c("C" = "D"), suffix = c("1", "2"))

SET OPERATIONS
intersect(x, y, …) Rows that appear in both x and y.
setdiff(x, y, …) Rows that appear in x but not y.
union(x, y, …) Rows that appear in x or y (duplicates removed). union_all() retains duplicates.
Use setequal() to test whether two data sets contain the exact same rows (in any order).
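The difference between vectorized functions (one output per row, used in mutate()) and summary functions (one output per group, used in summarise()) shows up clearly on a short vector. A sketch assuming dplyr is installed; the sales table is hypothetical.

```r
# Sketch: vectorized vs. summary functions, assuming dplyr is installed.
library(dplyr)

# Hypothetical daily sales data with one missing value
sales <- tibble(day = 1:6, units = c(5, 3, NA, 8, 2, 7))

# Vectorized functions: each returns one value per row
daily <- sales %>%
  mutate(
    prev_units    = lag(units),                 # OFFSET: previous row's value
    running_total = cumsum(coalesce(units, 0)), # CUMULATIVE, NA counted as 0
    rank          = min_rank(desc(units)),      # RANKING: largest value = 1
    size = case_when(                           # MISCELLANEOUS
      is.na(units) ~ "missing",
      units >= 5   ~ "high",
      TRUE         ~ "low"
    )
  )

# Summary functions: each collapses the column to a single value
totals <- sales %>%
  summarise(
    n           = n(),                      # COUNT: rows
    non_missing = sum(!is.na(units)),       # COUNT: non-NA values
    total       = sum(units, na.rm = TRUE),
    top         = max(units, na.rm = TRUE)  # RANK: maximum
  )

print(daily)
print(totals)
```

coalesce() is used before cumsum() because a single NA would otherwise make every later running total NA.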

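The joins and set operations can be reproduced with the small tables pictured on this sheet. A minimal sketch, assuming dplyr is installed; x, y, and z are rebuilt from the figures, and by is spelled out so dplyr does not have to guess the key columns.

```r
# Sketch: joins and set operations on the pictured tables,
# assuming dplyr is installed.
library(dplyr)

x <- tibble(A = c("a", "b", "c"), B = c("t", "u", "v"), C = c(1, 2, 3))
y <- tibble(A = c("a", "b", "d"), B = c("t", "u", "w"), D = c(3, 2, 1))

# Mutating joins
left  <- left_join(x, y, by = c("A", "B"))   # all rows of x; D is NA for "c"
inner <- inner_join(x, y, by = c("A", "B"))  # rows "a" and "b" only
full  <- full_join(x, y, by = c("A", "B"))   # rows "a", "b", "c", "d"

# Filtering joins
kept    <- semi_join(x, y, by = c("A", "B")) # rows of x with a match in y
dropped <- anti_join(x, y, by = c("A", "B")) # rows of x without a match

# Set operations need the same columns in both tables
z <- tibble(A = c("c", "d"), B = c("v", "w"), C = c(3, 4))
both   <- intersect(x, z)  # row "c"
only_x <- setdiff(x, z)    # rows "a" and "b"
either <- union(x, z)      # rows "a" to "d", duplicates removed

print(left)
print(either)
```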
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • 844-448-1212 • rstudio.com • Learn more at dplyr.tidyverse.org • dplyr 1.0.7 • Updated: 2021-07
16
Data tidying with tidyr : : CHEAT SHEET
Tidy data is a way to organize tabular data in a
consistent data structure across packages. Reshape Data - Pivot data to reorganize values into a new layout. Expand
A table is tidy if:
A B C A B C
table4a Tables
country 1999 2000 country year cases pivot_longer(data, cols, names_to = "name", Create new combinations of variables or identify
& A
B
0.7K 2K
37K 80K
A
B
1999 0.7K
1999 37K
values_to = "value", values_drop_na = FALSE) implicit missing values (combinations of
C 212K 213K C 1999 212K "Lengthen" data by collapsing several columns variables not present in the data).
A 2000 2K
Each variable is in Each observation, or into two. Column names move to a new
B 2000 80K x
its own column case, is in its own row C 2000 213K names_to column and values to a new values_to x1 x2 x3 x1 x2 expand(data, …) Create a
column. A 1 3
B 1 4
A 1
A 2 new tibble with all possible
A B C A *B C pivot_longer(table4a, cols = 2:3, names_to ="year", B 2 3 B 1
B 2
combinations of the values
values_to = "cases") of the variables listed in …
Drop other variables.
table2 expand(mtcars, cyl, gear,
Access variables Preserve cases in country year type count country year cases pop pivot_wider(data, names_from = "name", carb)
as vectors vectorized operations A 1999 cases 0.7K A 1999 0.7K 19M
values_from = "value")
A 1999 pop 19M A 2000 2K 20M x
A 2000 cases 2K B 1999 37K 172M The inverse of pivot_longer(). "Widen" data by x1 x2 x3 x1 x2 x3 complete(data, …, fill =
Tibbles
A 1 3 A 1 3
A
B
2000
1999
pop 20M
cases 37K C
B 2000 80K 174M
1999 212K 1T
expanding two columns into several. One column B 1 4 A 2 NA list()) Add missing possible
B 1999 pop 172M C 2000 213K 1T provides the new column names, the other the B 2 3 B 1 4
combinations of values of
AN ENHANCED DATA FRAME
B 2 3
B 2000 cases 80K values. variables listed in … Fill
Tibbles are a table format provided B 2000 pop 174M
remaining variables with NA.
C 1999 cases 212K pivot_wider(table2, names_from = type,
by the tibble package. They inherit the complete(mtcars, cyl, gear,
C 1999 pop 1T values_from = count)
data frame class, but have improved behaviors: C 2000 cases 213K carb)
C 2000 pop 1T
Tibbles
AN ENHANCED DATA FRAME
Tibbles are a table format provided by the tibble package. They inherit the data frame class, but have improved behaviors:
• Subset a new tibble with [, a vector with [[ and $.
• No partial matching when subsetting columns.
• Display concise views of the data on one screen.

options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf) Control default display settings.
View() or glimpse() View the entire data set.

CONSTRUCT A TIBBLE
tibble(…) Construct by columns.
tibble(x = 1:3, y = c("a", "b", "c"))
tribble(…) Construct by rows.
tribble(~x, ~y,
  1, "a",
  2, "b",
  3, "c")
Both make this tibble. x is <int> when built with tibble(x = 1:3) and <dbl> when built with tribble(), because 1, 2, and 3 are double literals:
  A tibble: 3 × 2
      x y
  <int> <chr>
1     1 a
2     2 b
3     3 c
as_tibble(x, …) Convert a data frame to a tibble.
enframe(x, name = "name", value = "value") Convert a named vector to a tibble. Also deframe().
is_tibble(x) Test whether x is a tibble.
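The construction functions above can be compared directly. This is a minimal sketch that assumes the tibble package is installed. The names by_cols, by_rows, mt and scores are illustrative only. It shows the integer/double difference between tibble() and tribble(), and a round trip through enframe() and deframe().

```r
library(tibble)

# Build the same table by columns and by rows
by_cols <- tibble(x = 1:3, y = c("a", "b", "c"))
by_rows <- tribble(~x, ~y,
                   1, "a",
                   2, "b",
                   3, "c")

# 1:3 is an integer vector, but the literals 1, 2, 3 are doubles
class(by_cols$x)   # "integer"
class(by_rows$x)   # "numeric"

# Convert an ordinary data frame, keeping its row names as a column
mt <- as_tibble(mtcars, rownames = "model")
is_tibble(mt)      # TRUE

# A named vector becomes a two-column tibble; deframe() reverses it
scores <- c(a = 1, b = 2)
framed <- enframe(scores, name = "name", value = "value")
deframe(framed)
```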
Split Cells - Use these functions to split or combine cells into individual, isolated values.

unite(data, col, …, sep = "_", remove = TRUE, na.rm = FALSE) Collapse cells across several columns into a single column.
unite(table5, century, year, col = "year", sep = "")
(Illustration: table5 stores 1999 as century "19" and year "99"; the result has a single year column, "1999".)

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", …) Separate each cell in a column into several columns. Also extract().
separate(table3, rate, sep = "/", into = c("cases", "pop"))
(Illustration: table3 stores rate as strings such as "0.7K/19M"; the result has a cases column, "0.7K", and a pop column, "19M".)

separate_rows(data, …, sep = "[^[:alnum:].]+", convert = FALSE) Separate each cell in a column into several rows.
separate_rows(table3, rate, sep = "/")
(Illustration: each rate string becomes two rows, one holding the cases value and one holding the pop value.)

Handle Missing Values - Drop or replace explicit missing values (NA).
The examples below use a table x with columns x1 = A, B, C, D, E and x2 = 1, NA, NA, 3, NA.

drop_na(data, …) Drop rows containing NA's in … columns.
drop_na(x, x2) Result: only rows A (1) and D (3) remain.

fill(data, …, .direction = "down") Fill in NA's in … columns using the next or previous value.
fill(x, x2) Result: x2 becomes 1, 1, 1, 3, 3.

replace_na(data, replace) Specify a value to replace NA in selected columns.
replace_na(x, list(x2 = 2)) Result: x2 becomes 1, 2, 2, 3, 2.
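The split and missing-value entries above can be checked against the example tables bundled with tidyr. This is a minimal sketch that assumes tidyr is installed. The small table x is rebuilt by hand from the description above. In recent tidyr releases (1.3.0 and later), separate() is marked superseded in favour of separate_wider_delim(), but it still works.

```r
library(tidyr)

# table3 stores "cases/population" as one string per row
split_rate <- separate(table3, rate, into = c("cases", "pop"),
                       sep = "/", convert = TRUE)

# separate_rows() puts each piece of rate on its own row instead
rate_rows <- separate_rows(table3, rate, sep = "/")

# table5 stores the year in two columns: century ("19") and year ("99")
joined <- unite(table5, century, year, col = "year", sep = "")

# The small x table used in the Handle Missing Values examples
x <- tibble::tibble(x1 = c("A", "B", "C", "D", "E"),
                    x2 = c(1, NA, NA, 3, NA))

drop_na(x, x2)                 # keeps rows A and D
fill(x, x2)                    # x2 becomes 1, 1, 1, 3, 3
replace_na(x, list(x2 = 2))    # x2 becomes 1, 2, 2, 3, 2
```

Setting convert = TRUE runs type.convert() on the new columns, so cases and pop come back as numbers instead of strings.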

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at tidyr.tidyverse.org • tibble 3.1.2 • tidyr 1.1.3 • Updated: 2021–08
Nested Data
A nested data frame stores individual tables as a list-column of data frames within a larger organizing data frame. List-columns can also be lists of vectors or lists of varying data types.
Use a nested data frame to:
• Preserve relationships between observations and subsets of data. Preserve the type of the variables being nested (factors and datetimes aren't coerced to character).
• Manipulate many sub-tables at once with purrr functions like map(), map2(), or pmap() or with dplyr rowwise() grouping.

CREATE NESTED DATA
nest(data, …) Moves groups of cells into a list-column of a data frame. Use alone or with dplyr::group_by():
1. Group the data frame with group_by() and use nest() to move the groups into a list-column.
n_storms <- storms %>%
  group_by(name) %>%
  nest()
2. Use nest(new_col = c(x, y)) to specify the columns to group using dplyr::select() syntax.
n_storms <- storms %>%
  nest(data = c(year:long))
(Illustration: storm track rows with columns name, yr, lat, and long for Amy, Bob, and Zeta become a nested data frame with one row per name and a data column of tibbles holding each storm's yr, lat, and long.)
Index list-columns with [[]]. n_storms$data[[1]]

RESHAPE NESTED DATA
unnest(data, cols, ..., keep_empty = FALSE) Flatten nested columns back to regular columns. The inverse of nest().
n_storms %>% unnest(data)
unnest_longer(data, col, values_to = NULL, indices_to = NULL) Turn each element of a list-column into a row.
starwars %>%
  select(name, films) %>%
  unnest_longer(films)
(Illustration: Luke's list of films becomes one row per film: Luke / The Empire Strikes Back, Luke / Revenge of the Sith, Luke / Return of the Jedi, and so on.)
unnest_wider(data, col) Turn each element of a list-column into a regular column.
starwars %>%
  select(name, films) %>%
  unnest_wider(films)
(Illustration: one column per film position, named ..1, ..2, ..3, and so on.)
hoist(.data, .col, ..., .remove = TRUE) Selectively pull list components out into their own top-level columns. Uses purrr::pluck() syntax for selecting from lists.
starwars %>%
  select(name, films) %>%
  hoist(films, first_film = 1, second_film = 2)
(Illustration: the result has columns name, first_film, second_film, and films, which keeps the remaining titles, e.g. <chr [3]> for Luke.)
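The nesting entries above can be tried without relying on the exact size of dplyr::storms, which changes between dplyr releases. This is a minimal sketch that assumes dplyr and tidyr (1.0 or later) are installed. The tracks and films tibbles are hypothetical stand-ins for storms and starwars.

```r
library(dplyr)
library(tidyr)

# A hypothetical stand-in for dplyr::storms: two storms, a few track points
tracks <- tibble(
  name = c("Amy", "Amy", "Amy", "Bob", "Bob"),
  yr   = c(1975, 1975, 1975, 1979, 1979),
  lat  = c(27.5, 28.5, 29.5, 22.0, 22.5),
  long = c(-79.0, -79.0, -79.0, -96.0, -95.3)
)

# Group, then nest: one row per storm, the rest moved into a list-column
n_tracks <- tracks %>% group_by(name) %>% nest()

# Equivalent without group_by(): name the columns to nest explicitly
n_tracks2 <- tracks %>% nest(data = c(yr, lat, long))

n_tracks$data[[1]]              # Amy's three rows as a tibble

# unnest() is the inverse of nest()
flat <- n_tracks %>% unnest(data) %>% ungroup()

# unnest_longer(): one row per element of a list-column of vectors
films <- tibble(name = c("Luke", "C-3PO"),
                films = list(c("A New Hope", "The Empire Strikes Back"),
                             "A New Hope"))
films %>% unnest_longer(films)
```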
CREATE TIBBLES WITH LIST-COLUMNS
tibble::tribble(…) Makes list-columns when needed.
tribble( ~max, ~seq,
  3, 1:3,
  4, 1:4,
  5, 1:5)
(Result: max is 3, 4, 5, and seq is a list-column holding <int [3]>, <int [4]>, and <int [5]>.)
tibble::tibble(…) Saves list input as list-columns.
tibble(max = c(3, 4, 5), seq = list(1:3, 1:4, 1:5))
tibble::enframe(x, name="name", value="value") Converts multi-level list to a tibble with list-cols.
enframe(list('3'=1:3, '4'=1:4, '5'=1:5), 'max', 'seq')

OUTPUT LIST-COLUMNS FROM OTHER FUNCTIONS
dplyr::mutate(), transmute(), and summarise() will output list-columns if they return a list.
mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(quantile(mpg)))

TRANSFORM NESTED DATA
A vectorized function takes a vector, transforms each element in parallel, and returns a vector of the same length. By themselves vectorized functions cannot work with lists, such as list-columns.
dplyr::rowwise(.data, …) Group data so that each row is one group, and within the groups, elements of list-columns appear directly (accessed with [[ ), not as lists of length one. When you use rowwise(), dplyr functions will seem to apply functions to list-columns in a vectorized fashion.
(Illustration: fun(<tibble [50x4]>, …) is applied to the data of each row in turn, giving result 1, result 2, result 3.)

Apply a function to a list-column and create a new list-column.
n_storms %>%
  rowwise() %>%
  mutate(n = list(dim(data)))
dim() returns two values per row, so wrap it with list() to tell mutate() to create a list-column.

Apply a function to a list-column and create a regular column.
n_storms %>%
  rowwise() %>%
  mutate(n = nrow(data))
nrow() returns one integer per row.

Collapse multiple list-columns into a single list-column.
starwars %>%
  rowwise() %>%
  mutate(transport = list(append(vehicles, starships)))
append() returns a vector of varying length for each row, so wrap it with list() to make the new column a list-column.

Apply a function to multiple list-columns.
starwars %>%
  rowwise() %>%
  mutate(n_transports = length(c(vehicles, starships)))
length() returns one integer per row.

See the purrr package for more list functions.
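The list-column and rowwise() entries above can be combined in one small session. This is a minimal sketch that assumes dplyr and tidyr are installed. The seqs and sized names are illustrative. A per-row result of length one can become a regular column, while anything longer has to be wrapped in list().

```r
library(dplyr)
library(tidyr)

# tibble() stores list input as a list-column; tribble() does so when needed
seqs  <- tibble(max = c(3, 4, 5), seq = list(1:3, 1:4, 1:5))
seqs2 <- tribble(~max, ~seq,
                 3, 1:3,
                 4, 1:4,
                 5, 1:5)

# summarise() returns a list-column when the expression is wrapped in list()
q <- mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(quantile(mpg)))

# rowwise(): inside mutate(), `seq` refers to each row's vector, not a list
sized <- seqs %>%
  rowwise() %>%
  mutate(n = length(seq),              # one integer per row: regular column
         rng = list(range(seq))) %>%   # two values per row: wrap in list()
  ungroup()
```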

Data visualization with ggplot2 : : CHEAT SHEET

Basics
ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms (visual marks that represent data points).
(Figure: data + geom + coordinate system = plot, with x = F and y = A.)
To display values, map variables in the data to visual properties of the geom (aesthetics) like size, color, and x and y locations.
(Figure: the same plot with color = F and size = A added as aesthetics.)

Complete the template below to build a graph.
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
    stat = <STAT>, position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>
The data and the geom function are required. The other parts are not required, because sensible defaults are supplied.

ggplot(data = mpg, aes(x = cty, y = hwy)) Begins a plot that you finish by adding layers to. Add one geom function per layer.
last_plot() Returns the last plot.
ggsave("plot.png", width = 5, height = 5) Saves last plot as a 5 x 5 inch file named "plot.png" in working directory. Matches file type to file extension.

Aes Common aesthetic values.
color and fill - string ("red", "#RRGGBB")
linetype - integer or string (0 = "blank", 1 = "solid", 2 = "dashed", 3 = "dotted", 4 = "dotdash", 5 = "longdash", 6 = "twodash")
lineend - string ("round", "butt", or "square")
linejoin - string ("round", "mitre", or "bevel")
size - integer (line width in mm)
shape - integer/shape name or a single character ("a")

Geoms Use a geom function to represent data points; use the geom's aesthetic properties to represent variables. Each function returns a layer.

GRAPHICAL PRIMITIVES
a <- ggplot(economics, aes(date, unemploy))
b <- ggplot(seals, aes(x = long, y = lat))
a + geom_blank() and a + expand_limits() Ensure limits include values across all plots.
b + geom_curve(aes(yend = lat + 1, xend = long + 1), curvature = 1) - x, xend, y, yend, alpha, angle, color, curvature, linetype, size
a + geom_path(lineend = "butt", linejoin = "round", linemitre = 1) - x, y, alpha, color, group, linetype, size
a + geom_polygon(aes(alpha = 50)) - x, y, alpha, color, fill, group, subgroup, linetype, size
b + geom_rect(aes(xmin = long, ymin = lat, xmax = long + 1, ymax = lat + 1)) - xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size
a + geom_ribbon(aes(ymin = unemploy - 900, ymax = unemploy + 900)) - x, ymax, ymin, alpha, color, fill, group, linetype, size

LINE SEGMENTS
common aesthetics: x, y, alpha, color, linetype, size
b + geom_abline(aes(intercept = 0, slope = 1))
b + geom_hline(aes(yintercept = lat))
b + geom_vline(aes(xintercept = long))
b + geom_segment(aes(yend = lat + 1, xend = long + 1))
b + geom_spoke(aes(angle = 1:1155, radius = 1))
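The Basics template above can be filled in with every optional part made explicit. This is a minimal sketch that assumes ggplot2 (3.3.0 or later) is installed. The specific palette, theme, and facet variable are arbitrary choices, not requirements. The plot is saved to a temporary PDF: ggsave() picks the device from the file extension, and width and height are in inches by default.

```r
library(ggplot2)

# Template: data + geom with aesthetic mappings; the other parts are optional
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(aes(color = class)) +                 # one geom per layer
  geom_smooth(method = "lm", formula = y ~ x) +    # a second layer
  coord_cartesian() +                              # coordinate function
  facet_wrap(vars(drv)) +                          # facet function
  scale_color_brewer(palette = "Set2") +           # scale function
  theme_bw()                                       # theme function

# Save to a temporary file; the .pdf extension selects the PDF device
out <- tempfile(fileext = ".pdf")
ggsave(out, plot = p, width = 5, height = 5)
```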
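ONE VARIABLE continuous
c <- ggplot(mpg, aes(hwy)); c2 <- ggplot(mpg)
c + geom_area(stat = "bin") - x, y, alpha, color, fill, linetype, size
c + geom_density(kernel = "gaussian") - x, y, alpha, color, fill, group, linetype, size, weight
c + geom_dotplot() - x, y, alpha, color, fill
c + geom_freqpoly() - x, y, alpha, color, group, linetype, size
c + geom_histogram(binwidth = 5) - x, y, alpha, color, fill, linetype, size, weight
c2 + geom_qq(aes(sample = hwy)) - x, y, alpha, color, fill, linetype, size, weight

discrete
d <- ggplot(mpg, aes(fl))
d + geom_bar() - x, alpha, color, fill, linetype, size, weight

TWO VARIABLES
both continuous
e <- ggplot(mpg, aes(cty, hwy))
e + geom_label(aes(label = cty), nudge_x = 1, nudge_y = 1) - x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust
e + geom_point() - x, y, alpha, color, fill, shape, size, stroke
e + geom_quantile() - x, y, alpha, color, group, linetype, size, weight
e + geom_rug(sides = "bl") - x, y, alpha, color, linetype, size
e + geom_smooth(method = lm) - x, y, alpha, color, fill, group, linetype, size, weight
e + geom_text(aes(label = cty), nudge_x = 1, nudge_y = 1) - x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

one discrete, one continuous
f <- ggplot(mpg, aes(class, hwy))
f + geom_col() - x, y, alpha, color, fill, group, linetype, size
f + geom_boxplot() - x, y, lower, middle, upper, ymax, ymin, alpha, color, fill, group, linetype, shape, size, weight
f + geom_dotplot(binaxis = "y", stackdir = "center") - x, y, alpha, color, fill, group
f + geom_violin(scale = "area") - x, y, alpha, color, fill, group, linetype, size, weight

both discrete
g <- ggplot(diamonds, aes(cut, color))
g + geom_count() - x, y, alpha, color, fill, shape, size, stroke
e + geom_jitter(height = 2, width = 2) - x, y, alpha, color, fill, shape, size

continuous bivariate distribution
h <- ggplot(diamonds, aes(carat, price))
h + geom_bin2d(binwidth = c(0.25, 500)) - x, y, alpha, color, fill, linetype, size, weight
h + geom_density_2d() - x, y, alpha, color, group, linetype, size
h + geom_hex() - x, y, alpha, color, fill, size

continuous function
i <- ggplot(economics, aes(date, unemploy))
i + geom_area() - x, y, alpha, color, fill, linetype, size
i + geom_line() - x, y, alpha, color, group, linetype, size
i + geom_step(direction = "hv") - x, y, alpha, color, group, linetype, size

visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
j <- ggplot(df, aes(grp, fit, ymin = fit - se, ymax = fit + se))
j + geom_crossbar(fatten = 2) - x, y, ymax, ymin, alpha, color, fill, group, linetype, size
j + geom_errorbar() - x, ymax, ymin, alpha, color, group, linetype, size, width. Also geom_errorbarh().
j + geom_linerange() - x, ymin, ymax, alpha, color, group, linetype, size
j + geom_pointrange() - x, y, ymin, ymax, alpha, color, fill, group, linetype, shape, size

maps
data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests)))
map <- map_data("state")
k <- ggplot(data, aes(fill = murder))
k + geom_map(aes(map_id = state), map = map) + expand_limits(x = map$long, y = map$lat) - map_id, alpha, color, fill, linetype, size

THREE VARIABLES
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2)); l <- ggplot(seals, aes(long, lat))
l + geom_contour(aes(z = z)) - x, y, z, alpha, color, group, linetype, size, weight
l + geom_contour_filled(aes(z = z)) - x, y, alpha, color, fill, group, linetype, size, subgroup
l + geom_raster(aes(fill = z), hjust = 0.5, vjust = 0.5, interpolate = FALSE) - x, y, alpha, fill
l + geom_tile(aes(fill = z)) - x, y, alpha, color, fill, linetype, size, width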
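One way to see what a geom layer actually draws is ggplot2's layer_data(). It returns the computed data for a layer: bins, counts, and error limits. This is a minimal sketch that assumes ggplot2 is installed. It renames the cheat sheet's c to c_plot only to avoid confusion with base R's c(); the sheet's version also works.

```r
library(ggplot2)

# One variable: a histogram bins hwy and counts the cars in each bin
c_plot <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 5)
bins <- layer_data(c_plot)        # the data the layer will actually draw

# Visualizing error: one row per group with a fit and a standard error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
j <- ggplot(df, aes(grp, fit, ymin = fit - se, ymax = fit + se))
err <- layer_data(j + geom_pointrange())

# Both continuous: geom_point() draws one mark per row of mpg
pts <- layer_data(ggplot(mpg, aes(cty, hwy)) + geom_point())
```

The bin counts sum to the number of rows in mpg, and the error limits are fit - se (3 and 3) and fit + se (5 and 7).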

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at ggplot2.tidyverse.org • ggplot2 3.3.5 • Updated: 2021-08
Stats An alternative way to build a layer.
A stat builds new variables to plot (e.g., count, prop).
(Figure: data + stat + geom + coordinate system = plot, with x = x and y = ..count..)
Visualize a stat by changing the default stat of a geom function, geom_bar(stat="count"), or by using a stat function, stat_count(geom="bar"), which calls a default geom to make a layer (equivalent to a geom function). Use ..name.. syntax to map stat variables to aesthetics.

i + stat_density_2d(aes(fill = ..level..), geom = "polygon")
(stat_density_2d is the stat function, geom = "polygon" is the geom to use, and fill = ..level.. maps a variable created by the stat to a geom aesthetic.)

c + stat_bin(binwidth = 1, boundary = 10) x, y | ..count.., ..ncount.., ..density.., ..ndensity..
c + stat_count(width = 1) x, y | ..count.., ..prop..
c + stat_density(adjust = 1, kernel = "gaussian") x, y | ..count.., ..density.., ..scaled..
e + stat_bin_2d(bins = 30, drop = T) x, y, fill | ..count.., ..density..
e + stat_bin_hex(bins = 30) x, y, fill | ..count.., ..density..
e + stat_density_2d(contour = TRUE, n = 100) x, y, color, size | ..level..
e + stat_ellipse(level = 0.95, segments = 51, type = "t")
l + stat_contour(aes(z = z)) x, y, z, order | ..level..
l + stat_summary_hex(aes(z = z), bins = 30, fun = max) x, y, z, fill | ..value..
l + stat_summary_2d(aes(z = z), bins = 30, fun = mean) x, y, z, fill | ..value..
f + stat_boxplot(coef = 1.5) x, y | ..lower.., ..middle.., ..upper.., ..width.., ..ymin.., ..ymax..
f + stat_ydensity(kernel = "gaussian", scale = "area") x, y | ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width..
e + stat_ecdf(n = 40) x, y | ..x.., ..y..
e + stat_quantile(quantiles = c(0.1, 0.9), formula = y ~ log(x), method = "rq") x, y | ..quantile..
e + stat_smooth(method = "lm", formula = y ~ x, se = T, level = 0.95) x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..
ggplot() + xlim(-5, 5) + stat_function(fun = dnorm, n = 20, geom = "point") x | ..x.., ..y..
ggplot() + stat_qq(aes(sample = 1:100)) x, y, sample | ..sample.., ..theoretical..
e + stat_sum() x, y, size | ..n.., ..prop..
e + stat_summary(fun.data = "mean_cl_boot")
h + stat_summary_bin(fun = "mean", geom = "bar")
e + stat_identity()
e + stat_unique()

Scales Override defaults with scales package.
Scales map data values to the visual values of an aesthetic. To change a mapping, add a new scale.
n <- d + geom_bar(aes(fill = fl))
n + scale_fill_manual(
  values = c("skyblue", "royalblue", "blue", "navy"),
  limits = c("d", "e", "p", "r"), breaks = c("d", "e", "p", "r"),
  name = "fuel", labels = c("D", "E", "P", "R"))
(In scale_fill_manual, scale_ is the prefix, fill is the aesthetic to adjust, and manual is the prepackaged scale to use. The arguments are scale-specific: values gives the range of values to include in the mapping, name the title to use in the legend/axis, labels the labels to use in the legend/axis, and breaks the breaks to use in the legend/axis.)

GENERAL PURPOSE SCALES
Use with most aesthetics
scale_*_continuous() - Map continuous values to visual ones.
scale_*_discrete() - Map discrete values to visual ones.
scale_*_binned() - Map continuous values to discrete bins.
scale_*_identity() - Use data values as visual ones.
scale_*_manual(values = c()) - Map discrete values to manually chosen visual ones.
scale_*_date(date_labels = "%m/%d", date_breaks = "2 weeks") - Treat data values as dates.
scale_*_datetime() - Treat data values as date times. Same as scale_*_date(). See ?strptime for label formats.

X & Y LOCATION SCALES
Use with x or y aesthetics (x shown here)
scale_x_log10() - Plot x on log10 scale.
scale_x_reverse() - Reverse the direction of the x axis.
scale_x_sqrt() - Plot x on square root scale.

COLOR AND FILL SCALES (DISCRETE)
n + scale_fill_brewer(palette = "Blues") For palette choices: RColorBrewer::display.brewer.all()
n + scale_fill_grey(start = 0.2, end = 0.8, na.value = "red")

COLOR AND FILL SCALES (CONTINUOUS)
o <- c + geom_dotplot(aes(fill = ..x..))
o + scale_fill_distiller(palette = "Blues")
o + scale_fill_gradient(low = "red", high = "yellow")
o + scale_fill_gradient2(low = "red", high = "blue", mid = "white", midpoint = 25)
o + scale_fill_gradientn(colors = topo.colors(6)) Also: rainbow(), heat.colors(), terrain.colors(), cm.colors(), RColorBrewer::brewer.pal()

SHAPE AND SIZE SCALES
p <- e + geom_point(aes(shape = fl, size = cyl))
p + scale_shape() + scale_size()
p + scale_shape_manual(values = c(3:7))
p + scale_radius(range = c(1,6))
p + scale_size_area(max_size = 6)
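The Stats and Scales entries above can be checked with layer_data(). This is a minimal sketch that assumes ggplot2 3.3.0 or later. In that version, after_stat(count) was introduced as the preferred spelling of ..count... It shows that geom_bar(stat = "count") and stat_count(geom = "bar") compute the same layer. The fuel_cols palette is a hypothetical choice covering all five fuel types in mpg, including "c".

```r
library(ggplot2)

# geom_bar() uses stat_count(); stat_count(geom = "bar") builds the same layer
d <- ggplot(mpg, aes(fl))
via_geom <- layer_data(d + geom_bar(stat = "count"))
via_stat <- layer_data(d + stat_count(geom = "bar"))

# Map a computed variable to an aesthetic (after_stat() replaces ..count..)
shaded <- d + geom_bar(aes(fill = after_stat(count)))

# A manual discrete scale: one colour per fuel type, with new legend labels
fuel_cols <- c(c = "grey50", d = "skyblue", e = "royalblue",
               p = "blue", r = "navy")
n <- d + geom_bar(aes(fill = fl)) +
  scale_fill_manual(values = fuel_cols, name = "fuel",
                    labels = c("C", "D", "E", "P", "R"))
fills <- layer_data(n)
```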
Coordinate Systems
r <- d + geom_bar()
r + coord_cartesian(xlim = c(0, 5)) - xlim, ylim The default cartesian coordinate system.
r + coord_fixed(ratio = 1/2) ratio, xlim, ylim - Cartesian coordinates with fixed aspect ratio between x and y units.
ggplot(mpg, aes(y = fl)) + geom_bar() Flip cartesian coordinates by switching x and y aesthetic mappings.
r + coord_polar(theta = "x", direction = 1) theta, start, direction - Polar coordinates.
r + coord_trans(y = "sqrt") - x, y, xlim, ylim Transformed cartesian coordinates. Set x and y to the name of a transformation, such as "sqrt" or "log10".
π + coord_quickmap()
π + coord_map(projection = "ortho", orientation = c(41, -74, 0)) - projection, xlim, ylim Map projections from the mapproj package (mercator (default), azequalarea, lagrange, etc.).

Position Adjustments
Position adjustments determine how to arrange geoms that would otherwise occupy the same space.
s <- ggplot(mpg, aes(fl, fill = drv))
s + geom_bar(position = "dodge") Arrange elements side by side.
s + geom_bar(position = "fill") Stack elements on top of one another, normalize height.
e + geom_point(position = "jitter") Add random noise to X and Y position of each element to avoid overplotting.
e + geom_label(position = "nudge") Nudge labels away from points.
s + geom_bar(position = "stack") Stack elements on top of one another.
Each position adjustment can be recast as a function with manual width and height arguments:
s + geom_bar(position = position_dodge(width = 1))

Faceting
Facets divide a plot into subplots based on the values of one or more discrete variables.
t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
t + facet_grid(cols = vars(fl)) Facet into columns based on fl.
t + facet_grid(rows = vars(year)) Facet into rows based on year.
t + facet_grid(rows = vars(year), cols = vars(fl)) Facet into both rows and columns.
t + facet_wrap(vars(fl)) Wrap facets into a rectangular layout.
Set scales to let axis limits vary across facets.
t + facet_grid(rows = vars(drv), cols = vars(fl), scales = "free")
x and y axis limits adjust to individual facets:
"free_x" - x axis limits adjust
"free_y" - y axis limits adjust
Set labeller to adjust facet label:
t + facet_grid(cols = vars(fl), labeller = label_both)
(Facet labels read fl: c, fl: d, fl: e, fl: p, fl: r.)
t + facet_grid(rows = vars(fl), labeller = label_bquote(alpha ^ .(fl)))
(Facet labels read α^c, α^d, α^e, α^p, α^r.)
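The position, faceting, and flipping entries above can be checked on the built plot data. This is a minimal sketch that assumes ggplot2 3.3.0 or later, the release in which geom_bar() detects a y-only mapping and flips itself. Reading the panel table from ggplot_build(...)$layout$layout relies on an internal structure, so treat that line as version-dependent.

```r
library(ggplot2)

s <- ggplot(mpg, aes(fl, fill = drv))

# "fill" stacks the bars and rescales each stack to a height of 1
fill_bars <- layer_data(s + geom_bar(position = "fill"))

# "dodge" places bars side by side; position_dodge() sets the width directly
dodge_bars <- layer_data(s + geom_bar(position = position_dodge(width = 1)))

# Facets: one panel per fuel type, each with its own axis ranges
t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
wrapped <- t + facet_wrap(vars(fl), scales = "free")
panels <- ggplot_build(wrapped)$layout$layout   # one row per panel

# Flip a bar chart by mapping the variable to y instead of x
flipped <- layer_data(ggplot(mpg, aes(y = fl)) + geom_bar())
```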
Themes
r + theme_bw() White background with grid lines.
r + theme_gray() Grey background (default theme).
r + theme_dark() Dark for contrast.
r + theme_classic()
r + theme_light()
r + theme_linedraw()
r + theme_minimal() Minimal theme.
r + theme_void() Empty theme.
r + theme() Customize aspects of the theme such as axis, legend, panel, and facet properties.
r + ggtitle("Title") + theme(plot.title.position = "plot")
r + theme(panel.background = element_rect(fill = "blue"))

Labels and Legends
Use labs() to label the elements of your plot.
t + labs(x = "New x axis label", y = "New y axis label",
  title = "Add a title above the plot",
  subtitle = "Add a subtitle below title",
  caption = "Add a caption below plot",
  alt = "Add alt text to the plot",
  <aes> = "New <aes> legend title")
t + annotate(geom = "text", x = 8, y = 9, label = "A") Places a geom with manually selected aesthetics.
p + guides(x = guide_axis(n.dodge = 2)) Avoid crowded or overlapping labels with guide_axis(n.dodge or angle).
n + guides(fill = "none") Set legend type for each aesthetic: colorbar, legend, or none (no legend).
n + theme(legend.position = "bottom") Place legend at "bottom", "top", "left", or "right".
n + scale_fill_discrete(name = "Title", labels = c("A", "B", "C", "D", "E")) Set legend title and labels with a scale function.

Zooming
Without clipping (preferred):
t + coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))
With clipping (removes unseen data points):
t + xlim(0, 100) + ylim(10, 20)
t + scale_x_continuous(limits = c(0, 100)) + scale_y_continuous(limits = c(0, 100))
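The Labels and Zooming entries above differ in a way that can be measured. This is a minimal sketch that assumes ggplot2 is installed. It uses a narrower window (cty from 0 to 20) than the sheet so that some points actually fall outside it. A coord_cartesian() zoom leaves every data row in the layer. Scale limits turn out-of-range x values into NA, and those points are dropped with a warning when the plot is drawn.

```r
library(ggplot2)

t <- ggplot(mpg, aes(cty, hwy)) + geom_point()

# Labels and theme settings are stored in the plot object
labelled <- t + labs(x = "City mpg", y = "Highway mpg",
                     title = "Add a title above the plot") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Zoom without clipping: all rows reach the layer, only the view is cropped
zoom_coord <- layer_data(t + coord_cartesian(xlim = c(0, 20)))

# Zoom with clipping: out-of-range x values become NA before drawing
zoom_scale <- layer_data(t + scale_x_continuous(limits = c(0, 20)))

sum(is.na(zoom_coord$x))   # 0
sum(is.na(zoom_scale$x))   # the number of cars with cty above 20
```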

How Big is Your Graph?
An R Cheat Sheet

Introduction
All functions that open a device for graphics will have height and width arguments to control the size of the graph and a pointsize argument to control the relative font size. In knitr, you control the size of the graph with the chunk options fig.width and fig.height. This sheet will help you with calculating the size of the graph and various parts of the graph within R.

Your graphics device
dev.size() (width, height)
par("din") (r.o.) (width, height) in inches
Both the dev.size function and the din argument of par will tell you the size of the graphics device. The dev.size function will report the size in
1. inches (units="in"), the default
2. centimeters (units="cm")
3. pixels (units="px")
Like several other par arguments, din is read only (r.o.), meaning that you can ask for its current value (par("din")) but you cannot change it (par(din=c(5,7)) will fail).

Your plot margins
par("mai") (bottom, left, top, right) in inches
par("mar") (bottom, left, top, right) in lines
Margins provide you space for your axes, axis labels, and titles. A "line" is the amount of vertical space needed for a line of text. If your graph has no axes or titles, you can remove the margins (and maximize the plotting region) with par(mar=rep(0,4)).

Getting a square graph
par("pty")
You can produce a square graph manually by setting the width and height to the same value and setting the margins so that the sum of the top and bottom margins equals the sum of the left and right margins. But a much easier way is to specify pty="s", which adjusts the margins so that the size of the plotting region is always square, even if you resize the graphics window.
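The device, margin, and square-graph parameters above can be queried on an off-screen device. This is a minimal sketch that uses only base R. It opens pdf(NULL), which creates no file but can still be queried. The 5 x 7 inch size is an arbitrary choice.

```r
# Open an off-screen PDF device, 5 inches wide and 7 inches tall
pdf(NULL, width = 5, height = 7)
dev_in <- dev.size()          # device width and height in inches
din    <- par("din")          # the same values, read-only

# pty = "s" asks for a square plotting region
op <- par(pty = "s")
plot(1:10)
square <- par("pin")          # plotting-region width and height, equal
par(op)

# With zero margins the plotting region fills the whole device
op <- par(mar = rep(0, 4))
plot.new()
no_margin <- par("pin")
par(op)

# Default margins on a tall device give a tall, non-square region
plot(1:10)
default <- par("pin")
dev.off()
```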
Your plotting region
par("pin") (width, height) in inches
par("plt") (left, right, bottom, top) in pct
The pin argument of par gives you the size of the plotting region (the size of the device minus the size of the margins) in inches.
The plt argument gives you the percentage of the device from the left/bottom edge up to the left edge of the plotting region, the right edge, the bottom edge, and the top edge. The first and third values are equivalent to the percentage of space devoted to the left and bottom margins. Subtract the second and fourth values from 1 to get the percentage of space devoted to the right and top margins.

Your x-y coordinates
par("usr") (xmin, xmax, ymin, ymax)
Your x-y coordinates are the values you use when plotting your data. This normally is not the same as the values you specified with the xlim and ylim arguments in plot. By default, R adds an extra 4% to the plotting range (see the dark green region on the figure) so that points right up on the edges of your plot do not get partially clipped. You can override this by setting xaxs="i" and/or yaxs="i" in par.
Run par("usr") to find the minimum X value, the maximum X value, the minimum Y value, and the maximum Y value. If you assign new values to usr, you will update the x-y coordinates to the new values.

Converting units
For many applications, you need to be able to translate user coordinates to pixels or inches. There are some cryptic shortcuts, but the simplest way is to get the range in user coordinates and measure the proportion of the graphics device devoted to the plotting region.
user.range <- par("usr")[c(2,4)] - par("usr")[c(1,3)]
region.pct <- par("plt")[c(2,4)] - par("plt")[c(1,3)]
region.px <- dev.size(units="px") * region.pct
px.per.xy <- region.px / user.range
To convert a horizontal distance from the x-coordinate value to pixels, multiply by px.per.xy[1]. To convert a vertical distance, multiply by px.per.xy[2]. To convert a diagonal distance, you need to invoke Pythagoras.
a.px <- x.dist*px.per.xy[1]
b.px <- y.dist*px.per.xy[2]
c.px <- sqrt(a.px^2+b.px^2)
To rotate a string to match the slope of a line segment, you need to convert the distances to pixels, calculate the arctangent, and convert from radians to degrees.
segments(x0, y0, x1, y1)
delta.x <- (x1 - x0) * px.per.xy[1]
delta.y <- (y1 - y0) * px.per.xy[2]
angle.radians <- atan2(delta.y, delta.x)
angle.degrees <- angle.radians * 180 / pi
text(x1, y1, "TEXT", srt=angle.degrees)
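The unit-conversion recipe above can be run end to end. This is a minimal sketch that uses only base R on an off-screen pdf(NULL) device. The segment endpoints (2, 3) and (8, 6) are arbitrary. Strictly, plt is measured relative to the figure region; with a single plot and no outer margins, that region is the whole device, so the recipe applies as written.

```r
pdf(NULL, width = 6, height = 4)
plot(1:10, 1:10)

# User-coordinate range, plotting-region fraction, and pixels per unit
user.range <- par("usr")[c(2, 4)] - par("usr")[c(1, 3)]
region.pct <- par("plt")[c(2, 4)] - par("plt")[c(1, 3)]
region.px  <- dev.size(units = "px") * region.pct
px.per.xy  <- region.px / user.range

# Rotate a label to follow the segment from (2, 3) to (8, 6)
x0 <- 2; y0 <- 3; x1 <- 8; y1 <- 6
segments(x0, y0, x1, y1)
delta.x <- (x1 - x0) * px.per.xy[1]
delta.y <- (y1 - y0) * px.per.xy[2]
angle.degrees <- atan2(delta.y, delta.x) * 180 / pi
text(x1, y1, "TEXT", srt = angle.degrees)

# Length of the same segment in pixels (Pythagoras)
length.px <- sqrt(delta.x^2 + delta.y^2)

# Default axis style "r" extends the 1..10 data range by 4% at each end
usr <- par("usr")
dev.off()
```

The data run from 1 to 10, so 4% of the range is 0.36 and par("usr") comes back as 0.64, 10.36, 0.64, 10.36.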

CC BY Steve Simon, P.Mean Consulting • [email protected] • https://www.rstudio.com/resources/cheatsheets/ • Learn more at blog.pmean.com/cheatsheets • Updated: August 2017

Panels
par("fig") (left, right, bottom, top) in pct
par("fin") (width, height) in inches
If you display multiple plots within a single graphics window (e.g., with the mfrow or mfcol arguments of par or with the layout function), then the fig and fin arguments will tell you the size of the current subplot window in percent or inches, respectively.
par("oma") (bottom, left, top, right) in lines
par("omd") (left, right, bottom, top) in pct
par("omi") (bottom, left, top, right) in inches
Each subplot will have margins specified by mai or mar, but no outer margin around the entire set of plots, unless you specify them using oma, omd, or omi. You can place text in the outer margins using the mtext function with the argument outer=TRUE.
par("mfg") (r, c) or (r, c, maxr, maxc)
The mfg argument of par will allow you to jump to a subplot in a particular row and column. If you query with par("mfg"), you will get the current row and column followed by the maximum row and column.

Character and string sizes
strheight()
The strheight function will tell you the height of a specified string in inches (units="inches"), x-y user coordinates (units="user") or as a percentage of the graphics device (units="figure").
For a single line of text, strheight will give you the height of the letter "M". If you have a string with one or more linebreaks ("\n"), the strheight function will measure the height of the letter "M" plus the height of one or more additional lines. The height of a line is dependent on the line spacing, set by the lheight argument of par. The default line height (lheight=1), corresponding to single spaced lines, produces a line height roughly 1.5 times the height of "M".
strwidth()
The strwidth function will produce different widths for individual characters, representing the proportional spacing used by most fonts (a "W" using much more space than an "i"). For the width of a string, the strwidth function will sum up the widths of the individual characters in the string.
par("cin") (r.o.) (width, height) in inches
par("csi") (r.o.) height in inches
par("cra") (r.o.) (width, height) in pixels
par("cxy") (r.o.) (width, height) in xy coordinates
The single value returned by the csi argument of par gives you the height of a line of text in inches. The second of the two values returned by cin, cra, and cxy gives you the height of a line, in inches, pixels, or xy (user) coordinates.
The first of the two values returned by the cin, cra, and cxy arguments to par gives you the approximate width of a single character, in inches, pixels, or xy (user) coordinates. The width, very slightly smaller than the actual width of the letter "W", is a rough estimate at best and ignores the variable width of individual letters.
These values are useful, however, in providing fast ratios of the relative sizes of the differing units of measure.
px.per.in <- par("cra") / par("cin")
px.per.xy <- par("cra") / par("cxy")
xy.per.in <- par("cxy") / par("cin")
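The panel and string-size functions above can be queried on an off-screen device. This is a minimal sketch that uses only base R and pdf(NULL). The 2 x 2 layout, the outer-margin title, and the sample strings are arbitrary choices.

```r
pdf(NULL, width = 7, height = 5)
par(mfrow = c(2, 2), oma = c(0, 0, 2, 0))   # 2 x 2 panels, outer top margin
plot(1:10)
plot(10:1)
where <- par("mfg")                          # row, column, max rows, max cols
mtext("Four panels", outer = TRUE, line = 0.5)

# String sizes in inches on the current device
w.W   <- strwidth("W", units = "inches")
w.i   <- strwidth("i", units = "inches")
h.one <- strheight("M", units = "inches")
h.two <- strheight("M\nM", units = "inches")

# Conversion ratio from the read-only character-size parameters
px.per.in <- par("cra") / par("cin")
dev.off()
```

After two plots in a 2 x 2 layout filled by rows, par("mfg") reports row 1, column 2, out of 2 rows and 2 columns.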
If your fonts are too big or too small
Fixing this takes a bit of trial and error.
1. Specify a larger/smaller value for the pointsize argument when you open your graphics device.
2. Try opening your graphics device with different values for height and width. Fonts that look too big might be better proportioned in a larger graphics window.
3. Use the cex argument to increase or decrease the relative size of your fonts.

If your axes don’t fit
There are several possible solutions.
1. You can assign wider margins using the mar or mai argument in par.
2. You can change the orientation of the axis labels with las. Choose among
a. las=0 both axis labels parallel to the axis
b. las=1 both axis labels horizontal
c. las=2 both axis labels perpendicular to the axis
d. las=3 both axis labels vertical.
3. Change the relative size of the font
a. cex.axis for the tick mark labels.
b. cex.lab for xlab and ylab.
c. cex.main for the main title
d. cex.sub for the subtitle.
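The font and axis remedies above can be combined in one call to par(). This is a minimal sketch that uses only base R on an off-screen pdf(NULL) device. The margin widths, scaling factors, and bar labels are illustrative values, not recommendations.

```r
pdf(NULL, pointsize = 10)          # smaller base font size for this device
op <- par(mar = c(5, 8, 2, 1),     # wider left margin for long labels
          las = 1,                 # horizontal axis labels
          cex.axis = 0.8,          # smaller tick-mark labels
          cex.lab = 1.2)           # larger xlab and ylab
barplot(c(alpha = 3, beta = 5, gamma = 2), horiz = TRUE)
settings <- par(c("mar", "las", "cex.axis", "cex.lab"))
par(op)                            # restore the previous settings
dev.off()
```

Saving the value returned by par() and passing it back afterwards restores the old settings, so later plots are not affected.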

R Syntax Comparison : : CHEAT SHEET
Dollar sign syntax: goal(data$x, data$y)
Formula syntax: goal(y~x|z, data=data, group=w) (the ~ is the tilde)
Tidyverse syntax: data %>% goal(x) (the %>% is the pipe)

SUMMARY STATISTICS:
one continuous variable:
Dollar sign syntax: mean(mtcars$mpg)
Formula syntax: mosaic::mean(~mpg, data=mtcars)
Tidyverse syntax: mtcars %>% dplyr::summarize(mean(mpg))

one categorical variable:
Dollar sign syntax: table(mtcars$cyl)
Formula syntax: mosaic::tally(~cyl, data=mtcars)
Tidyverse syntax: mtcars %>% dplyr::group_by(cyl) %>% dplyr::summarize(n())

two categorical variables:
Dollar sign syntax: table(mtcars$cyl, mtcars$am)
Formula syntax: mosaic::tally(cyl~am, data=mtcars)
Tidyverse syntax: mtcars %>% dplyr::group_by(cyl, am) %>% dplyr::summarize(n())

one continuous, one categorical:
Dollar sign syntax:
mean(mtcars$mpg[mtcars$cyl==4])
mean(mtcars$mpg[mtcars$cyl==6])
mean(mtcars$mpg[mtcars$cyl==8])
Formula syntax: mosaic::mean(mpg~cyl, data=mtcars)
Tidyverse syntax: mtcars %>% dplyr::group_by(cyl) %>% dplyr::summarize(mean(mpg))
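The claim that the three syntaxes compute the same summary can be checked directly. This is a minimal sketch that assumes dplyr is installed. It substitutes base R's formula-based aggregate() for mosaic::mean(), so the mosaic package is not needed. That substitution is an assumption of this sketch, not the sheet's own example.

```r
library(dplyr)

# Dollar-sign syntax: index the vector by hand for each group
dollar <- c(mean(mtcars$mpg[mtcars$cyl == 4]),
            mean(mtcars$mpg[mtcars$cyl == 6]),
            mean(mtcars$mpg[mtcars$cyl == 8]))

# Formula syntax: base R's aggregate() (a stand-in for mosaic::mean)
formula_res <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Tidyverse syntax: group, then summarise
tidy_res <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Counting one categorical variable two ways
counts_dollar <- table(mtcars$cyl)
counts_tidy   <- mtcars %>% count(cyl)
```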

PLOTTING:
one continuous variable:
  Dollar sign: hist(mtcars$disp)
               boxplot(mtcars$disp)
  Formula: lattice::histogram(~disp, data=mtcars)
           lattice::bwplot(~disp, data=mtcars)
  Tidyverse: ggplot2::qplot(x=mpg, data=mtcars, geom = "histogram")
             ggplot2::qplot(y=disp, x=1, data=mtcars, geom="boxplot")
one categorical variable:
  Dollar sign: barplot(table(mtcars$cyl))
  Formula: mosaic::bargraph(~cyl, data=mtcars)
  Tidyverse: ggplot2::qplot(x=cyl, data=mtcars, geom="bar")
two continuous variables:
  Dollar sign: plot(mtcars$disp, mtcars$mpg)
  Formula: lattice::xyplot(mpg~disp, data=mtcars)
  Tidyverse: ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")
two categorical variables:
  Dollar sign: mosaicplot(table(mtcars$am, mtcars$cyl))
  Formula: mosaic::bargraph(~am, data=mtcars, group=cyl)
  Tidyverse: ggplot2::qplot(x=factor(cyl), data=mtcars, geom="bar") + facet_grid(.~am)
one continuous, one categorical:
  Dollar sign: hist(mtcars$disp[mtcars$cyl==4])
               hist(mtcars$disp[mtcars$cyl==6])
               hist(mtcars$disp[mtcars$cyl==8])
               boxplot(mtcars$disp[mtcars$cyl==4])
               boxplot(mtcars$disp[mtcars$cyl==6])
               boxplot(mtcars$disp[mtcars$cyl==8])
  Formula: lattice::histogram(~disp|cyl, data=mtcars)
           lattice::bwplot(cyl~disp, data=mtcars)
  Tidyverse: ggplot2::qplot(x=disp, data=mtcars, geom = "histogram") + facet_grid(.~cyl)
             ggplot2::qplot(y=disp, x=factor(cyl), data=mtcars, geom="boxplot")

The variety of R syntaxes gives you many ways to “say” the same thing.

WRANGLING:
subsetting:
  Dollar sign: mtcars[mtcars$mpg>30, ]
  Tidyverse: mtcars %>% dplyr::filter(mpg>30)
making a new variable:
  Dollar sign: mtcars$efficient[mtcars$mpg>30] <- TRUE
               mtcars$efficient[mtcars$mpg<=30] <- FALSE
  Tidyverse: mtcars <- mtcars %>% dplyr::mutate(efficient = dplyr::if_else(mpg>30, TRUE, FALSE))

Read across the cheatsheet to see how different syntaxes approach the same problem.
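A sketch of the wrangling rows, assuming the built-in mtcars data and, for the tidyverse half, an installed dplyr. It creates the efficient column as NA first (a choice made here so the column has one value per row before the two partial assignments) and uses <= in the second assignment so that no row is left out.

```r
data(mtcars)

# Subsetting: keep cars with mpg above 30
fast_dollar <- mtcars[mtcars$mpg > 30, ]

# Making a new variable with dollar sign syntax.
# Create the column first, then fill it in two complementary steps.
mtcars$efficient <- NA
mtcars$efficient[mtcars$mpg > 30] <- TRUE
mtcars$efficient[mtcars$mpg <= 30] <- FALSE

# The same two tasks with tidyverse syntax, if dplyr is installed
fast_tidy <- NULL
tidy_version <- NULL
if (requireNamespace("dplyr", quietly = TRUE)) {
  `%>%` <- dplyr::`%>%`
  fast_tidy <- mtcars %>% dplyr::filter(mpg > 30)
  tidy_version <- mtcars %>%
    dplyr::mutate(efficient = dplyr::if_else(mpg > 30, TRUE, FALSE))
}

print(rownames(fast_dollar))
```

With < instead of <=, a car with mpg of exactly 30 would keep an NA, whereas if_else(mpg > 30, TRUE, FALSE) would give it FALSE. No car in mtcars has exactly 30 mpg, so the difference only appears on other data.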
RStudio® is a trademark of RStudio, Inc. • CC BY Amelia McNamara • @AmeliaMN • science.smith.edu/~amcnamara/ • Updated: 2018-01
23
Syntax is the set of rules that govern what code works and doesn’t work in a programming language. Most programming languages offer one standardized syntax, but R allows package developers to specify their own syntax. As a result, there is a large variety of (equally valid) R syntaxes.

The three most prevalent R syntaxes are:
1. The dollar sign syntax, sometimes called base R syntax, expected by most base R functions. It is characterized by the use of dataset$variablename, and is also associated with square bracket subsetting, as in dataset[1,2]. Almost all R functions will accept things passed to them in dollar sign syntax.
2. The formula syntax, used by modeling functions like lm(), lattice graphics, and mosaic summary statistics. It uses the tilde (~) to connect a response variable and one (or many) predictors. Many base R functions will accept formula syntax.
3. The tidyverse syntax used by dplyr, tidyr, and more. These functions expect data to be the first argument, which allows them to work with the “pipe” (%>%) from the magrittr package. Typically, ggplot2 is thought of as part of the tidyverse, although it has its own flavor of the syntax using plus signs (+) to string pieces together. ggplot2 author Hadley Wickham has said the package would have had different syntax if he had written it after learning about the pipe.

Educators often try to teach within one unified syntax, but most R programmers use some combination of all the syntaxes.

Internet research tip:
If you are searching on Google, StackOverflow, or another favorite online source and see code in a syntax you don’t recognize:
• Check to see if the code is using one of the three common syntaxes listed on this cheatsheet
• Try your search again, using a keyword from the syntax name (“tidyverse”) or a relevant package (“mosaic”)

! Sometimes particular syntaxes work, but are considered dangerous to use, because they are so easy to get wrong. For example, passing variable names without assigning them to a named argument.

Even more ways to say the same thing
Even within one syntax, there are often variations that are equally valid. As a case study, let’s look at the ggplot2 package syntax. ggplot2 is the plotting package that lives within the tidyverse. If you read down this column, you will find many pieces of code in one syntax that look different but all produce the same graphic.

quickplot
qplot() stands for quickplot, and allows you to make quick plots. It doesn’t have the full power of ggplot2, and it uses a slightly different syntax than the rest of the package.
ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")
ggplot2::qplot(x=disp, y=mpg, data=mtcars) !
ggplot2::qplot(disp, mpg, data=mtcars) ! !

ggplot
To unlock the power of ggplot2, you need to use the ggplot() function (which sets up a plotting region) and add geoms to the plot. The plus sign (+) adds layers.
ggplot2::ggplot(mtcars) + geom_point(aes(x=disp, y=mpg))
ggplot2::ggplot(data=mtcars) + geom_point(mapping=aes(x=disp, y=mpg))
ggplot2::ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()
ggplot2::ggplot(mtcars, aes(x=disp)) + geom_point(aes(y=mpg))

ggformula
The “third and a half way”: use the formula syntax, but get ggplot2-style graphics.
ggformula::gf_point(mpg~disp, data=mtcars)

formulas in base plots
Base R plots will also take the formula syntax, although it's not as commonly used.
plot(mpg~disp, data=mtcars)
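To see the formula syntax outside lattice and mosaic, here is a minimal sketch with base R's lm() and plot(), assuming the built-in mtcars data. Both versions of the model give the same coefficients; pdf(NULL) is only there so the plotting lines run without a screen.

```r
data(mtcars)

# Formula syntax: response ~ predictor, with the data frame passed by name
fit_formula <- lm(mpg ~ disp, data = mtcars)

# Dollar sign syntax: the same model built from extracted columns
fit_dollar <- lm(mtcars$mpg ~ mtcars$disp)

print(coef(fit_formula))

# Base R plots accept both styles and draw the same scatterplot.
# pdf(NULL) discards the output so the sketch runs without a screen.
pdf(NULL)
plot(mpg ~ disp, data = mtcars)  # formula syntax
plot(mtcars$disp, mtcars$mpg)    # dollar sign syntax
invisible(dev.off())
```

The formula version is usually the tidier choice: its slope coefficient is named disp, while the dollar sign version's is named mtcars$disp.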

24
Tabular reporting with flextable : : CHEAT SHEET
Basics
The flextable package provides a framework for easily creating tables for reporting and publications. Functions are provided to let users create tables, modify and format them, and define their content.
flextable() : create a flextable from a data.frame

GENERAL FUNCTION’S STRUCTURE
function(x, i, j, part, args)
x : flextable object
i, j : row & column selectors
part : flextable part
args : specific arguments

TABLE PARTS AND THEIR DEFAULT VALUES
header : colnames
body : data
footer : empty
all : header, body and footer

Selectors
i : row selector
j : column selector
FORMULA
i = ~ col %in% "xxx" (col : column name, xxx : value)
j = ~ col1 + col2 (col* : column name)
CHARACTER VECTOR COLNAME
j = c("col1", "col2") (col* : column name)
INTEGER VECTOR
i = 1:3, j = 1:3
LOGICAL VECTOR
i = c(TRUE, FALSE), j = c(TRUE, FALSE)

Format
GENERAL
get_flextable_defaults() : get flextable defaults formatting properties
set_flextable_defaults() : modify flextable defaults formatting properties
init_flextable_defaults() : re-init all values with the package defaults
style(pr_t, pr_p, pr_c) : modify flextable text, paragraphs and cells formatting properties (needs officer package)
pr_t : object of class fp_text
pr_p : object of class fp_par
pr_c : object of class fp_cell
TEXT
font(ft, fontname = "Brush Script MT")
fontsize(ft, size = 7)
italic(ft, italic = TRUE)
bold(ft, bold = TRUE)
color(ft, color = "#eb5555")
highlight(ft, color = "yellow")
rotate(ft, rotation = "tbrl")
CELL
align(ft, align = "center")
valign(ft, valign = "top")
padding(ft, padding = 10)
bg(ft, bg = "#475f77")
line_spacing(ft, space = 1.6)
THEME
ft <- theme_*(ft), where * is one of alafoli, booktabs, box, tron, tron_legacy, vader, vanilla, zebra
BORDER
brdr <- fp_border(color = "#eb5555", width = 1.5)
ft <- flextable(data)
border_outer(ft, border = brdr)
border_inner(ft, border = brdr)
border_inner_v(ft, border = brdr)
border_inner_h(ft, border = brdr)
border_remove(ft)
vline_left(ft, border = brdr)
vline_right(ft, border = brdr)
hline_top(ft, border = brdr)
hline_bottom(ft, border = brdr)
vline(ft, j = 1:2, border = brdr)
hline(ft, i = 1:2, border = brdr)

Officer
fp_text() : Text formatting properties (color, font.size, bold, italic, underlined, font.family, vertical.align, shading.color)
fp_par() : Paragraph formatting properties (text.align, padding, line_spacing, border, shading.color, padding.bottom, padding.top, padding.left, padding.right, border.bottom, border.left, border.top, border.right)
fp_cell() : Cell formatting properties (border, border.bottom, border.left, border.top, border.right, vertical.align, margin, margin.bottom, margin.top, margin.left, margin.right, background.color, text.direction)
fp_border() : border properties object (color, style, width)
update(x, args) : update an object of class fp_*

Layout
HEADER AND FOOTER
COLWIDTHS
add_header_row(ft, values = c("a", "b", "c"), colwidths = c(1, 1, 1), top = FALSE)
add_footer_row(ft, values = c("", "", ""), colwidths = c(1, 1, 1))
IN LINE
add_header_lines(ft, values = "line", top = FALSE)
add_footer_lines(ft, values = "line")
add_header(ft, A = "a", B = "b", top = FALSE)
add_body(ft, A = "a", B = "b", C = "")
add_footer(ft, A = "", B = "")
GENERAL
set_header_labels(ft, A = "Aaa", B = "Bbb", C = "Ccc")
delete_part(ft, part = "body")
CELL MERGING
merge_none(ft)
merge_at(ft, i = 1:2, j = 1:2)
merge_h(ft)
merge_v(ft)
merge_h_range(ft, i = ~ C %in% "0", j1 = "A", j2 = "B")
fix_border_issues(ft) : fix border issues when cells are merged
CAPTIONS & FOOTNOTES
set_caption(ft, caption = "my caption")
footnote(ft, j = 1, value = as_paragraph(c("footnote 1")), ref_symbols = c("1"), part = "header")
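The Basics, Format and Layout calls above can be chained on one table. This is a hedged sketch, assuming flextable is installed (officer comes with it as a dependency); the three-column head(mtcars) data, header labels and caption are hypothetical stand-ins for the sheet's data. The theme is applied first on the assumption that a theme re-styles the whole table and could otherwise overwrite earlier formatting. The whole block is skipped when flextable is missing.

```r
ft <- NULL
if (requireNamespace("flextable", quietly = TRUE)) {
  library(flextable)

  # Hypothetical example data: three columns, six body rows
  data <- head(mtcars[, c("mpg", "cyl", "disp")])

  # Apply the theme first; later calls then adjust its result
  ft <- theme_vanilla(flextable(data))

  # Layout: relabel the header and add a spanning row above it
  ft <- set_header_labels(ft, mpg = "Miles/gallon", cyl = "Cylinders",
                          disp = "Displacement")
  ft <- add_header_row(ft, values = c("Performance", "Engine"),
                       colwidths = c(1, 2))

  # Format: outer border built with officer::fp_border()
  brdr <- officer::fp_border(color = "#eb5555", width = 1.5)
  ft <- border_outer(ft, border = brdr)

  # Selectors: a formula i picks rows, a character j picks columns
  ft <- color(ft, i = ~ cyl == 4, j = "mpg", color = "#eb5555")
  ft <- align(ft, align = "center", part = "all")

  ft <- set_caption(ft, caption = "First six rows of mtcars")
}
```

In an interactive RStudio session, typing ft shows the table in the Viewer pane.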
ArData. • ardata.fr • Learn more at ardata-fr.github.io/flextable-book/ • package version 0.6.4 • Updated: 2021-03
25
Table size
ft <- flextable(data)
ncol_keys(ft): 3
nrow_part(ft, part = "body"): 6
flextable_dim(ft):
$widths
[1] 2.25
$heights
[1] 1.75
$aspect_ratio
[1] 0.78
dim(ft):
$widths
   A    B    C
0.75 0.75 0.75
$heights
[1] 0.25 0.25 0.25 0.25 0.25 0.25 0.25
(each column is 0.75 wide and each row 0.25 high)
dim_pretty(ft):
$widths
[1] 0.22 0.22 0.22
$heights
[1] 0.22 0.22 0.22 0.22 0.22 0.22 0.22
autofit(ft, add_w = w, add_h = h)
w = 0, h = 0 : width 0.22, height 0.22
w = 0.2, h = 0 : width 0.42
width(ft, j = 1, width = 0.5) : first column 0.5 wide, other columns 0.75
height(ft, height = 0.40, part = "header") : header row 0.4 high, body rows 0.25
ft <- hrule(ft, rule = "exact", part = "header")
ft <- height(ft, i = 1, height = 0.40, part = "body")
ft <- height(ft, i = 4, height = 0.30, part = "body")
ft <- hrule(ft, rule = "auto", part = "header")
ft <- hrule(ft, i = 1, rule = "exact", part = "body") : size exactly 0.4
ft <- hrule(ft, i = 4, rule = "atleast", part = "body") : size at least 0.3

Cell content
MULTI CONTENT
FUNCTION COMPOSE
compose(x, i, j, value = …, part, use_dot)
value = as_paragraph(Chunk 1, Chunk 2, Image 1, ...)
Chunk functions: as_chunk(props), as_bracket(), as_b(), as_highlight(), as_i(), as_sub(), as_sup(), colorize(), hyperlink_text(), as_image()
use_dot : by default use_dot = FALSE; if use_dot = TRUE, value is evaluated within a data.frame augmented with a column named . containing the jth column
ft <- flextable(data)
ft <- compose(ft, value = as_paragraph(
  as_chunk("chunk"),
  as_bracket("bracket"),
  as_b("bold"),
  as_highlight("highlight", color = "yellow"),
  as_i("italic"),
  as_sub("sub"),
  as_sup("sup"),
  colorize("colorize", color = "#eb5555"),
  hyperlink_text("hyperlink", url = "http://link"),
  as_image(src, width = 0.2, height = 0.2)))

Rendering
flextable objects can be rendered in HTML format, Microsoft Word, Microsoft PowerPoint and PDF.
INTERACTIVE SESSION
flextable default format is HTML output printed in the RStudio viewer pane.
print(ft, preview = "docx")
print(ft, preview = "pptx")
WITH OFFICER
ph_with(ppt, value = ft) (PowerPoint), where ppt is an rpptx object
body_add_flextable(value = ft) (Word)
RMARKDOWN DOCUMENTS
```{r}
library(flextable)
ft <- flextable(data)
ft
```
LOOPING IN RMARKDOWN WITH FOR
flextable_to_rmd(ft)
SIMPLE EXPORT
save_as_html(ft, "ft.html")
save_as_docx(ft, "ft.docx")
save_as_pptx(ft, "ft.pptx")
save_as_image(ft, "ft.png")
IN SHINY
library(shiny)
library(flextable)
ft <- flextable(data)
# In UI
uiOutput("ft")
# In server
output$ft <- renderUI({
  htmltools_value(ft)
})

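A short sketch of the Table size and Cell content calls, assuming flextable is installed and using a hypothetical three-column head(mtcars) as data. Note that width() selects columns with j, not rows. The compose() call rebuilds each mpg body cell from two chunks, a bold label and the cell's own value; flextable::compose is written out because other packages, such as purrr, also export a compose().

```r
ft2 <- NULL
if (requireNamespace("flextable", quietly = TRUE)) {
  library(flextable)

  # Hypothetical example data: three columns, six body rows
  data <- head(mtcars[, c("mpg", "cyl", "disp")])
  ft2 <- flextable(data)

  # Table size: width() selects columns with j; height() works on rows
  ft2 <- width(ft2, j = 1, width = 0.5)
  ft2 <- height(ft2, height = 0.4, part = "header")
  ft2 <- hrule(ft2, rule = "exact", part = "header")

  # Cell content: a bold label followed by the mpg value of each row
  ft2 <- flextable::compose(ft2, j = "mpg",
    value = as_paragraph(as_b("mpg "), as_chunk(mpg)))

  print(dim(ft2))
}
```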
26
