R Code Cheat Sheet

The document provides R code for common probability distributions and statistical analyses. It lists functions for the binomial, hypergeometric, Poisson, exponential, and normal distributions to calculate probabilities. It also provides functions for descriptive statistics, histograms, boxplots, scatterplots, correlation, regression, and checking regression assumptions with plots.

Uploaded by

Ali Bissenov
Copyright
© All Rights Reserved

R Code

The following is a brief guide to which R functions do what. Code for Unit 2 and Unit 3 is listed
below.

2.2 Code
Binomial
For 𝑿 following a binomial distribution with 𝑛 trials and probability of success 𝑝.
To find 𝑷(𝑿 = 𝒂):
dbinom(a, size = n, prob = p)

To find 𝑷(𝑿 ≤ 𝒂):


pbinom(a, size = n, prob = p)

To find 𝑷(𝒂 ≤ 𝑿 ≤ 𝒃):


pbinom(b, size = n, prob = p) - pbinom(a-1, size = n, prob = p)
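As a quick check on the interval formula, the sketch below uses made-up values for n, p, a, and b (assumptions for illustration, not course data). Because the binomial is discrete, P(a ≤ X ≤ b) must include P(X = a), which is why the lower term uses a-1.

```r
# Illustrative values (assumptions, not taken from the course material)
n <- 10   # number of trials
p <- 0.3  # probability of success
a <- 2
b <- 5

# P(a <= X <= b) from the cumulative distribution function
via_cdf <- pbinom(b, size = n, prob = p) - pbinom(a - 1, size = n, prob = p)

# The same probability as the sum of P(X = a), ..., P(X = b)
via_pmf <- sum(dbinom(a:b, size = n, prob = p))

all.equal(via_cdf, via_pmf)  # TRUE
```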

Hypergeometric
For 𝑿 following a hypergeometric distribution with 𝑚 successes in the population, 𝑛 failures in the
population, and a sample of 𝑘 observations from the population.
To find 𝑷(𝑿 = 𝒂):
dhyper(a, m, n, k)

To find 𝑷(𝑿 ≤ 𝒂):


phyper(a, m, n, k)

To find 𝑷(𝒂 ≤ 𝑿 ≤ 𝒃):


phyper(b, m, n, k) - phyper(a-1, m, n, k)
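A small sketch may help with the argument order, since n here is the number of failures in the population and not the population size. The counts below are invented for illustration: a population of 6 successes and 14 failures, with 5 items sampled.

```r
# Illustrative values (assumptions, not taken from the course material)
m <- 6   # successes in the population
n <- 14  # failures in the population
k <- 5   # sample size
a <- 2

# P(X = a) from dhyper()
from_dhyper <- dhyper(a, m, n, k)

# The same probability from the counting formula
from_formula <- choose(m, a) * choose(n, k - a) / choose(m + n, k)

all.equal(from_dhyper, from_formula)  # TRUE

# P(1 <= X <= 3)
phyper(3, m, n, k) - phyper(1 - 1, m, n, k)
```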

Poisson
For 𝑿 following a Poisson distribution with a rate parameter 𝜆.
To find 𝑷(𝑿 = 𝒂):
dpois(a, lambda = lambda)

To find 𝑷(𝑿 ≤ 𝒂):


ppois(a, lambda = lambda)

To find 𝑷(𝒂 ≤ 𝑿 ≤ 𝒃):


ppois(b, lambda = lambda) - ppois(a-1, lambda = lambda)
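The following sketch uses an assumed rate of 4 (an illustrative value only) to show the three Poisson calls together and to check dpois() against the Poisson probability formula.

```r
# Illustrative values (assumptions, not taken from the course material)
lambda <- 4
a <- 2
b <- 6

# P(X = a), compared with exp(-lambda) * lambda^a / a!
point_prob <- dpois(a, lambda = lambda)
by_formula <- exp(-lambda) * lambda^a / factorial(a)
all.equal(point_prob, by_formula)  # TRUE

# P(X <= a)
ppois(a, lambda = lambda)

# P(a <= X <= b); a - 1 keeps P(X = a) in the interval
interval_prob <- ppois(b, lambda = lambda) - ppois(a - 1, lambda = lambda)
```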
2.3 Code
Exponential
For 𝑿 following an exponential distribution with mean 𝛽 (equivalently, rate 1/𝛽).
To find 𝑷(𝑿 ≤ 𝒂):
pexp(a, rate = 1/beta)

To find 𝑷(𝒂 ≤ 𝑿 ≤ 𝒃):


pexp(b, rate = 1/beta) - pexp(a, rate = 1/beta)

To find 𝒙𝟎 such that 𝑷(𝑿 ≤ 𝒙𝟎 ) = 𝒑:


qexp(p, rate = 1/beta)
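Here is a short sketch with an assumed mean of 5 (illustrative only). It checks pexp() against the closed form 1 - exp(-a/beta) and shows that qexp() undoes pexp(). Because the exponential is continuous, P(X = a) is 0, so the interval formula uses a and not a-1.

```r
# Illustrative values (assumptions, not taken from the course material)
beta <- 5  # mean of the distribution, so rate = 1/beta
a <- 2
b <- 8

# P(X <= a), compared with the closed form 1 - exp(-a / beta)
cdf_a <- pexp(a, rate = 1/beta)
all.equal(cdf_a, 1 - exp(-a / beta))  # TRUE

# P(a <= X <= b)
interval_prob <- pexp(b, rate = 1/beta) - pexp(a, rate = 1/beta)

# x0 such that P(X <= x0) = 0.5 (the median), which equals beta * log(2)
x0 <- qexp(0.5, rate = 1/beta)
all.equal(x0, beta * log(2))  # TRUE
```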

Normal
For 𝑿 following a normal distribution with mean 𝜇 and standard deviation 𝜎.
To find 𝑷(𝑿 ≤ 𝒂):
pnorm(a, mean = mu, sd = sigma)

To find 𝑷(𝒂 ≤ 𝑿 ≤ 𝒃):


pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)

To find 𝒙𝟎 such that 𝑷(𝑿 ≤ 𝒙𝟎 ) = 𝒑:


qnorm(p, mean = mu, sd = sigma)
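As an illustration, the sketch below assumes a mean of 100 and a standard deviation of 15 (made-up values). It computes the probability of landing within one standard deviation of the mean and confirms that qnorm() is the inverse of pnorm().

```r
# Illustrative values (assumptions, not taken from the course material)
mu <- 100
sigma <- 15

# P(85 <= X <= 115), i.e. within one standard deviation of the mean
within_one_sd <- pnorm(115, mean = mu, sd = sigma) - pnorm(85, mean = mu, sd = sigma)
within_one_sd  # about 0.683

# x0 such that P(X <= x0) = 0.975
x0 <- qnorm(0.975, mean = mu, sd = sigma)

# Feeding x0 back into pnorm() recovers the probability
all.equal(pnorm(x0, mean = mu, sd = sigma), 0.975)  # TRUE
```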
3.1 Code
Consider a vector of data named dataset.

EDA Summary Values


Mean: mean(dataset)
Median: median(dataset)
Variance: var(dataset)
Standard Deviation: sd(dataset)
1st Quartile (Q1): quantile(dataset, 0.25)
3rd Quartile (Q3): quantile(dataset, 0.75)
pth percentile: quantile(dataset, p) [e.g., 80th percentile: quantile(dataset, 0.80)]
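To try these on something concrete, the sketch below builds a small made-up vector (an assumption for illustration) and runs the summary functions on it. Note that quantile() interpolates between data values by default, so its quartiles can differ slightly from values computed by hand with another rule.

```r
# A small made-up data vector (assumption, for illustration only)
dataset <- c(2, 4, 4, 5, 7, 9, 10, 12)

mean(dataset)    # 6.625
median(dataset)  # 6
var(dataset)
sd(dataset)      # the square root of var(dataset)

q1  <- quantile(dataset, 0.25)  # 1st quartile
q3  <- quantile(dataset, 0.75)  # 3rd quartile
p80 <- quantile(dataset, 0.80)  # 80th percentile

# summary() reports the minimum, quartiles, median, mean, and maximum at once
summary(dataset)
```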

EDA Graphics and Plots


Histogram: hist(dataset)
Boxplot: boxplot(dataset)
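Both plotting functions accept optional arguments for titles and axis labels. The sketch below uses a made-up data vector and label text (assumptions for illustration) and stores the histogram object so its bin counts can be inspected.

```r
# A small made-up data vector (assumption, for illustration only)
dataset <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Histogram with a title and an x-axis label; hist() also returns the bin information
h <- hist(dataset, main = "Histogram of dataset", xlab = "Value")
h$counts  # number of observations in each bin

# Horizontal boxplot with a title
boxplot(dataset, horizontal = TRUE, main = "Boxplot of dataset")
```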

3.2 Code
Consider a set of data named dataset with columns named X and Y. To refer to the columns by name
alone, attach the data using attach(dataset). Alternatively, write dataset$X and dataset$Y, or pass
data = dataset to functions that accept it, such as lm().

Create a Scatterplot: plot(X, Y) [X will be on the x-axis, Y will be on the y-axis]


Correlation: cor(Y, X) or cor(X, Y) [both will produce the same value]
Coefficient of Determination: cor(Y, X)^2 or cor(X, Y)^2
Regression: lm(Y ~ X) where Y is the response variable and X is the explanatory variable
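The sketch below runs these calls on a small made-up data frame (an assumption for illustration). It avoids attach() by using dataset$X and the data argument, and it checks that the squared correlation matches the R-squared reported by the regression, which holds for a regression with one explanatory variable.

```r
# A small made-up data set (assumption, for illustration only)
dataset <- data.frame(X = c(1, 2, 3, 4, 5, 6),
                      Y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

# Correlation; the order of the arguments does not matter
r <- cor(dataset$X, dataset$Y)
all.equal(r, cor(dataset$Y, dataset$X))  # TRUE

# Regression of Y (response) on X (explanatory)
fit <- lm(Y ~ X, data = dataset)
coef(fit)  # intercept and slope

# Coefficient of determination, two ways
all.equal(r^2, summary(fit)$r.squared)  # TRUE
```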

Plotting with Regression


It is often easiest to store the result of lm() under a name so you can refer to it when you plot
certain things. So let’s start by giving our model the name “fit”:
fit = lm(Y ~ X)

A Scatterplot with the Regression Line Plotted:


plot(Y~X)
abline(fit)
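A fuller version of the same two steps might look like the sketch below. The data frame, titles, and line styling are assumptions for illustration; abline() reads the intercept and slope directly from the fitted model.

```r
# A small made-up data set (assumption, for illustration only)
dataset <- data.frame(X = c(1, 2, 3, 4, 5, 6),
                      Y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

fit <- lm(Y ~ X, data = dataset)

# Scatterplot of Y against X, then the fitted line drawn on top
plot(Y ~ X, data = dataset, main = "Y vs. X with regression line")
abline(fit, col = "red", lwd = 2)
```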

Checking Model Assumptions with Plots:


plot(fit)

This opens a graphics window and shows four plots one at a time. Click in the window to bring up the
first plot and to advance to each of the others (depending on your R environment, you may instead be
prompted to press Enter in the console). The first plot is the “Residuals vs. Fitted” plot (used to
check homoscedasticity). The second is the “Normal Q-Q” plot (used to check normality of the
residuals). The third (Scale-Location) and fourth (Residuals vs. Leverage) plots will not be
examined in this course.
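If you only want the two plots used here, one option is the which argument of plot() for lm models, which selects plots by number. The sketch below uses a made-up data frame (an assumption for illustration) and places both plots side by side.

```r
# A small made-up data set (assumption, for illustration only)
dataset <- data.frame(X = c(1, 2, 3, 4, 5, 6),
                      Y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

fit <- lm(Y ~ X, data = dataset)

# Show two plots side by side, remembering the previous layout
old_par <- par(mfrow = c(1, 2))

# which = 1 is Residuals vs. Fitted, which = 2 is Normal Q-Q
plot(fit, which = 1:2)

# Restore the previous plotting layout
par(old_par)
```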
