STAT 359 Study Guide

Nils Dosaj Mikkelsen


December 16, 2020

1 Slide 1 - Review
1. Here’s a list of some basic R commands:

(a) read.table(file): reads tabular data into R (txt, csv, prn, etc.)


(b) names(dataframe): lists all column names for the data frame
(c) attach(dataframe): allows data frame columns to be called directly
(d) detach(dataframe): detaches an attached data frame
(e) summary(dataframe): provides summary statistics of the data frame
(quantiles, mean, attribute frequency, etc.)
(f) mean(data): computes the mean value
(g) sum(data): returns the sum of all values in data
(h) median(data): returns middle value of the data
(i) var(data): returns the variance of the data
(j) sd(data): returns the standard deviation of the data
(k) data[1]: returns 1st element of data (R is 1-indexed, not 0-indexed)
(l) data[data < 5]: returns all entries that meet this condition
(m) length(data): returns number of entries in data
(n) data[,3]: returns 3rd column (all rows)
(o) data[5,]: returns 5th row (all columns)
(p) data[2:4,5] returns rows 2-4 (inclusive) from 5th column
(q) data[data > 4 & data < 8]: logical connectors (and)
(r) data[data > 3 | data < 6]: logical connectors (or)

(s) order(data): returns the indices that sort data in ascending order
(use data[order(data)] to get the sorted values)
(t) rev(order(data)): indices that sort data in descending order
(u) seq(s,e,step): creates an inclusive sequential vector of values from
s to e incremented by step
(v) rep(val,num): creates a vector containing val, num times.
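The subsetting and sequence commands above can be exercised with a small made-up vector (the values here are illustrative only):

```r
# Illustrative vector for the subsetting commands above
data <- c(7, 2, 9, 4, 6)

data[1]                     # first element: 7 (R is 1-indexed)
data[data < 5]              # values meeting a condition: 2 4
length(data)                # number of entries: 5
data[data > 4 & data < 8]   # logical "and": 7 6
data[order(data)]           # ascending sort (same as sort(data))
data[rev(order(data))]      # descending sort
seq(1, 9, 2)                # 1 3 5 7 9 (inclusive, step 2)
rep(3, 4)                   # 3 3 3 3
```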

2. Distribution commands:

(a) rnorm(n): generates n random normally distributed values


(b) pnorm(0) = 0.5: enter a number of standard deviations (a z-score),
returns the cumulative probability below it
(c) qnorm(0.5) = 0: enter a probability, returns the number of standard
deviations from the mean (used for confidence intervals)
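pnorm and qnorm are inverses of one another; a quick sketch of the three commands (the values shown are standard normal facts):

```r
# pnorm: standard deviations -> cumulative probability
pnorm(0)       # 0.5: half the mass lies below the mean
pnorm(1.96)    # ~0.975

# qnorm: cumulative probability -> standard deviations
qnorm(0.5)     # 0
qnorm(0.975)   # ~1.96, the usual 95% CI multiplier

# rnorm: n random draws from N(0, 1)
x <- rnorm(100)
```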

3. Remove missing values from a data vector with:

(a) assign the column to a variable name: var = df$column


(b) remove missing values: var = var[!is.na(var)]

4. We can obtain a 100(1 − α)% confidence interval as:

(a) set α, e.g. α = 0.05 gives a 95% CI


(b) Calculate upper and lower bounds:

x̄1 − x̄2 ± qnorm(1 − α/2) ∗ sqrt(var(X1)/length(X1) + var(X2)/length(X2))

(c) If the interval contains the value 0, we cannot reject the null
hypothesis H0 : µ1 = µ2
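The bound in (b) can be computed directly in R; the samples x1 and x2 below are simulated placeholders, not course data:

```r
# Two simulated samples (for illustration only)
set.seed(1)
x1 <- rnorm(30, mean = 5, sd = 2)
x2 <- rnorm(40, mean = 5, sd = 2)

alpha <- 0.05                        # alpha = 0.05 gives a 95% CI
d  <- mean(x1) - mean(x2)            # difference in sample means
se <- sqrt(var(x1)/length(x1) + var(x2)/length(x2))
ci <- d + c(-1, 1) * qnorm(1 - alpha/2) * se
ci                                   # lower and upper bounds

# If 0 lies inside ci, we cannot reject H0: mu1 = mu2
(ci[1] < 0) & (0 < ci[2])
```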

5. We can also determine if the means are different with the t-test

6. Tobs is the observed value of the test statistic.

7. The p-value is the probability, under the assumption that H0 is true,


that the test statistic is at least as extreme as that observed.

8. To compute the two-sided p-value for H0 : µ1 = µ2: 2 ∗ (1 − pnorm(abs(Tobs)))
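Item 8 as a runnable line, with a hypothetical observed statistic Tobs:

```r
# Two-sided p-value from a large-sample z statistic (Tobs is hypothetical)
Tobs <- 1.8
pval <- 2 * (1 - pnorm(abs(Tobs)))
pval   # ~0.072: fail to reject H0 at alpha = 0.05
```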

2 Slide 2 - Two Sample T-Test and Bootstrapping
1. Bootstrapping: A resampling method that uses random sampling
with replacement. Bootstrapping is handy with small sample sizes.

2. Monte Carlo Method: Relying on repeated random sampling to


obtain numerical results.
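A minimal bootstrap sketch (the sample, seed, and number of resamples are all illustrative choices):

```r
set.seed(42)
x <- rnorm(15, mean = 10, sd = 3)   # a small simulated sample

B <- 2000
# Resample x with replacement B times and record each mean
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

sd(boot_means)                         # bootstrap estimate of the SE of the mean
quantile(boot_means, c(0.025, 0.975))  # percentile 95% confidence interval
```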

3 Slide 3 - Review of Probability Distributions and Q-Q Plots
1. Chi-square Distribution: If Z ∼ N(0, 1) we say that the random
variable defined by X = Z^2 is χ^2(1).
More generally, if Z_1, . . . , Z_k are independent N(0, 1), then
X = Z_1^2 + · · · + Z_k^2 is said to be χ^2(k).
Ex.
To compute P(X ≥ 4) if X ∼ χ^2(3): 1 − pchisq(q = 4, df = 3)
To compute the quantile q_0.7 with P(X ≤ q_0.7) = 0.7: qchisq(p = 0.7, df = 4)

2. T-Distribution: If Z ∼ N(0, 1) and W ∼ χ^2(n), and Z and W are
independent, then X = Z/sqrt(W/n) has a t_n distribution.

The t-distribution is shaped like the normal distribution but it has
heavier tails (meaning they approach zero more slowly, which can seem
counter-intuitive).

3. A bow shape in a normal Q-Q plot indicates a skewed distribution, such as a chi-square distribution.
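This can be seen by plotting simulated data (both samples below are made up via simulation):

```r
# A normal Q-Q plot of skewed (chi-square) data bows away from the
# reference line, while normal data falls near it (simulated data)
set.seed(1)
z <- rnorm(200)
w <- rchisq(200, df = 3)

par(mfrow = c(1, 2))
qqnorm(z, main = "Normal data"); qqline(z)
qqnorm(w, main = "Chi-square data (bowed)"); qqline(w)
```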

4 Slide 4 - Skewness and Kurtosis


1. Skewness: A measure of asymmetry within the distribution of the
data. Skewness greater than zero indicates that the right tail is longer
than the left tail (the left tail falls off earlier). Skewness
equal to zero corresponds to a symmetric distribution.

skew = function(x){
  m3 = sum((x - mean(x))^3)/length(x)   # third central moment
  s3 = sqrt(var(x))^3                   # standard deviation cubed
  m3/s3                                 # standardized skewness
}

2. Kurtosis: The fourth moment of a distribution. Kurtosis measures


the peakedness along with the heaviness of the tails of a distribution.
Heavy tails mean that there is a larger probability of getting very large
values.
The normal distribution is taken as a reference and therefore the kur-
tosis of a normal distribution is 0.
Distributions with positive kurtosis have heavier than normal tails. An
example would be the t-distribution.
Distributions with negative kurtosis have a flatter shape in the middle.
An example would be a uniform distribution.

kurtosis = function(x){
  m4 = sum((x - mean(x))^4)/length(x)   # fourth central moment
  s4 = var(x)^2                         # variance squared
  m4/s4 - 3                             # excess kurtosis (normal = 0)
}
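Applying both functions above to simulated samples illustrates the sign conventions (the distributions chosen are examples, not course data):

```r
skew <- function(x){
  m3 <- sum((x - mean(x))^3)/length(x)
  s3 <- sqrt(var(x))^3
  m3/s3
}
kurtosis <- function(x){
  m4 <- sum((x - mean(x))^4)/length(x)
  s4 <- var(x)^2
  m4/s4 - 3
}

set.seed(1)
skew(rchisq(1e5, df = 3))   # positive: right tail longer
skew(rnorm(1e5))            # near 0: symmetric
kurtosis(rt(1e5, df = 5))   # positive: heavier-than-normal tails
kurtosis(runif(1e5))        # negative: flat middle (~ -1.2)
```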

5 Slide 5 - More on Two Sample Testing: t-tests
1. Two sample t-test: An alternative approach for testing the equality of
population means in the small sample setting. t-tests assume that the
data arises from a normal distribution. Other assumptions are made
depending on the type of t-test being used:

(a) pooled t-test: assumption of equal variance


(b) Welch t-test: no assumption of equal variance
(c) paired t-test: two samples that are dependent (pairs of data)
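The three variants map onto arguments of R's t.test; x and y below are simulated placeholders:

```r
set.seed(1)
x <- rnorm(20, mean = 5)
y <- rnorm(20, mean = 6)

t.test(x, y, var.equal = TRUE)   # pooled t-test (equal variances assumed)
t.test(x, y)                     # Welch t-test (default; no equal-variance assumption)
t.test(x, y, paired = TRUE)      # paired t-test (equal-length, dependent samples)
```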

6 Slide 6 - Paired Data: Parametric and Non-
parametric Methods
1. Paired t-test: Used for testing the difference in means between paired
data (such as before and after measurements). The paired t-test
assumes normality.

2. The Signed Rank Test: An alternative to the paired t-test which
does not assume normality. Test statistics are derived from the ranks
of the data values. The signed rank test is robust to outliers.
Ranks are generated by computing the differences between the data
pairs. The smallest absolute difference is ranked 1. Ties are split (e.g.
a tie for ranks 3 and 4 results in a rank of 3.5 for both pairs).
To compute the test statistic, compute (Before − After) for every data
pair and sum the ranks of the positive differences (i.e. where After is
less than Before).
To use: wilcox.test(before, after, paired = TRUE)
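A sketch of the signed rank test with made-up before/after measurements:

```r
# Hypothetical before/after measurements on the same eight subjects
before <- c(12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 11.9, 10.8)
after  <- c(11.0, 9.9, 10.1,  9.7, 12.2, 9.0, 11.2, 10.0)

# Signed rank test: no normality assumption, uses ranks of differences
wilcox.test(before, after, paired = TRUE)
```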

7 Slide 7 - The Mann-Whitney Test: A Nonparametric Two-Sample Procedure
1. The Mann-Whitney Test: Similar to the signed rank test in that
it does not assume normality. The Mann-Whitney test assumes that
the data are independent and not paired. The Mann-Whitney test is
based on ranks, where we pool the m + n observations and rank them
from 1, . . . , m + n. The test statistic is then computed as the sum of
the ranks of the first sample.
To use: wilcox.test(x, y, paired = FALSE)
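A sketch with two made-up independent samples of unequal size (m = 5, n = 6):

```r
# Hypothetical independent samples; no pairing required
g1 <- c(3.2, 4.1, 2.8, 5.0, 3.9)
g2 <- c(5.5, 6.1, 4.8, 5.9, 6.4, 5.2)

wilcox.test(g1, g2)   # Mann-Whitney; paired = FALSE is the default
```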

8 Slide 8 - Analysis of Variance


1. ANOVA generalizes the t-test for J ≥ 2 populations assuming a sample
size of n is drawn from each.
Very often the J populations will correspond to the J different levels
of an experimental factor that is manipulated in an experiment with n
observations taken at each of its J levels.

First, we want to detect if the means are all equal. In the case where
the means are not all equal, a secondary objective is to investigate how
the means differ from each other.

2. When the error bars are based on the standard error of the mean,
overlap indicates that there is insufficient evidence of a population
difference. In this same case, non-overlap does not necessarily indicate
sufficient evidence of a difference in population means.

3. When the error bars are based on confidence intervals, overlapping


confidence intervals do not imply that there is insufficient evidence of a
population difference, as in our example. When the confidence intervals
are non-overlapping, this does indicate evidence that the population
means are different.
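A one-way ANOVA sketch in R with three simulated groups (J = 3, n = 10; the data are illustrative):

```r
set.seed(1)
# J = 3 populations, n = 10 observations each (simulated)
y     <- c(rnorm(10, mean = 5), rnorm(10, mean = 5), rnorm(10, mean = 7))
group <- factor(rep(c("A", "B", "C"), each = 10))

fit <- aov(y ~ group)
summary(fit)    # F-test of H0: all means equal
TukeyHSD(fit)   # if H0 is rejected, investigate which pairs of means differ
```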

9 Slide 9 - ANOVA for Factorial Experiments


1.

10 Slide 10 -
1.

11 Slide 11 -
1.

12 Slide 12 -
1.

13 Slide 13 -
1.

14 Slide 14 -
1.

15 Slide 15 -
1.

16 Slide 16 -
1.

17 Slide 17 -
1.

18 Slide 18 -
1.

19 Fundamentals
1.

20 Dataframes
1.

21 Central Tendency
1.

22 Variance
1.
