BAN5

Uploaded by W-K P
Copyright © All Rights Reserved

Install: use install.packages() to install tidyverse, ggplot2 and dplyr (ggplot2 and dplyr are also part of tidyverse).

PDF R Markdown named: u########

dplyr commands: filter() chooses rows based on column values. | arrange() changes the order of the rows (similar to base sort()). | select() chooses which columns are kept. | rename() changes the names of columns. | mutate() changes the values of columns and creates new columns. | summarise() collapses a group into a single row (base summary(), str() and head() give quick overviews of data). | group_by() groups the rows by one or more columns, so that later commands work per group. | ungroup() removes grouping. | count() counts the rows for each unique value. | lag() finds the "previous" values and lead() the "next" values, to compare values behind or ahead of the current ones. | Join dplyr commands with the pipe %>%. | #works on data.frame() data structures#
ggplot() commands: ggplot(data, mapping = aes(x = , y = )) | geoms include geom_line(), geom_point() and geom_bar(). | add color or size, inside aes() to map them to a variable. | facet_wrap() splits a plot into panels, one per group. | Join ggplot commands with + | #graphs#
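A minimal sketch of the dplyr and ggplot() commands above, assuming dplyr and ggplot2 are installed and using R's built-in mtcars data set. The column choices and the 100 hp cut-off are illustrative only.

```r
library(dplyr)
library(ggplot2)

# dplyr: every command takes a data frame and returns a data frame; %>% chains them
by_cyl <- mtcars %>%
  select(mpg, cyl, hp, wt) %>%                  # keep only these columns
  rename(horsepower = hp) %>%                   # new name = old name
  filter(horsepower > 100) %>%                  # keep rows with more than 100 hp
  mutate(hp_per_wt = horsepower / wt) %>%       # new column from existing columns
  group_by(cyl) %>%                             # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n = n()) %>%  # collapse each group to one row
  ungroup() %>%                                 # remove any remaining grouping
  arrange(desc(mean_mpg))                       # highest mean_mpg first

gears <- mtcars %>% count(gear)                 # number of rows for each gear value

shifted <- data.frame(v = c(10, 12, 15)) %>%
  mutate(prev = lag(v), nxt = lead(v))          # prev: NA 10 12, nxt: 12 15 NA

# ggplot2: layers are joined with +
p <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 2) +  # color mapped to a variable
  facet_wrap(~ gear)                                # one panel per gear value
print(p)
```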

Histograms: created with hist(x = data, main = , xlab = ). | Box plot: displays the distribution of the data based on five values (minimum, first quartile, median, third quartile and maximum; points more than 1.5 times the IQR beyond the box are drawn as outliers): boxplot(x, horizontal = , xlab = , main = ) | plot(x = , y = , pch = , col = rgb( , , , ), main = , xlab = , ylab = ): generic command for plotting. | abline(a = , b = , h = , v = , reg = , coef = , untf = ) adds straight lines to a plot. | lines(density(), lwd = , col = ) adds a density curve. | #pch = 16 draws filled circles# | #rgb() takes red, green, blue and alpha (transparency) values, e.g. rgb(0, 0, 0, 0.02)# | #graphs distribution# | browseVignettes() gives more information about packages.
#Intro/Program# Help: use ? before a command to open its R documentation, and use args() to see the arguments a command takes.
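A base-graphics sketch of the plotting commands above. The data are simulated with rnorm() purely for illustration; freq = FALSE is used so the density curve and the histogram share the same scale.

```r
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)        # hypothetical sample data
y <- 2 * x + rnorm(200, sd = 5)

# Histogram on the density scale, with a density curve on top
h <- hist(x = x, main = "Histogram of x", xlab = "x", freq = FALSE)
lines(density(x), lwd = 2, col = "red")

# Box plot; b$stats holds lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot(x, horizontal = TRUE, xlab = "x", main = "Box plot of x")

# Scatter plot with filled circles (pch = 16) in semi-transparent black
plot(x = x, y = y, pch = 16, col = rgb(0, 0, 0, 0.2),
     main = "y against x", xlab = "x", ylab = "y")
abline(reg = lm(y ~ x), col = "blue")       # regression line
abline(h = mean(y), v = mean(x), lty = 2)   # horizontal and vertical reference lines
```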

Relationships: ! means NOT (exclude), == means equal, >= means greater than or equal, <= means less than or equal, & means AND, | means OR, $ selects a column of a data frame.
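A small sketch of these operators used to select rows of a data frame; the city and income values are made up for illustration.

```r
df <- data.frame(city = c("A", "B", "C", "D"), income = c(30, 55, 42, 70))

high   <- df$income >= 40                             # FALSE TRUE TRUE TRUE
sel    <- df[df$income >= 40 & !(df$city == "D"), ]   # AND + NOT: cities B and C
either <- df[df$income <= 30 | df$city == "D", ]      # OR: cities A and D
```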

Clear the environment with rm(list = ls()) and load packages with library().

Import: read_delim(file = , delim = , skip = , locale = ) reads a file with any delimiter, such as "|". | read_fwf(file = , col_positions = fwf_positions(start = , end = , col_names = ), col_types = ) reads fixed-width files, using start and end positions to import selected columns; column names can be given for multiple columns. | read_csv(file = , col_names = , col_types = , skip = ) | #always specify column types# | #quotation marks# | #example column names: sn, municipality, hhincome# | #compact col_types examples: "ncnn" (n = number, c = character) OR # | #skip = number of rows to skip at the top, e.g. 3 or 8#
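A sketch of the three readr import commands, assuming readr is installed. The files are small hypothetical ones written to temporary paths so the example runs anywhere; the column names follow the sn / municipality / hhincome example in the notes.

```r
library(readr)

# Hypothetical pipe-delimited file with one note line above the header
delim_path <- tempfile(fileext = ".txt")
writeLines(c("Survey export", "sn|municipality|hhincome",
             "1|Alpha|32000", "2|Beta|41000"), delim_path)
df_delim <- read_delim(file = delim_path, delim = "|", skip = 1,
                       col_types = "ncn")   # n = number, c = character

# Hypothetical fixed-width file: sn in positions 1-3, hhincome in positions 10-14
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("001Alpha 32000", "002Beta  41000"), fwf_path)
df_fwf <- read_fwf(file = fwf_path,
                   col_positions = fwf_positions(start = c(1, 10), end = c(3, 14),
                                                 col_names = c("sn", "hhincome")),
                   col_types = "nn")

# Hypothetical comma-separated file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sn,municipality,hhincome", "1,Alpha,32000", "2,Beta,41000"), csv_path)
df_csv <- read_csv(file = csv_path, col_names = TRUE, col_types = "ncn", skip = 0)
```

Only the selected positions are imported from the fixed-width file, so the municipality text in positions 4-9 is skipped.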

Remove rows with missing values (NA) with na.omit(). | Use unique() to remove duplicate rows.
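A quick sketch with a made-up data frame that has both missing values and a duplicated row.

```r
df <- data.frame(id = c(1, 2, 2, 3, 4), score = c(10, NA, NA, 7, 7))

no_na   <- na.omit(df)               # drops the two rows where score is NA
no_dup  <- unique(df)                # drops the repeated row (id 2, score NA)
same_na <- df[complete.cases(df), ]  # keeps the same rows as na.omit(df)
```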

Create a lookup table: create a variable #lut#. | Fill it with a named vector, c("code" = "value") #use quotation marks#. | Index lut with the codes in a column to add a new column.
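A minimal lookup-table sketch; the codes and labels are hypothetical. Indexing a named vector by name returns the matching values in the order of the codes.

```r
lut <- c("F" = "Female", "M" = "Male", "X" = "Other")   # hypothetical codes
df <- data.frame(id = 1:4, sex = c("M", "F", "F", "X"))

df$sex_label <- unname(lut[df$sex])   # look up each code, drop the names
```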

prop.table() calculates the value of each cell in a table as a proportion: prop.table(x = data, margin = ) #margin = 1 for row proportions, 2 for column proportions, default NULL = proportion of all values#
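A sketch of table() followed by prop.table() with each margin setting; the survey answers are made up.

```r
gender <- c("F", "M", "F", "F", "M")         # hypothetical survey answers
smoker <- c("yes", "no", "no", "yes", "no")
tab <- table(gender, smoker)                 # counts per combination

all_p <- prop.table(x = tab)                 # each cell / total of all cells
row_p <- prop.table(x = tab, margin = 1)     # each cell / its row total
col_p <- prop.table(x = tab, margin = 2)     # each cell / its column total
```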

Output commands: cat() concatenates its arguments and prints them, performing much less conversion than print(). | paste() converts its arguments to character and concatenates them element by element; collapse = joins the results into a single string. | #\n creates a new line#
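A short sketch contrasting cat(), print() and paste(); the values are illustrative.

```r
n <- 25
cat("Sample size:", n, "\n")                # prints the text, then starts a new line
print(c(a = 1.5, b = 2))                    # shows the object as R displays it

ids <- paste("id", 1:3, sep = "_")          # element by element: "id_1" "id_2" "id_3"
one_string <- paste(ids, collapse = ", ")   # one string: "id_1, id_2, id_3"
cat(one_string, "\n")
```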

pnorm() computes a probability from a known bounding value (the area under the normal curve below it); qnorm() does the opposite and returns the value that has a given probability below it. | dnorm() gives the density | pnorm() gives the distribution function | qnorm() gives the quantile function | rnorm() generates random deviates.
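A sketch of the four normal-distribution functions; the mean 50 and sd 10 are example values.

```r
p <- pnorm(1.96)            # P(Z <= 1.96) for the standard normal, about 0.975
z <- qnorm(0.975)           # the opposite: value with 97.5% below it, about 1.96
d <- dnorm(0)               # density at the mean, about 0.399

# probability of falling between 40 and 60 when mean = 50, sd = 10 (about 0.683)
p_range <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)

set.seed(1)
draws <- rnorm(5, mean = 50, sd = 10)   # 5 random values
```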

prop.test(x = , n = , p = , conf.level = ) tests the null hypothesis that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values (p = ).
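A sketch comparing two groups with prop.test(); the counts (45 of 100 and 60 of 100) are hypothetical.

```r
res <- prop.test(x = c(45, 60), n = c(100, 100), conf.level = 0.95)

res$estimate   # observed proportions in each group
res$p.value    # P-value for H0: both proportions are equal
res$conf.int   # confidence interval for the difference p1 - p2
```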

t.test(x = data, y = data, conf.level = , alternative = , paired = ): Student's t-test #t-distribution#: performs one- and two-sample t-tests on vectors of data. #two-sample tests use the Welch version unless var.equal = TRUE#
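A sketch of one- and two-sample calls on simulated data. The mu = argument (the mean assumed under H0) is not listed in the notes above but is part of t.test().

```r
set.seed(42)
x <- rnorm(20, mean = 5, sd = 1)      # hypothetical data
y <- rnorm(25, mean = 5.5, sd = 1.2)

one <- t.test(x = x, mu = 5, conf.level = 0.95, alternative = "two.sided")
two <- t.test(x = x, y = y, conf.level = 0.90, alternative = "less")  # Welch by default

one$statistic   # t value
one$p.value
two$conf.int    # one-sided interval: (-Inf, upper bound)
```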

table() uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels. | class() = identify class. | as.numeric(), as.character(), as.factor() = convert class. | colnames() = get or rename column names. | as.POSIXlt() = convert times & dates. | nrow() = count rows. | length() = length of object. | write() = write data to a file. | merge() = merge data frames. | paste() = concatenate strings. | cbind() / rbind() = combine R objects by columns / rows. | diff() = lagged differences. | difftime() = time interval between two times. | round() = rounding of numbers. | scan() = read data values. | read.table() = reads a file in table format and creates a data frame from it. | complete.cases() = TRUE for rows with no missing values; df[complete.cases(df), ] keeps the same rows as na.omit(df).
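A sketch exercising several of the functions listed above on small made-up data frames and times.

```r
a <- data.frame(id = c(1, 2, 3), town = c("X", "Y", "Z"))
b <- data.frame(id = c(1, 3), income = c(500, 700))

m  <- merge(a, b, by = "id")                        # rows with an id in both: 1 and 3
cb <- cbind(a, flag = c(TRUE, FALSE, TRUE))         # adds a column
rb <- rbind(a, data.frame(id = 4, town = "W"))      # adds a row
class(m); nrow(m); colnames(m)

d  <- diff(c(3, 7, 12))                             # 4 5
t1 <- as.POSIXlt("2024-01-01 08:00:00", tz = "UTC")
t2 <- as.POSIXlt("2024-01-01 10:30:00", tz = "UTC")
dt <- difftime(t2, t1, units = "mins")              # 150 minutes
r  <- round(3.14159, 2)                             # 3.14
```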

Calculation CI: #proportion from data# p_data = S / n #S = number of successes, n = sample size# | #Standard Error# SE = ((p_data * (1 - p_data)) / n)^(1/2) | #Confidence Level# CL | z* = abs(qnorm((1 - CL) / 2)) #use qnorm() with p = (1 - CL) / 2# | #Calculated confidence interval# CI = (p_data - z* * SE, p_data + z* * SE)
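A worked sketch of the proportion CI steps, assuming a hypothetical 45 successes in a sample of 100 and a 95% confidence level.

```r
S  <- 45                              # hypothetical number of successes
n  <- 100                             # sample size
CL <- 0.95

p_data <- S / n
SE     <- sqrt(p_data * (1 - p_data) / n)
z_star <- abs(qnorm((1 - CL) / 2))    # about 1.96
CI     <- c(p_data - z_star * SE, p_data + z_star * SE)   # about 0.3525 to 0.5475
```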

Categorical data: independent observations? | at least 10 successes and 10 failures (success-failure condition) | nearly normal distribution? | use S, F and hist() to check the data set before inference

Confidence interval: S = successes | F = failures | n = sample size (S + F) | P_data = S / n #observed proportion# | SE #Standard Error# = ((P_data * (1 - P_data)) / n)^(1/2) | CL = confidence level | probability for qnorm() = (1 - CL) / 2 | z* = abs(qnorm((1 - CL) / 2)) | CI = calculated confidence interval (P_data - z* * SE, P_data + z* * SE)

Hypothesis test: H0 = null hypothesis #status quo# | HA = alternative hypothesis #research question# | assume H0 is TRUE until the data show it is FALSE, then reject H0 | P_given = proportion assumed under H0 | SE = ((P_given * (1 - P_given)) / n)^(1/2) | Z = (P_data - P_given) / SE | use pnorm() to get the P_value, e.g. 2 * pnorm(-abs(Z)) for a two-sided test (multiply by 100 for a percentage) | if P_value is smaller than 0.05, reject H0 | #optional# use prop.test(x = , n = , p = , conf.level = )
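A worked sketch of the one-proportion z-test, assuming hypothetical data (45 successes out of 100) and H0: p = 0.5. prop.test() applies a continuity correction by default; with correct = FALSE its P-value matches the manual two-sided calculation.

```r
S <- 45; n <- 100                     # hypothetical data
P_data  <- S / n
P_given <- 0.5                        # proportion assumed under H0

SE      <- sqrt(P_given * (1 - P_given) / n)   # SE uses the H0 value
Z       <- (P_data - P_given) / SE             # -1
P_value <- 2 * pnorm(-abs(Z))                  # two-sided, about 0.317
reject_H0 <- P_value < 0.05                    # FALSE: do not reject H0

res <- prop.test(x = S, n = n, p = P_given, correct = FALSE)
```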

Numerical data: independent observations? | if the sample size is smaller than 30, the data must be nearly normal | use n, hist() and boxplot() to check the data set before inference

Single mean: t-distribution | df = n - 1 | population mean estimated by #sample mean# x +/- t* * SE | SE = s / n^(1/2) #s = sample standard deviation#

Paired: 2 sets | each value in one set is connected to one value in the other (e.g. the same subject measured twice) | H0: mean diff = 0 | HA: mean diff ≠ 0 | SE_diff = s_diff / n_diff^(1/2) | T_value = (mean diff - 0) / SE_diff
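A sketch of the paired calculation on hypothetical before/after measurements, checked against t.test(paired = TRUE), which works on the same differences (after - before).

```r
before <- c(72, 75, 80, 68, 77, 74, 79, 71)   # hypothetical measurements
after  <- c(70, 74, 77, 69, 74, 71, 78, 69)

diffs   <- after - before
n_diff  <- length(diffs)
SE_diff <- sd(diffs) / sqrt(n_diff)
T_value <- (mean(diffs) - 0) / SE_diff
P_value <- 2 * pt(-abs(T_value), df = n_diff - 1)   # two-sided

res <- t.test(x = after, y = before, paired = TRUE)
```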

Not paired: 2 independent sets | each set must meet the inference conditions | estimate mean 1 - mean 2 | population mean difference estimated by #difference between sample means# (x1 - x2) +/- t* * SE_diff | SE_diff = ((s1^2 / n1) + (s2^2 / n2))^(1/2) | t* uses df = smallest sample size - 1 #n1 - 1 OR n2 - 1# | T_value = (x1 - x2 - 0) / SE_diff
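A sketch of the unpaired confidence interval with the conservative df from the notes, on two hypothetical samples. Welch's df (used by t.test()) is never smaller than min(n1, n2) - 1, so its interval is at most as wide as this one.

```r
x1 <- c(23.1, 25.4, 22.8, 26.0, 24.3, 25.9, 23.7)              # hypothetical set 1
x2 <- c(21.0, 22.5, 20.8, 23.9, 22.1, 21.7, 24.0, 22.6, 21.4)  # hypothetical set 2
n1 <- length(x1); n2 <- length(x2)

SE_diff <- sqrt(sd(x1)^2 / n1 + sd(x2)^2 / n2)
df      <- min(n1, n2) - 1                 # conservative df: 6
t_star  <- qt(p = 0.975, df = df)          # 95% confidence
point   <- mean(x1) - mean(x2)
CI      <- c(point - t_star * SE_diff, point + t_star * SE_diff)

welch_ci <- t.test(x = x1, y = x2, conf.level = 0.95)$conf.int
```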

Confidence interval: X = mean() | SD = sd() | n = data set size | SE = SD / n^(1/2) | t* uses qt(p = , df = , lower.tail = ) with p = (1 - CL) / 2 and df = n - 1 | abs() makes t* positive | CI = (X - t* * SE, X + t* * SE) | #optional# use t.test(x = , conf.level = )
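A sketch of the single-mean CI steps on a hypothetical sample, checked against the interval that t.test() reports.

```r
x  <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.4, 13.3, 12.0, 11.6, 12.8)  # hypothetical
CL <- 0.95

X  <- mean(x)
SD <- sd(x)
n  <- length(x)
SE <- SD / sqrt(n)
t_star <- abs(qt(p = (1 - CL) / 2, df = n - 1, lower.tail = TRUE))   # abs() makes it positive
CI <- c(X - t_star * SE, X + t_star * SE)

check <- t.test(x = x, conf.level = CL)$conf.int
```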

Hypothesis test: X2 = mean() of data set 2 | SD2 = sd() of data set 2 | n2 = size of data set 2 | df = smallest data set size - 1 | SE_diff = ((SD^2 / n) + (SD2^2 / n2))^(1/2) | T_value = ((X - X2) - 0) / SE_diff | P_value = pt(q = , df = , lower.tail = ) times 2 | if P_value is smaller than 0.05, reject H0 | #optional# use t.test(x = , y = , alternative = , paired = ) #uses Welch df, so its P_value can differ#
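A sketch of the two-sample hypothesis test on hypothetical data. The T_value matches the Welch statistic from t.test(), but because the notes use the smaller df = min(n1, n2) - 1, the manual P_value is at least as large as the Welch P_value.

```r
x1 <- c(23.1, 25.4, 22.8, 26.0, 24.3, 25.9, 23.7)              # hypothetical set 1
x2 <- c(21.0, 22.5, 20.8, 23.9, 22.1, 21.7, 24.0, 22.6, 21.4)  # hypothetical set 2
n1 <- length(x1); n2 <- length(x2)

SE_diff <- sqrt(sd(x1)^2 / n1 + sd(x2)^2 / n2)
T_value <- ((mean(x1) - mean(x2)) - 0) / SE_diff
df      <- min(n1, n2) - 1
P_value <- 2 * pt(q = abs(T_value), df = df, lower.tail = FALSE)   # two-sided
reject_H0 <- P_value < 0.05

welch <- t.test(x = x1, y = x2, alternative = "two.sided", paired = FALSE)
```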
