Statistics - A.Y. 2018-2019: BIEF - Class 22

stata

Uploaded by

ema

30001 Statistics – a.y. 2018-2019

BIEF – Class 22
Lecture 3

R manual
Previously on “Statistics 30001”
• Identify types of data and levels of measurement
• Appropriately create and interpret tables and graphs to describe categorical variables:
– frequency distribution,
– bar chart,
– pie chart
Basics and approach
• We must type commands
> …[ENTER]

• Objects
> X <- 3 [ENTER]
> X [ENTER]
[1] 3
Basics and approach
• Dataset formats: *.rda, *.rdata or *.rds
• To open a dataset in R:
> load("path-and-file.rda")
– For *.rds files, use instead: > D <- readRDS("path-and-file.rds")
– In RStudio: menu File, command Open File … or Import Dataset

• To save a dataset in R:
> save(Dataframe, file = "path-and-file.rda")
– This also works in RStudio
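The save and load steps above can be tried without the course dataset. This is only a minimal sketch: the data frame `movies` and the temporary file paths are made up for illustration. It also shows `saveRDS()` and `readRDS()`, the base R pair for *.rds files.

```r
# Hypothetical toy data frame, for illustration only
movies <- data.frame(title = c("A", "B", "C"),
                     year  = c(1997L, 2009L, 2015L))
original <- movies

# *.rda / *.rdata: save() stores the object together with its name
rda_path <- tempfile(fileext = ".rda")
save(movies, file = rda_path)
rm(movies)                       # remove the object from the workspace
loaded_names <- load(rda_path)   # load() recreates it under its original name

# *.rds: saveRDS() stores a single object without its name,
# so the result of readRDS() must be assigned
rds_path <- tempfile(fileext = ".rds")
saveRDS(original, file = rds_path)
movies_copy <- readRDS(rds_path)
```

Note that `load()` returns the names of the objects it restored, which helps when you do not remember what a file contains.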
Basics and approach
• Dataframe vs Datafile
> D<-Dataset_Movie_v3
> str(D)
'data.frame': 2868 obs. of 46 variables:
$ id_imdb : chr "tt0120338" "tt0499549" "tt2488496" "tt0107290" ...
$ movie_title : chr "Titanic" "Avatar" "Star Wars: The Force Awakens" "Jurassic Park" ...
$ year : int 1997 2009 2015 1993 2015 2012 2015 2003 1999 2011 ...
$ year_bins : Factor w/ 3 levels "1980-1999","2000-2009",..: 1 2 3 1 3 3 3 2 1 3 ...
$ studio_distrib : Factor w/ 11 levels "20th Century Fox",..: 7 1 10 9 9 10 9 11 1 11 ...
. . . .
> focus <- data.frame(Dataset_Movie_v3$movie_title,
Dataset_Movie_v3$main_genre)
> str(focus)
'data.frame': 2868 obs. of 2 variables:
$ Dataset_Movie_v3.movie_title: Factor w/ 2854 levels "10 Cloverfield Lane",..: 2627 214 1947 1139 1141 2066 802 2351 1943 916 ...
$ Dataset_Movie_v3.main_genre : Factor w/ 8 levels "Action","Adventure",..: 7 1 1 2 1 1 1 2 1 2 ...
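Building a data frame from `dataframe$variable` pieces produces long column names such as `Dataset_Movie_v3.movie_title`. A possible alternative, sketched here on a small made-up data frame `D` that only stands in for the course dataset, is to select the columns by name so that the short names are kept.

```r
# Hypothetical stand-in for Dataset_Movie_v3, for illustration only
D <- data.frame(movie_title = c("Film A", "Film B", "Film C"),
                main_genre  = factor(c("Action", "Drama", "Action")),
                year        = c(1997L, 2009L, 2015L))

# Select columns by name: the result keeps the original column names
focus <- D[, c("movie_title", "main_genre")]
str(focus)

# A single column is extracted with $
genres <- D$main_genre
```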
Basics and approach
• Object vector
> classes <- c(1, 4, 254)
> str(classes)
num [1:3] 1 4 254
> classes
[1] 1 4 254
> classes[2]
[1] 4
> classes[c(1, 3)]
[1] 1 254
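Two further ways of picking elements from the same vector may be useful; this is a short sketch, and the names `without_second` and `large_classes` are chosen only for this example.

```r
classes <- c(1, 4, 254)

n <- length(classes)                    # number of elements in the vector

# A negative index drops the element in that position
without_second <- classes[-2]

# A logical condition keeps only the elements for which it is TRUE
large_classes <- classes[classes > 3]
```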
Basics and approach
• Matrices
> A <- matrix(data = c(4, 2, 0, 1, -3, 0.9), nrow = 3,
ncol = 2, byrow = TRUE)
> A
     [,1] [,2]
[1,]    4  2.0
[2,]    0  1.0
[3,]   -3  0.9
Basics and approach
• Matrices
> B <- matrix(data = c(4, 2, 0, 1, -3, 0.9), nrow = 3,
ncol = 2, byrow = FALSE)
> B
     [,1] [,2]
[1,]    4  1.0
[2,]    2 -3.0
[3,]    0  0.9
Basics and approach
• Matrices
> A <- matrix(data = c(4, 2, 0, 1, -3, 0.9), nrow = 3,
ncol = 2, byrow = TRUE)
> A
     [,1] [,2]
[1,]    4  2.0
[2,]    0  1.0
[3,]   -3  0.9
> A[2, 1] # element in row 2 and column 1
[1] 0
> A[c(1, 3), 2] # elements in rows 1 and 3 of column 2
[1] 2.0 0.9
> A[, 1] # all elements in column 1
[1] 4 0 -3
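The same bracket notation also selects whole rows. A brief sketch on the matrix `A` defined above; the names `row2` and `rows13` are used only here.

```r
A <- matrix(data = c(4, 2, 0, 1, -3, 0.9), nrow = 3,
            ncol = 2, byrow = TRUE)

size <- dim(A)          # number of rows, then number of columns

row2 <- A[2, ]          # all elements in row 2 (returned as a vector)

rows13 <- A[c(1, 3), ]  # rows 1 and 3, all columns (still a matrix)
```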
Basics and approach
• Object factor: a categorical variable, even if categories are coded as numbers
> gender <- factor(c("f", "f", "f", "m", "m", "f", "m", "f",
"f", "m"))
> gender
[1] f f f m m f m f f m
Levels: f m

> str(Dataset_Movie_v3$major_distrib)
int [1:2868] 1 1 1 1 1 1 1 1 1 1 ...
> major_recoded <- factor(Dataset_Movie_v3$major_distrib)
> str(major_recoded)
Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
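When the categories are coded as numbers, readable labels can be attached while creating the factor. The sketch below uses a short made-up vector in place of `Dataset_Movie_v3$major_distrib`, and the labels "no" and "yes" are an assumption: the slides do not say what the codes 0 and 1 mean.

```r
# Hypothetical 0/1 codes standing in for Dataset_Movie_v3$major_distrib
major_distrib <- c(1L, 1L, 0L, 1L, 0L)

# The labels "no" / "yes" are assumed for this sketch only
major_recoded <- factor(major_distrib,
                        levels = c(0, 1),
                        labels = c("no", "yes"))

levels(major_recoded)   # the categories of the factor
table(major_recoded)    # how many observations fall in each category
```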
Get help

> help(command)
Basics and approach

• More about this in the HANDBOOK…
• Let’s try!
Frequency tables
Command in RStudio (and in R)

> table(dataframe$variable)

Example:
In the case of the Motion Picture Industry (dataset
«Dataset_Movie_v3.rda»), what is the distribution
of the movies by genre (variable «main_genre»)?
Frequency tables
Example:
In the case of the Motion Picture Industry (dataset «Dataset_Movie_v3.rda»),
what is the distribution of the movies by genre (variable «main_genre»)?

1. Download «Dataset_Movie_v3.rda» from BB
2. Open RStudio
3. File – Open: «Dataset_Movie_v3.rda»
4. > table(Dataset_Movie_v3$main_genre)
Poor result:
• Horizontal table
• Without relative frequencies
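Both shortcomings can be addressed in base R. The following is a sketch under stated assumptions: the vector `main_genre` is a small made-up sample standing in for `Dataset_Movie_v3$main_genre`, and the name `freq_table` is chosen only for this example.

```r
# Hypothetical genre vector standing in for Dataset_Movie_v3$main_genre
main_genre <- factor(c("Action", "Drama", "Action",
                       "Comedy", "Drama", "Action"))

counts <- table(main_genre)     # absolute frequencies
rel    <- prop.table(counts)    # relative frequencies (they sum to 1)

# One row per category: a vertical table with both kinds of frequency
freq_table <- data.frame(genre    = names(counts),
                         count    = as.vector(counts),
                         rel_freq = round(as.vector(rel), 3))
freq_table
```

With the real dataset, the same steps would start from `table(Dataset_Movie_v3$main_genre)`.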
Pie chart
Command in RStudio (and in R)

> pie(table(dataframe$variable))

Example:
In the case of the Motion Picture Industry (dataset
«Dataset_Movie_v3.rda»), what is the distribution
of the movies by genre (variable «main_genre»)?
Pie chart
Example:
In the case of the Motion Picture Industry (dataset «Dataset_Movie_v3.rda»), what is
the distribution of the movies by genre (variable «main_genre»)?

1. Basic version:
> pie(table(Dataset_Movie_v3$main_genre))

2. Improvement:
> pie(table(Dataset_Movie_v3$main_genre), main = "Pie chart for the Main Genre of the Movies")
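A possible further improvement is to show the percentage of each category on the slices. This is only a sketch: `main_genre` is a small made-up sample standing in for `Dataset_Movie_v3$main_genre`, and `slice_labels` is a name chosen for this example.

```r
# Hypothetical genre vector standing in for Dataset_Movie_v3$main_genre
main_genre <- factor(c("Action", "Drama", "Action",
                       "Comedy", "Drama", "Action"))

counts <- table(main_genre)

# Percentage of each category, rounded to one decimal place
pct <- as.vector(round(100 * prop.table(counts), 1))

# One label per slice: category name followed by its percentage
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = slice_labels,
    main = "Pie chart for the Main Genre of the Movies")
```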
Why R and not Excel?
• It depends on your purpose.
• Excel is better for data management and for
presenting and reporting results to a general
audience.
• R is better for exploring data, analyzing
datasets and finding evidence: it is more powerful,
faster and offers more statistical tools.
• They can be combined!
Recap
Upcoming

How to process data to describe numerical
variables with tables and graphs?
DISCLAIMER
This material was prepared using some of the
slides provided by Pearson Education as an
appendix to the textbook.
It must therefore be used only for teaching
purposes and may not be published, rented or
sold.
You are responsible for your use of the material
and for any violation of the Copyright © of
Pearson Education.
