Chapter 2 Data Structures in R

Here are the steps to solve this problem: 1. Create a data frame with the raw data: data <- data.frame(Height=c(175,165,150,155,168,153,165,177,180,164,150), Weight=c(80,85,50,55,63,45,74,90,86,74,53), Hours=c(8,7,12,10,9,7,6,7,8,8,11)) 2. Calculate BMI and add to data frame: data$BMI <- data$Weight/(data$Height/100)^2 3. Find summary statistics: summary(data

Uploaded by

nailofar
Copyright
© All Rights Reserved

DSC551 (PROGRAMMING FOR DATA SCIENCE)

CHAPTER 2
DATA STRUCTURES IN R PROGRAMMING

PREPARED BY: DR NIK NUR FATIN FATIHAH BINTI SAPRI


TYPE OF DATA STRUCTURES IN R
A way of organizing data for use in the computer.

1 • VECTORS
2 • FACTORS
3 • MATRICES
4 • ARRAY
5 • DATA FRAME
6 • LIST

1 • VECTORS
A vector is simply an ordered collection of items that are all of the same type.

# Vector of strings
fruits <- c("banana", "apple", "orange")

# Vector of numerical values


numbers <- c(1, 2, 3)

# Vector of logical values


log_values <- c(TRUE, FALSE, TRUE, FALSE)
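
A quick way to confirm the "same type" rule is to ask R for the class of each vector. The sketch below reuses the three vectors above; the `mixed` vector is a hypothetical addition, included only to show that R converts mixed input to a single type.

```r
fruits <- c("banana", "apple", "orange")
numbers <- c(1, 2, 3)
log_values <- c(TRUE, FALSE, TRUE, FALSE)

class(fruits)      # "character"
class(numbers)     # "numeric"
class(log_values)  # "logical"

# Hypothetical example: mixing types forces every element to one common type
mixed <- c(1, "apple", TRUE)
class(mixed)       # "character"
mixed              # "1" "apple" "TRUE"
```
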
1 • VECTORS How to create vectors?

Vectors              R function   Use of function                  Example
Combine/concatenate  c( )         c(values)                        numbers <- c(1, 2, 3)
Sequence             seq( )       seq(from, to)                    seq(1,10)
                                  seq(from, to, by=)               seq(1,10,by=2)
                                                                   seq(1,3,by=0.5)
                                                                   seq(20,0,by=-5)
                                  seq(from, to, length=)           seq(0,20,length=4)
                                  seq(along)                       seq(5)
                                                                   seq(1:5)
Replication          rep( )       rep(value, no. of replication)   rep(5,5)
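
The calls listed for c(), seq() and rep() can be run as below to see what each one returns. This is a minimal sketch. The variable names `s1` to `s6` and `r1` are made up for illustration, and `length.out` is the full name of the argument that the slide abbreviates as `length`.

```r
numbers <- c(1, 2, 3)               # combine values: 1 2 3

s1 <- seq(1, 10)                    # 1 2 3 4 5 6 7 8 9 10
s2 <- seq(1, 10, by = 2)            # 1 3 5 7 9
s3 <- seq(1, 3, by = 0.5)           # 1.0 1.5 2.0 2.5 3.0
s4 <- seq(20, 0, by = -5)           # 20 15 10 5 0 (negative step counts down)
s5 <- seq(0, 20, length.out = 4)    # 4 evenly spaced values from 0 to 20
s6 <- seq(5)                        # 1 2 3 4 5

r1 <- rep(5, 5)                     # 5 5 5 5 5
```
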
1 • VECTORS Selecting element from vector:
x=c(1,4,5,3,9,10,12)

Note: R is case-sensitive, so the vector must be referred to as x, not X.

Functions     Output   Notes
x[5]
                       To print 1st to 3rd element
x[5]=100
y=x<8
x[-2]
length(x)
edit(x)
                       To remove element in 1st and 2nd place
x[-c(2,4)]
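
One way to work through the selection table is sketched below, run in the order the rows appear. The calls `x[1:3]` and `x[-c(1, 2)]` are suggested answers for the two rows that give only a note, not something stated on the slide. `edit(x)` is left out because it opens an interactive editor.

```r
x <- c(1, 4, 5, 3, 9, 10, 12)

x[5]           # 9, the 5th element
x[1:3]         # 1 4 5, the 1st to 3rd elements
x[5] <- 100    # replaces the 5th element; x is now 1 4 5 3 100 10 12
y <- x < 8     # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
x[-2]          # 1 5 3 100 10 12, everything except the 2nd element
length(x)      # 7
x[-c(1, 2)]    # 5 3 100 10 12, drops the 1st and 2nd elements
x[-c(2, 4)]    # 1 5 100 10 12, drops the 2nd and 4th elements
```

Negative indices only affect what is printed. `x` itself keeps all 7 elements unless the result is assigned back to `x`.
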
1 • VECTORS Functions in vector. Given x=c(1,3,5,7,9) and y=c(-1,-3,-5,7,-9)

Functions        Description                                                            Example/Output
mean(x)          To compute the mean
var(x)           To compute the variance
sd(x)            To compute the standard deviation
sum(x)           To compute the summation
min(x)           To find the minimum value
max(x)           To find the maximum value
diff(x)          To compute the differences between consecutive elements of a vector
identical(x,y)   To check whether two vectors are exactly the same
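
A minimal sketch of these functions applied to the given x and y follows. The values in the comments come from the definitions, for example the sample variance divides by n - 1 = 4.

```r
x <- c(1, 3, 5, 7, 9)
y <- c(-1, -3, -5, 7, -9)

mean(x)          # 5
var(x)           # 10 (sample variance: 40 / 4)
sd(x)            # square root of the variance, about 3.162
sum(x)           # 25
min(x)           # 1
max(x)           # 9
diff(x)          # 2 2 2 2, each element minus the one before it
identical(x, y)  # FALSE, the two vectors are not exactly the same
```
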


2 • FACTORS
A factor is used for categorical data. Can be created using the factor() function.

#categorize the music genre


music_genre <-
factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

#categorize the opinion


Opinion=factor(c("yes","yes","no","yes","no"))
Opinion_1=factor(c("yes","yes","no","yes","no"),labels=c(1,2))
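
To see what factor() stores, it helps to inspect the levels and the counts per category. The sketch below reuses the factors above with plain straight quotes. By default the levels are the sorted unique values, so `labels = c(1, 2)` relabels "no" as 1 and "yes" as 2.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
levels(music_genre)       # "Classic" "Jazz" "Pop" "Rock"
table(music_genre)        # counts per level: Classic 2, Jazz 3, Pop 1, Rock 2

Opinion <- factor(c("yes", "yes", "no", "yes", "no"))
levels(Opinion)           # "no" "yes"

# labels are matched to the sorted levels: "no" -> 1, "yes" -> 2
Opinion_1 <- factor(c("yes", "yes", "no", "yes", "no"), labels = c(1, 2))
as.character(Opinion_1)   # "2" "2" "1" "2" "1"
```
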
3 • MATRICES
A matrix is a two-dimensional data set with rows and columns. Can be created using the matrix() function.

Three ways to create a matrix:
1) matrix(data,nrow,ncol) function
2) dim() function
3) rbind() and cbind() function
3 • MATRICES

Matrices      R function                 Example
Matrix        matrix(data, nrow, ncol)   m = matrix(1:6, nrow = 2, ncol = 3)
                                         #By default, matrices constructed by column
                                         n = matrix(1:6, nrow = 2, ncol = 3, byrow = T)
Dimension     dim( )                     x = c(2,4,6,8,10,12) #a vector
                                         dim(x) = c(2,3)
Row bind      rbind( )                   k = 1:3
                                         l = 10:12
                                         matrix1 = rbind(k,l) #row
Column bind   cbind( )                   matrix2 = cbind(k,l) #column
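
The difference between the three approaches is easiest to see by checking a few entries and the resulting dimensions. This is a small sketch using the same objects as the table, with `TRUE` written out in place of `T`.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column
n <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
m[1, ]   # 1 3 5
n[1, ]   # 1 2 3

x <- c(2, 4, 6, 8, 10, 12)   # a plain vector
dim(x) <- c(2, 3)            # now a 2 x 3 matrix, filled by column
x[1, 2]  # 6
x[2, 3]  # 12

k <- 1:3
l <- 10:12
matrix1 <- rbind(k, l)   # k and l become rows
matrix2 <- cbind(k, l)   # k and l become columns
dim(matrix1)   # 2 3
dim(matrix2)   # 3 2
```
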
3 • MATRICES
Matrices can do computation. Given x=matrix(4:7,nrow=2)

Functions     Description                           Example/Output
t(x)          To find the transpose of the matrix
det(x)        To compute the determinant
diag(x)       To extract the diagonal elements
solve(x)      To compute the inverse matrix
rowMeans(x)   To compute the row means
colMeans(x)   To compute the column means
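
A worked sketch for the given matrix follows. Because matrices are filled by column, x has rows (4, 6) and (5, 7). The variable names `xt` and `xinv` are made up for illustration.

```r
x <- matrix(4:7, nrow = 2)
# x is:
#      [,1] [,2]
# [1,]    4    6
# [2,]    5    7

xt <- t(x)         # transpose: rows (4, 5) and (6, 7)
det(x)             # 4*7 - 6*5 = -2
diag(x)            # 4 7
xinv <- solve(x)   # inverse: rows (-3.5, 3) and (2.5, -2)
rowMeans(x)        # 5 6
colMeans(x)        # 4.5 6.5

# Check: a matrix times its inverse gives the identity matrix (up to rounding)
round(x %*% xinv, 10)
```
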


4 • ARRAY
An array is a multi-dimensional data set with rows, columns and groups. Can be created using the array() function.

# A vector (one dimension) with values ranging from 1 to 24
thisarray <- c(1:24)
thisarray

# An array with more than one dimension
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray
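
Elements of the array are addressed as [row, column, group]. The sketch below builds the same 4 x 3 x 2 array and picks out a few entries. The values follow from R filling arrays column by column, one group at a time.

```r
thisarray <- 1:24
multiarray <- array(thisarray, dim = c(4, 3, 2))

dim(multiarray)        # 4 3 2: 4 rows, 3 columns, 2 groups
multiarray[2, 3, 1]    # 10: row 2, column 3 of the first group
multiarray[1, 1, 2]    # 13: first element of the second group
multiarray[, , 2]      # the whole second group, a 4 x 3 matrix holding 13 to 24

is.array(thisarray)    # FALSE: a plain vector has no dim attribute
is.array(multiarray)   # TRUE
```
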
5 • DATA FRAME
- Data frames are used to store tabular data (a data set) in R.
- Can be viewed as a data table with rows for cases and columns for variables (numeric, character, logical, etc.).
- All of the columns in a data frame must be of the same length.
- May contain columns of different data types.
- Cannot do matrix multiplication directly on a data frame.
- Can be created using the data.frame() function.

#Example
names = c("ali", "abu", "siti", "sofea")
gender = c("male", "male", "female", "female")
age = c(25, 21, 30, 27)
occupation = c("doctor", "lawyer", "doctor", "lawyer")
Data=data.frame(names,gender,age,occupation)

Observe the above R code. Is there anything you would like to suggest?
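
One possible set of suggestions is sketched below. It is not necessarily the answer the slide intends. It assumes the points of interest are the variable name `names`, which is also the name of a base R function, and the categorical columns, which could be stored as factors. The name `name` is a hypothetical replacement.

```r
name <- c("ali", "abu", "siti", "sofea")   # avoids confusion with the base function names()
gender <- factor(c("male", "male", "female", "female"))          # categorical, so a factor
age <- c(25, 21, 30, 27)
occupation <- factor(c("doctor", "lawyer", "doctor", "lawyer"))  # categorical, so a factor
Data <- data.frame(name, gender, age, occupation)

str(Data)                         # structure: 4 observations of 4 variables
dim(Data)                         # 4 4
Data$age                          # 25 21 30 27
mean(Data$age)                    # 25.75
Data[Data$gender == "female", ]   # rows for siti and sofea
```
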
6 • LIST
A list in R can contain many different data types inside it. Can be created using the list() function.

# List of strings
thislist <- list("apple", "banana", "cherry")

# to change the item inside list


thislist <- list("apple", "banana", "cherry")
thislist[1] <- "blackcurrant"
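
The example above holds only strings, so a short sketch of a list with mixed types may help. The list `mixedlist` and all of its element names are hypothetical, chosen only for illustration.

```r
# Hypothetical list mixing a string, a number, a logical and a numeric vector
mixedlist <- list(name = "apple", price = 2.5, in_stock = TRUE, sizes = c(1, 2, 3))

class(mixedlist[1])     # "list": single brackets return a smaller list
class(mixedlist[[1]])   # "character": double brackets return the item itself
mixedlist$price         # 2.5, access by name
mixedlist$sizes[2]      # 2, second value of the vector stored in the list

mixedlist[["name"]] <- "blackcurrant"   # change an item
length(mixedlist)       # 4
```
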
Exercises

Height(cm)   Weight(kg)   No.Hours sleeping   BMI
175          80           8
165          85           7
150          50           12
155          55           10
168          63           9
153          45           7
165          74           6
177          90           7
180          86           8
164          74           8
150          53           11

1) Create a data frame for the above raw data.
2) Calculate the BMI and add it into the data frame.
3) Find the summary statistics for the data.
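
A possible solution sketch for the three exercises follows. It assumes the usual definition of BMI, weight in kg divided by the square of height in metres, so the height in cm is divided by 100 first. The column names `Height`, `Weight` and `Hours` are chosen for illustration.

```r
# 1) Create a data frame from the raw data
data <- data.frame(
  Height = c(175, 165, 150, 155, 168, 153, 165, 177, 180, 164, 150),
  Weight = c(80, 85, 50, 55, 63, 45, 74, 90, 86, 74, 53),
  Hours  = c(8, 7, 12, 10, 9, 7, 6, 7, 8, 8, 11)
)

# 2) Calculate BMI (kg / m^2) and add it as a new column
data$BMI <- data$Weight / (data$Height / 100)^2

# 3) Summary statistics for every column
summary(data)
```
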
