UNIT-3-2
Factors are data objects used to categorize data and store it as levels. They can
hold both strings and integers, and they are most useful for columns that have a
limited number of unique values, such as "Male"/"Female" or TRUE/FALSE. A
demographic variable such as gender (Male/Female) is a typical example. Factors
are widely used in data analysis and statistical modeling.
To create a factor, use the factor() function and pass a vector as the argument:
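Example
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
  "Rock", "Jazz"))
# Print the factor
music_genre
Result:
[1] Jazz    Rock    Classic Classic Pop     Jazz    Rock    Jazz
Levels: Classic Jazz Pop Rock
You can see from the example above that the factor has four levels
(categories): Classic, Jazz, Pop and Rock.
To print only the levels, use the levels() function:
Example
levels(music_genre)
Result:
[1] "Classic" "Jazz"    "Pop"     "Rock"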
You can also set the levels yourself by passing the levels argument to the
factor() function:
Example
levels(music_genre)
Result:
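A minimal sketch of setting the levels explicitly. It reuses the music_genre data from the other examples; the extra level "Other" is an assumption added only for illustration and is not taken from the original notes.

```r
# Same data as the other examples; "Other" is an illustrative extra level
music_genre <- factor(
  c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
  levels = c("Classic", "Jazz", "Pop", "Rock", "Other")
)

# The levels come back in the order given, including the unused one
levels(music_genre)

# table() counts every level, so "Other" appears with a count of 0
table(music_genre)
```

Declaring a level up front is what later allows an item to be changed to that value without a warning.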
Factor Length
Use the length() function to find out how many items there are in the factor:
Example
length(music_genre)
Result:
[1] 8
Access Factors
To access the items in a factor, refer to the index number, using [] brackets:
Example
music_genre[3]
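Result:
[1] Classic
Levels: Classic Jazz Pop Rock
Change Item Value
To change the value of a specific item, refer to its index number:
Example
music_genre[3] <- "Pop"
music_genre[3]
Result:
[1] Pop
Levels: Classic Jazz Pop Rock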
Note that you cannot assign a value that is not already one of the factor's
levels. The following example produces a warning, and the item becomes NA:
Example
Trying to change the value of the third item ("Classic") to a value that is not a
predefined level ("Opera"):
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
  "Rock", "Jazz"))
music_genre[3] <- "Opera"
music_genre[3]
Result:
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Opera") :
  invalid factor level, NA generated
[1] <NA>
Levels: Classic Jazz Pop Rock
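However, if the value is already listed in the levels argument, the assignment
works:
Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
  "Rock", "Jazz"), levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))
music_genre[3] <- "Opera"
music_genre[3]
Result:
[1] Opera
Levels: Classic Jazz Pop Rock Opera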
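As an alternative to rebuilding the factor, the set of levels can be extended in place with the replacement form of levels(). The sketch below assumes the same music_genre data as above; it is one possible approach, not the only one.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz",
                        "Rock", "Jazz"))

# Append "Opera" to the existing levels without touching the data
levels(music_genre) <- c(levels(music_genre), "Opera")

# The assignment now succeeds because "Opera" is a known level
music_genre[3] <- "Opera"
music_genre[3]
```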
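The order of the levels can be changed by applying factor() again with the levels
argument in the desired order. In the output below, the same data are printed
first with the default alphabetical levels and then with the levels reordered:
[1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
[1] East  West  East  North North East  West  West  West  East  North
Levels: East West North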
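The two printouts above can be reproduced with a sketch like the following. The data values are read off the printed output; the variable names (region, region_factor, region_reordered) are hypothetical.

```r
# Values taken from the printed output above
region <- c("East", "West", "East", "North", "North", "East",
            "West", "West", "West", "East", "North")

# Default: levels are sorted alphabetically (East, North, West)
region_factor <- factor(region)
print(region_factor)

# Re-apply factor() with an explicit level order (East, West, North)
region_reordered <- factor(region_factor, levels = c("East", "West", "North"))
print(region_reordered)
```

Only the level order changes; the stored values stay the same.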
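Factor levels can also be generated with the gl() function. It takes two integers
as input: the number of levels and the number of times each level is repeated.
Syntax
gl(n, k, labels)
Here n is the number of levels, k is the number of replications of each level, and
labels is an optional vector of names for the resulting levels.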
Example
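A minimal sketch of gl() with three levels repeated four times each. The label values and the variable name city are illustrative assumptions.

```r
# 3 levels, each repeated 4 times; the labels are illustrative
city <- gl(3, 4, labels = c("North", "South", "West"))
print(city)

# The result is an ordinary factor with 3 levels and 12 items
levels(city)
length(city)
```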
Data Frames
A data frame can hold different types of data. The first column can be character
while the second and third are numeric or logical. However, all values within a
single column must have the same type.
Example
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45))
# Print the data frame
Data_Frame
Use the summary() function to summarize the data from a Data Frame:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame
summary(Data_Frame)
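Access Items
Use single brackets [ ], double brackets [[ ]] or $ to access columns of a data
frame:
Example
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)
Data_Frame[1]
Data_Frame[["Training"]]
Data_Frame$Training
  Training
1 Strength
2  Stamina
3    Other
[1] Strength Stamina Other
Levels: Other Stamina Strength
[1] Strength Stamina Other
Levels: Other Stamina Strength
The Levels lines appear when the Training column is stored as a factor, which was
the default for character columns before R 4.0.0. In R 4.0.0 and later the column
stays a character vector, so the last two calls print
[1] "Strength" "Stamina" "Other" instead.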
Add Rows
Use the rbind() function to add new rows to a data frame:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
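A minimal sketch of adding one row with rbind(). The new row's values and the names New_Row and New_row_DF are assumptions made for illustration.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Build the new row as a one-row data frame so each column keeps its type
New_Row <- data.frame(Training = "Strength", Pulse = 110, Duration = 110)

# rbind() returns a new data frame; the original is left unchanged
New_row_DF <- rbind(Data_Frame, New_Row)
New_row_DF
```

Passing the new row as a data frame, rather than as a plain vector, keeps Pulse and Duration numeric.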
Add Columns
Use the cbind() function to add new columns to a data frame:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
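A minimal sketch of adding a column with cbind(). The Steps column and the name New_col_DF are hypothetical and serve only as an illustration.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Steps is a made-up column; it needs one value per existing row
New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))
New_col_DF
```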
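Use the c() function with negative indexes to remove rows and columns from a data
frame:
Example
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)
# Remove the first row and the first column
Data_Frame_New <- Data_Frame[-c(1), -c(1)]
# Print the new data frame
Data_Frame_New
  Pulse Duration
2   150       30
3   120       45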
Use the dim() function to find the number of rows and columns in a Data Frame:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
dim(Data_Frame)
[1] 3 3
You can also use the ncol() function to find the number of columns and nrow() to
find the number of rows:
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
ncol(Data_Frame)
nrow(Data_Frame)
[1] 3
[1] 3
Use the length() function to find the number of columns in a Data Frame (similar
to ncol()):
Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
length(Data_Frame)
[1] 3
Use the rbind() function to combine two or more data frames in R vertically:
Example
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
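A minimal sketch of combining two data frames vertically. Data_Frame2 and its values, and the name New_Data_Frame, are assumptions for illustration; the only requirement is that both data frames have the same column names.

```r
Data_Frame1 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Second data frame with the same column names (values are illustrative)
Data_Frame2 <- data.frame(
  Training = c("Stamina", "Stamina", "Strength"),
  Pulse = c(140, 150, 160),
  Duration = c(30, 30, 20)
)

# Stack the rows of Data_Frame2 underneath those of Data_Frame1
New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)
New_Data_Frame
```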
Use the cbind() function to combine two or more data frames in R horizontally:
Example
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
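A minimal sketch of combining two data frames side by side. Data_Frame4, its columns and the name New_Data_Frame1 are hypothetical; both data frames need the same number of rows.

```r
Data_Frame3 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Second data frame with the same number of rows (columns are illustrative)
Data_Frame4 <- data.frame(
  Steps = c(3000, 6000, 2000),
  Calories = c(300, 400, 300)
)

# Place the columns of Data_Frame4 to the right of Data_Frame3
New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
New_Data_Frame1
```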
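In data analysis you often need to sort data by one of the variables in the
dataset. In R, the order() function does this: it returns the positions that put a
vector of continuous or factor values in sorted order, and those positions can
then be used to rearrange the data. The result can be in ascending or descending
order.
Arguments:
decreasing: set to TRUE to sort in descending order; the default, FALSE, sorts in
ascending order.
na.last: controls where missing values are placed; the default, TRUE, puts them
last.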
EXAMPLE
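A minimal sketch of sorting a data frame by one column with order(). It reuses the Data_Frame from the earlier examples; the name row_order is an assumption.

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# order() returns the row positions that sort Pulse in ascending order
row_order <- order(Data_Frame$Pulse)
row_order   # 1 3 2

# Use those positions as row indexes to get the sorted data frame
Data_Frame[row_order, ]

# Descending order
Data_Frame[order(Data_Frame$Pulse, decreasing = TRUE), ]
```

Note that order() does not rearrange anything by itself; it only produces the index vector used inside the brackets.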
Lists are R objects that can contain elements of different types, such as numbers,
strings, vectors and even other lists. A list can also contain a matrix or a
function as an element. A list is an ordered, changeable collection of data, and
it is created with the list() function.
Example
# List of strings
thislist <- list("apple", "banana", "cherry")
# Print the list
thislist
Output
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"
Access Lists
You can access list items by referring to their index number inside
brackets. The first item has index 1, the second item has index 2, and so
on:
Example
thislist <- list("apple", "banana", "cherry")
thislist[1]
Output
[[1]]
[1] "apple"
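The output above is still a list, which is why it prints with the [[1]] header. The sketch below contrasts single and double brackets; the variable names first_as_list and first_item are illustrative.

```r
thislist <- list("apple", "banana", "cherry")

# Single brackets return a list of length 1
first_as_list <- thislist[1]
class(first_as_list)   # "list"

# Double brackets return the element itself
first_item <- thislist[[1]]
class(first_item)      # "character"
first_item             # "apple"
```

Use double brackets when the value itself is needed, for example to pass it to a function that expects a character string.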
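Change Item Value
To change the value of a specific item, refer to its index number:
Example
thislist <- list("apple", "banana", "cherry")
thislist[1] <- "blackcurrant"
# Print the updated list
thislist
Output
[[1]]
[1] "blackcurrant"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"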
List Length
To find out how many items a list has, use the length() function:
Example
thislist <- list("apple", "banana", "cherry")
length(thislist)
Output
[1] 3
To find out if a specified item is present in a list, use the %in% operator:
Example
Check if "apple" is present in the list:
thislist <- list("apple", "banana", "cherry")
"apple" %in% thislist
Output
[1] TRUE
To add an item to the end of the list, use the append() function:
Example
Add "orange" to the list:
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange")
Output
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "cherry"
[[4]]
[1] "orange"
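To add an item after a specified index, pass after = index number to the
append() function:
Example
Add "orange" to the list after "banana" (index 2):
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange", after = 2)
Output
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "orange"
[[4]]
[1] "cherry"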
You can also remove list items. The following example creates a new,
updated list without the "apple" item:
Example
Remove "apple" from the list:
thislist <- list("apple", "banana", "cherry")
newlist <- thislist[-1]
# Print the new list
newlist
Output
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
Range of Indexes
You can specify a range of indexes by specifying where to start and
where to end the range, by using the : operator:
Example
Return the second, third, fourth and fifth items:
thislist <- list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
thislist[2:5]
Output
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
[[3]]
[1] "orange"
[[4]]
[1] "kiwi"
Note: The range starts at index 2 (included) and ends at index 5
(included).
Remember that the first item has index 1.
The most common way to join two or more lists is to use the c() function,
which concatenates them:
Example
list1 <- list("a", "b", "c")
list2 <- list(1,2,3)
list3 <- c(list1,list2)
list3
Output
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
[[4]]
[1] 1
[[5]]
[1] 2
[[6]]
[1] 3
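unlist() Function
The unlist() function converts a list into a single vector.
Syntax: unlist(x, use.names = TRUE)
Parameters:
x: a list or vector
use.names: Boolean value indicating whether to preserve the element names
Example 1: Converting a list of numeric vectors into a single vector
# R program to illustrate converting a list to a vector
# Creating a list
my_list <- list(l1 = c(1, 3, 5, 7),
                l2 = c(1, 2, 3),
                l3 = c(1, 1, 10, 5, 8, 65, 90))
# Flatten the list into a single named vector
print(unlist(my_list))
Output:
l11 l12 l13 l14 l21 l22 l23 l31 l32 l33 l34 l35 l36 l37
  1   3   5   7   1   2   3   1   1  10   5   8  65  90
In the code above, unlist() flattens my_list into a single vector. The list
structure dissolves, and each value is named after its source component and
position (l11 is the first element of l1), so all values are printed together as
one vector.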
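A short sketch of the use.names parameter, using the same three vectors that appear in the output above. The names flat_named and flat_plain are illustrative.

```r
my_list <- list(l1 = c(1, 3, 5, 7),
                l2 = c(1, 2, 3),
                l3 = c(1, 1, 10, 5, 8, 65, 90))

# Default: element names are built from the component name and position
flat_named <- unlist(my_list)
names(flat_named)

# use.names = FALSE drops the names and keeps only the values
flat_plain <- unlist(my_list, use.names = FALSE)
flat_plain
```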