R Lists and Data Frames Explained

The document provides an overview of lists and data frames in R, highlighting their structures, creation, and manipulation. Lists can contain various R objects and support naming and nesting, while data frames are structured collections of observations with equal-length vectors. The document also covers subsetting and combining data frames, along with managing data types and logical record subsets.

Uploaded by

Chaya Anu
Copyright
© All Rights Reserved

Lists and Data Frames
Lists of Objects
● The list is an incredibly useful data structure.
● It can be used to group together any mix of R structures and objects.
● A single list could contain a numeric matrix, a logical array, a single character string, and a factor object.
● You can even have a list as a component of another list.
Definition and Component Access
● Creating a list is much like creating a vector.
● You supply the elements that you want to include to the list function, separated by commas.
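A minimal sketch of list creation. The object name foo and its contents are a reconstruction chosen to agree with the outputs printed later in these slides; the original slide code is not available, so treat the values as assumptions.

```r
# Assumed example: a list holding a matrix, a logical vector, and a string
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

n_components <- length(foo)  # number of top-level components in the list
foo                          # printing shows each component under [[1]], [[2]], [[3]]
```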
Lists
Accessing Lists

• Note that when two components are extracted at once and stored as bar, the result bar is itself a list with the two components stored in the order in which they were requested.
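The following sketch illustrates component access under the assumption that foo is the three-component list used elsewhere in these slides. Double square brackets extract a single component; single square brackets return a list.

```r
# Assumed example list (a reconstruction, not the original slide code)
foo <- list(matrix(1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

first <- foo[[1]]                    # double brackets: the matrix itself
foo[[3]] <- paste(foo[[3]], "you!")  # a component can be overwritten in place
bar <- foo[c(2, 3)]                  # single brackets: a list of components 2 and 3
bar
```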
Naming

● You can name list components to make the elements more recognizable and easy to work with.
● Just like the information stored about factor levels, a name is an R attribute.
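As a sketch, names can be attached to an existing list with the names function. The component names below are taken from the outputs shown later in these slides; the list contents are assumed.

```r
# Assumed example list (a reconstruction, not the original slide code)
foo <- list(matrix(1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello you!")

names(foo) <- c("mymatrix", "mylogicals", "mystring")

foo$mymatrix                  # named components can be reached with the dollar operator
stored <- attributes(foo)     # the names are kept as an attribute of the list
stored
```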
Lists
Subsetting

● Subsetting named members also works the same way.

● The function all returns TRUE only if all of the logicals are TRUE, and returns FALSE otherwise.
● This confirms that these two ways of extracting the second column of
the matrix in foo provide an identical result.
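The comparison described above might look like the following sketch, assuming foo is the named list used throughout these slides.

```r
# Assumed example list (a reconstruction, not the original slide code)
foo <- list(mymatrix = matrix(1:4, nrow = 2, ncol = 2),
            mylogicals = c(TRUE, FALSE, TRUE, TRUE),
            mystring = "hello you!")

by_name  <- foo$mymatrix[, 2]   # second column, reached through the name
by_index <- foo[[1]][, 2]       # second column, reached through the position

same <- all(by_name == by_index)  # TRUE only if every element matches
same
```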
Naming List while Creating
● To name the components of a list as it’s being created, assign a label to each component in the list
command. Using some components of foo, create a new, named list.

R> Mylist <- list(tom=c(foo[[2]],T,T,T,F), john="g'day mate", harry=foo$mymatrix*2)

R> Mylist
$tom
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

$john
[1] "g'day mate"

$harry
     [,1] [,2]
[1,]    2    6
[2,]    4    8
Naming List while Creating

● The object Mylist now contains the three named components tom, john, and harry.
R> names(Mylist)
[1] "tom"   "john"  "harry"

● NOTE
● When using the names function, the component names are always provided and
returned as character strings in double quotes. However, if you’re specifying names
when a list is created (inside the list function) or using names to extract members with
the dollar operator, the names are entered without quotes (in other words, they are not
given as strings).
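A short sketch of the quoting rule in the note. The list foo is an assumed reconstruction; Mylist is built as in the slide above.

```r
# Assumed example list (a reconstruction, not the original slide code)
foo <- list(mymatrix = matrix(1:4, nrow = 2, ncol = 2),
            mylogicals = c(TRUE, FALSE, TRUE, TRUE),
            mystring = "hello you!")

# Names are typed without quotes inside list()
Mylist <- list(tom = c(foo[[2]], TRUE, TRUE, TRUE, FALSE),
               john = "g'day mate",
               harry = foo$mymatrix * 2)

component_names <- names(Mylist)  # returned as quoted character strings
via_dollar  <- Mylist$john        # unquoted name with the dollar operator
via_bracket <- Mylist[["john"]]   # quoted name with double square brackets
```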
Nested Lists
● As noted earlier, a member of a list can itself be a list.
● When nesting lists like this, it’s important to keep track of the depth of any member
for subsetting or extraction later.
● Note that you can add components to any existing list by using the dollar operator and
a new name.
● Here’s an example using foo and Mylist from earlier.
R> Mylist$bobby <- foo
R> Mylist
$tom
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

$john
[1] "g'day mate"

$harry
     [,1] [,2]
[1,]    2    6
[2,]    4    8

$bobby
$bobby$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$bobby$mylogicals
[1]  TRUE FALSE  TRUE  TRUE

$bobby$mystring
[1] "hello you!"
Nesting

● Here you’ve defined a fourth component to the list Mylist called bobby.
● The member bobby is assigned the entire list foo.
● As you can see by printing the new Mylist, there are now three components in
bobby.
● Naming and indexes are now both layered, and you can use either (or combine
them) to retrieve members of the inner list.
Nesting

R> Mylist$bobby$mylogicals[1:3]
[1] TRUE FALSE TRUE

R> Mylist[[4]][[2]][1:3]
[1] TRUE FALSE TRUE

R> Mylist[[4]]$mylogicals[1:3]
[1] TRUE FALSE TRUE
Conclusion on Lists

● Lists are often used to return output from various R functions.


● But they can quickly become rather large objects that take up considerable system resources to store.
● It’s generally recommended that when you have only one type of
data, you should stick to using basic vector, matrix, or array structures
to record and store the observations.
DATA FRAMES
● A data frame is R’s most natural way of presenting a data set with a collection of
recorded observations for one or more variables.

● Like lists, data frames have no restriction on the data types of the variables; you can
store numeric data, factor data, and so on.

● The R data frame can be thought of as a list with some extra rules attached.

● The most important distinction is that in a data frame (unlike a list), the members
must all be vectors of equal length.

● The data frame is one of the most important and frequently used tools in R for
statistical data analysis.
Construction
● To create a data frame from scratch, use the data.frame function.
● You supply your data, grouped by variable, as vectors of the same length, the same way you would construct a named list.
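A minimal sketch of constructing a data frame. The object name mydata and the variable names person and gender appear in these slides; the age variable name and all of the values are illustrative assumptions, since the original slide code is not available.

```r
# Assumed illustrative data: one vector per variable, all of length 5
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")))

mydata        # printed as rows (records) and columns (variables)
str(mydata)   # compact summary of each variable and its type
```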
Construction
● The returned object should make it clear why vectors passed to data.frame must be of equal length: vectors of differing lengths wouldn’t make sense in this context.
● If you pass vectors of unequal length to data.frame, then R will attempt to recycle any shorter vectors to match the longest (and will stop with an error if the longest length is not a whole multiple of a shorter one), throwing your data off and potentially allocating observations to the wrong variable.
● Notice that data frames are printed to the console in rows and columns; they look more like a matrix than a named list.
● This natural spreadsheet style makes it easy to read and manipulate data sets.
● Each row in a data frame is called a record, and each column is a variable.
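The recycling behavior can be seen in a small sketch. The variable names id and flag are hypothetical and used only for illustration.

```r
# Lengths 4 and 2: the shorter vector is silently recycled to length 4
recycled <- data.frame(id = 1:4, flag = c(TRUE, FALSE))
recycled$flag

# Lengths 3 and 2: 3 is not a whole multiple of 2, so data.frame() fails.
# tryCatch() is used here only so the sketch keeps running after the error.
outcome <- tryCatch(data.frame(id = 1:3, flag = c(TRUE, FALSE)),
                    error = function(e) "error")
outcome
```

Because recycling succeeds quietly in the first case, it is worth checking vector lengths before building a data frame.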
SUBSETTING

• You can report the size of a data frame (the number of records and variables) just as you’ve seen for the dimensions of a matrix.
• The nrow function retrieves the number of rows (records), ncol retrieves the number of columns (variables), and dim retrieves both.
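A sketch of subsetting and size reporting, assuming a five-record mydata with illustrative values (the original slide output is not available).

```r
# Assumed illustrative data
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")))

one_cell <- mydata[2, 2]     # row 2, column 2, using matrix-style indexes
ages     <- mydata$age       # a whole variable, using list-style access
last_gen <- mydata[3:5, 3]   # rows 3 to 5 of the third column

n_records   <- nrow(mydata)  # number of rows (records)
n_variables <- ncol(mydata)  # number of columns (variables)
size        <- dim(mydata)   # both at once: rows, then columns
```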
DATA FRAMES
● In versions of R before 4.0.0, the default behavior for character vectors passed to data.frame is to convert each variable into a factor object. Observe the following:

● Notice that this variable has levels, which shows it’s being treated as a factor. But this isn’t what you intended when you defined mydata earlier: you explicitly defined gender to be a factor but left person as a vector of character strings.
● To prevent this automatic conversion of character strings to factors when using data.frame, set the optional argument stringsAsFactors to FALSE (it defaults to TRUE before R 4.0.0 and to FALSE from R 4.0.0 onward).
● Reconstructing mydata with this in place looks like this:
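The original slide output is not reproduced here. A minimal sketch with assumed values shows the effect of stringsAsFactors in both directions; setting it explicitly makes the result independent of the R version in use.

```r
people <- c("Peter", "Lois", "Meg", "Chris", "Stewie")  # assumed values

# Explicitly requesting conversion: person becomes a factor with levels
converted <- data.frame(person = people, stringsAsFactors = TRUE)
levels(converted$person)

# Explicitly preventing conversion: person stays a character vector,
# while gender remains a factor because it was created with factor()
mydata <- data.frame(person = people,
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")),
                     stringsAsFactors = FALSE)
mydata$person
```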
DATA FRAMES
Adding Data Columns and Combining Data Frames
● Recall the rbind and cbind functions for matrices, which let you append rows and columns, respectively.
● These same functions can be used to extend data frames intuitively.
● For example, suppose you had another record to include in mydata: the age
and sex of another individual, Brian.
● The first step is to create a new data frame that contains Brian’s information.
Combining Data Frames
● To avoid any confusion, it’s important to make sure the variable names
and the data types match the data frame you’re planning to add this to.
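A sketch of appending Brian's record with rbind. The name newrecord, Brian's age, and the other values are assumptions made for illustration; the variable names and types of the new data frame are deliberately matched to mydata.

```r
# Assumed illustrative data
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")),
                     stringsAsFactors = FALSE)

# New one-row data frame with the same variable names and data types
newrecord <- data.frame(person = "Brian",
                        age = 7,
                        gender = factor("M", levels = levels(mydata$gender)),
                        stringsAsFactors = FALSE)

mydata <- rbind(mydata, newrecord)  # append the record as a new row
mydata
```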
Adding Data Columns
● Adding a variable to a data frame is also quite straightforward.
● Let’s say you’re now given data on the classification of how funny these six individuals are, defined
as a “degree of funniness.”
● The degree of funniness can take three possible values: Low, Med (medium), and High.
● Suppose Peter, Lois, and Stewie have a high degree of funniness, Chris and Brian have a medium
degree of funniness, and Meg has a low degree of funniness.
● In R, you’d have a factor vector like this:
Adding Data Columns

● The first line creates the basic character vector as funny, and the second line overwrites funny by turning it into a factor.
● The order of these elements must correspond to the records in your data frame.
● Now, you can simply use cbind to append this factor vector as a column to the existing mydata.
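The two lines described above, followed by the cbind step, might look like this sketch. The funniness values follow the description in the slides; the ages and the record order of mydata are assumptions.

```r
# Assumed six-record data frame; the record order matters for funny below
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     stringsAsFactors = FALSE)

# Line 1: the basic character vector, in the same order as the records
funny <- c("High", "High", "Low", "Med", "High", "Med")
# Line 2: overwrite funny with a factor whose levels are listed from Low to High
funny <- factor(x = funny, levels = c("Low", "Med", "High"))

mydata <- cbind(mydata, funny)  # append the factor as a new column
mydata
```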
Adding Data Columns
● The rbind and cbind functions aren’t the only ways to extend a data frame.
● One useful alternative for adding a variable is to use the dollar operator, much like adding a new member to a named list, as described in the section on lists.
● Suppose now you want to add another variable to mydata by including a column with the age of the individuals in months, not years, calling this new variable [Link].
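A sketch of adding a variable with the dollar operator. The variable name is garbled in the source, so the name age.mon used here is a hypothetical stand-in, and the data values are assumed.

```r
# Assumed illustrative data
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     stringsAsFactors = FALSE)

# age.mon is a hypothetical name: age in months, computed from age in years
mydata$age.mon <- mydata$age * 12
mydata
```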
Logical Record Subset

● In the section on logical values, you saw how to use logical flag vectors to subset data structures.
● This is a particularly useful technique with data frames, where you’ll often want
to examine a subset of entries that meet certain criteria.
● For example, when working with data from a clinical drug trial, a researcher
might want to examine the results for just male participants and compare them to
the results for females.
● Or the researcher might want to look at the characteristics of individuals who responded most positively to the drug.
Logical Record Subset
● Let’s continue to work with mydata. Say you want to examine all records corresponding to males.
● From the section on factors, you know that the following line will identify the relevant positions in the gender factor vector:
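The original slide code is not reproduced here. A sketch under the assumption that gender is coded as "M" and "F", with illustrative data values:

```r
# Assumed illustrative data
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     stringsAsFactors = FALSE)

is_male <- mydata$gender == "M"  # logical flag vector, one entry per record
is_male

males <- mydata[is_male, ]       # flag vector in the row position, all columns kept
males
```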
Logical Record Subset

● Since you know you are selecting the males only, you could omit gender from the result using a negative numeric index in the column dimension.
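A sketch of dropping a column with a negative index. It assumes gender is the third column, which holds for the illustrative data frame built here but may differ in other data.

```r
# Assumed illustrative data; gender is column 3 in this construction
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     stringsAsFactors = FALSE)

# Rows: males only. Columns: everything except column 3 (gender).
males_no_gender <- mydata[mydata$gender == "M", -3]
males_no_gender
```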
Logical Record Subset

● If you don’t have the column number or if you want to have more control over the returned columns, you can use a character vector of variable names instead.
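Selecting columns by name might look like the following sketch, again with assumed data values. Naming the columns avoids relying on their positions.

```r
# Assumed illustrative data
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     stringsAsFactors = FALSE)

# Rows: males only. Columns: chosen by a character vector of variable names.
males_named <- mydata[mydata$gender == "M", c("person", "age")]
males_named
```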