DP Unit1 Notes

The document provides an overview of data preparation and visualization techniques in R, covering data import from various sources, data cleaning using dplyr and tidyr, and basic visualization with ggplot2. It includes examples of importing data, selecting variables and observations, creating new variables, summarizing data, and reshaping datasets. Additionally, it introduces ggplot2 concepts such as geoms, scales, facets, labels, and themes for effective data visualization.

Uploaded by

usearch595

Data Preparation And Visualization

UNIT I NOTES
Data preparation:
1. Importing data:
R can import data from almost any source, including text files, Excel
spreadsheets, statistical packages, and database management systems
(DBMS). We’ll illustrate these techniques using the Salaries dataset,
containing the 9-month academic salaries of college professors at a single
institution in 2008-2009.
1.1 Text files:
The readr package provides functions for importing delimited text
files into R data frames.
library(readr)

# import data from a comma delimited file

Salaries <- read_csv("[Link]")

# import data from a tab delimited file

Salaries <- read_tsv("[Link]")
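
As a rough, self-contained sketch of the readr round trip: the file below is a temporary one, and the two-row data frame (its column names and values) is made up for illustration rather than taken from the real Salaries data.

```r
library(readr)

# Hypothetical two-row stand-in for the Salaries data
demo <- data.frame(rank = c("Prof", "AsstProf"),
                   salary = c(120000, 80000))

# Write it to a temporary comma delimited file, then read it back
csv_path <- tempfile(fileext = ".csv")
write_csv(demo, csv_path)
salaries_demo <- read_csv(csv_path, show_col_types = FALSE)

# read_csv returns a tibble; column types are guessed from the file contents
str(salaries_demo)
```

The same pattern should apply to read_tsv with a tab delimited file written by write_tsv.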

1.2 Excel spreadsheet:

The readxl package can import data from Excel workbooks. Both
xls and xlsx formats are supported.

library(readxl)

# import data from an Excel workbook

Salaries <- read_excel("[Link]", sheet=1)

1.3 Statistical Packages:

The haven package provides functions for importing data from a variety of
statistical packages.
library(haven)

# import data from Stata

Salaries <- read_dta("[Link]")

# import data from SPSS

Salaries <- read_sav("[Link]")

# import data from SAS

Salaries <- read_sas("salaries.sas7bdat")

1.4 Databases:
Importing data from a database requires additional steps and is beyond the scope of
these notes. Depending on the database containing the data, the following packages can
help: RODBC, RMySQL, ROracle, RPostgreSQL, RSQLite, and RMongo. In the
newest versions of RStudio, you can use the Connections pane to quickly access the data
stored in database management systems.
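
For orientation only, here is a minimal sketch of what a database import can look like. It assumes the DBI package (not mentioned above) together with RSQLite, and it uses a throwaway in-memory database with a made-up salaries table, so none of the names below come from a real system.

```r
library(DBI)

# Open a throwaway in-memory SQLite database (requires the RSQLite package)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table standing in for data that lives in a DBMS
dbWriteTable(con, "salaries",
             data.frame(rank = c("Prof", "AsstProf"),
                        salary = c(120000, 80000)))

# Pull only the rows of interest into an ordinary R data frame
high_paid <- dbGetQuery(con,
                        "SELECT rank, salary FROM salaries WHERE salary > 100000")

dbDisconnect(con)
high_paid
```

With a real DBMS, only the dbConnect call would be expected to change (driver, host, credentials); the query step stays the same.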

2. Cleaning Data:
Cleaning your data can be the most time-consuming part of any data
analysis. The most important steps are considered below. While there
are many approaches, those using the dplyr and tidyr packages are some
of the quickest and easiest to learn.

Package  Function   Use
dplyr    select     select variables/columns
dplyr    filter     select observations/rows
dplyr    mutate     transform or recode variables
dplyr    summarize  summarize data
dplyr    group_by   identify subgroups for further processing
tidyr    gather     convert wide format dataset to long format
tidyr    spread     convert long format dataset to wide format

Note: in tidyr 1.0.0 and later, gather and spread are superseded by
pivot_longer and pivot_wider, which are the functions used in section 2.7.

Examples in this section will use the starwars dataset from
the dplyr package. The dataset provides descriptions of 87 characters from
the Star Wars universe on 13 variables (14 in dplyr 1.0.0 and later, which
added a sex variable).

2.1 Selecting variables:

The select function allows you to limit your dataset to specified
variables (columns).

library(dplyr)

# keep the variables name, height, and gender

newdata <- select(starwars, name, height, gender)

# keep the variables name and all variables
# between mass and species inclusive
newdata <- select(starwars, name, mass:species)

# keep all variables except birth_year and gender

newdata <- select(starwars, -birth_year, -gender)

2.2 Selecting observations:

The filter function allows you to limit your dataset to observations (rows)
meeting specific criteria. Multiple criteria can be combined with
the & (AND) and | (OR) symbols.
library(dplyr)

# select females
# (in dplyr 1.0.0 and later, gender is coded "feminine"/"masculine";
# use sex == "female" there)
newdata <- filter(starwars,
                  gender == "female")

# select females that are from Alderaan

newdata <- filter(starwars,
                  gender == "female" &
                  homeworld == "Alderaan")

# select individuals that are from Alderaan, Coruscant, or Endor

newdata <- filter(starwars,
                  homeworld == "Alderaan" |
                  homeworld == "Coruscant" |
                  homeworld == "Endor")

# this can be written more succinctly as
newdata <- filter(starwars,
                  homeworld %in%
                    c("Alderaan", "Coruscant", "Endor"))
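
To check that the %in% form selects the same rows as the chained | comparisons, here is a small sketch on a made-up data frame. The toy data and its names are assumptions, chosen so the result does not depend on which version of the starwars table is installed.

```r
library(dplyr)

# Hypothetical toy data, not the real starwars table
toy <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Alderaan", "Tatooine", "Endor", NA)
)

by_or <- filter(toy, homeworld == "Alderaan" | homeworld == "Endor")
by_in <- filter(toy, homeworld %in% c("Alderaan", "Endor"))

# Both keep rows A and C; the row with a missing homeworld is dropped,
# because filter only keeps rows where the condition is TRUE
identical(by_or, by_in)
```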

2.3 Creating/recoding variables:

The mutate function allows you to create new variables or transform
existing ones.
library(dplyr)

# convert height in centimeters to inches,

# and mass in kilograms to pounds
newdata <- mutate(starwars,
height = height * 0.394,
mass = mass * 2.205)

The ifelse function (part of base R) can be used for recoding data. The format
is ifelse(test, return if TRUE, return if FALSE).
library(dplyr)

# if height is greater than 180 then heightcat = "tall",
# otherwise heightcat = "short"

newdata <- mutate(starwars,
                  heightcat = ifelse(height > 180,
                                     "tall",
                                     "short"))

# convert any eye color that is not black, blue or brown, to other.
newdata <- mutate(starwars,
                  eye_color = ifelse(eye_color %in%
                                       c("black", "blue", "brown"),
                                     eye_color,
                                     "other"))

# set heights greater than 200 or less than 75 to missing

newdata <- mutate(starwars,
                  height = ifelse(height < 75 | height > 200,
                                  NA,
                                  height))
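
One detail worth seeing in isolation is how ifelse treats missing values: when the test itself is NA, the result is NA rather than either branch. A minimal base R sketch on a made-up vector of heights (the values are hypothetical):

```r
# Hypothetical heights in centimeters, including one missing value
height <- c(172, 202, NA, 66)

# Recode into categories; a missing test value yields a missing result
heightcat <- ifelse(height > 180, "tall", "short")
heightcat
# "short" "tall" NA "short"

# Set implausible values to missing; the result stays numeric
height_clean <- ifelse(height < 75 | height > 200, NA, height)
height_clean
# 172 NA NA NA
```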

2.4 Summarizing data:

The summarize function can be used to reduce multiple values down to
a single value (such as a mean). It is often used in conjunction with
the group_by function, to calculate statistics by group. In the code
below, the na.rm=TRUE option is used to drop missing values before
calculating the means.

library(dplyr)

# calculate mean height and mass

newdata <- summarize(starwars,
                     mean_ht = mean(height, na.rm=TRUE),
                     mean_mass = mean(mass, na.rm=TRUE))
newdata
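
To illustrate the grouped case mentioned above, here is a minimal sketch that combines group_by with summarize. The toy data frame is made up so the expected means can be checked by hand.

```r
library(dplyr)

# Hypothetical toy data, not the real starwars table
toy <- data.frame(
  species = c("Human", "Human", "Droid", "Droid"),
  height  = c(170, 180, 100, NA)
)

# One output row per species; na.rm = TRUE drops the missing Droid height
by_species <- summarize(group_by(toy, species),
                        mean_ht = mean(height, na.rm = TRUE))
by_species
# Droid: 100, Human: 175
```

Without na.rm = TRUE the Droid mean would be NA, since a single missing value makes the whole mean missing.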

2.5 Using pipes:

Packages like dplyr and tidyr allow you to write your code in a
compact format using the pipe %>% operator.

library(dplyr)

# calculate the mean height for women by species

newdata <- filter(starwars,
                  gender == "female")
newdata <- group_by(newdata, species)
newdata <- summarize(newdata,
                     mean_ht = mean(height, na.rm = TRUE))

# this can be written more succinctly as

newdata <- starwars %>%
  filter(gender == "female") %>%
  group_by(species) %>%
  summarize(mean_ht = mean(height, na.rm = TRUE))

2.6 Processing dates:

Date values are entered in R as character values. For example,
consider the following simple dataset recording the birth date of 3
individuals.

df <- data.frame(
  dob = c("11/10/1963", "Jan-23-91", "[Link]")
)
# view structure of data frame
str(df)
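
One base R way to turn such strings into proper Date values is as.Date with an explicit format string. The sketch below uses two made-up date strings and is only one possible approach; packages such as lubridate offer more forgiving parsers.

```r
# Hypothetical date strings in two different layouts
dob_slash <- "11/10/1963"   # month/day/four-digit year
dob_iso   <- "1991-01-23"   # year-month-day

# Supply the layout explicitly so R does not have to guess
d1 <- as.Date(dob_slash, format = "%m/%d/%Y")
d2 <- as.Date(dob_iso)      # year-month-day is the default layout

class(d1)                   # "Date"
format(d1, "%Y-%m-%d")      # "1963-11-10"
d2 - d1                     # date arithmetic now works (difference in days)
```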

2.7 Reshaping data:

Some graphs require the data to be in wide format, while some graphs
require the data to be in long format.

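For example, consider the following dataset in wide format (wide_data):

id  name  sex     height  weight
01  Bill  Male    70
02  Bob   Male    72
03  Mary  Female  62

It can be converted to long format with pivot_longer: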

library(tidyr)
long_data <- pivot_longer(wide_data,
cols = c("height", "weight"),
names_to = "variable",
values_to ="value")

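The resulting dataset in long format (long_data):

id  name  sex     variable
01  Bill  Male    height
01  Bill  Male    weight
02  Bob   Male    height
02  Bob   Male    weight
03  Mary  Female  height
03  Mary  Female  weight

It can be converted back to wide format with pivot_wider: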

library(tidyr)
wide_data <- pivot_wider(long_data,
names_from = "variable",
values_from = "value")
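
The two calls can be tried end to end with the sketch below. The heights are the ones from the wide table in this section; the weights are made-up values added only so that the example runs.

```r
library(tidyr)

# Wide data: heights from the table above, weights are hypothetical
wide_data <- data.frame(
  id     = c("01", "02", "03"),
  name   = c("Bill", "Bob", "Mary"),
  sex    = c("Male", "Male", "Female"),
  height = c(70, 72, 62),
  weight = c(180, 195, 130)
)

# Wide to long: one row per person per measured variable
long_data <- pivot_longer(wide_data,
                          cols = c("height", "weight"),
                          names_to = "variable",
                          values_to = "value")
nrow(long_data)   # 6

# Long back to wide: the original columns are recovered
back_to_wide <- pivot_wider(long_data,
                            names_from = "variable",
                            values_from = "value")
identical(back_to_wide$weight, wide_data$weight)   # TRUE
```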

2.8 Missing data:

Real data is likely to contain missing values. There are three basic
approaches to dealing with missing data: feature selection, listwise
deletion, and imputation. Let’s see how each applies to
the msleep dataset from the ggplot2 package. The msleep dataset
describes the sleep habits of mammals and contains missing values on
several variables.
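
The three approaches can be sketched in base R as follows. The data frame is a made-up stand-in rather than the real msleep table, the 50% cut-off for dropping a variable is an arbitrary assumption, and mean imputation is only the simplest possible form of imputation.

```r
# Hypothetical toy data with missing values (not the real msleep table)
sleep <- data.frame(
  name        = c("a", "b", "c", "d"),
  sleep_total = c(12, 8, NA, 10),
  brainwt     = c(NA, NA, NA, 0.5)
)

# 1. Feature selection: drop variables that are mostly missing
prop_missing <- colMeans(is.na(sleep))
kept <- sleep[, prop_missing <= 0.5]

# 2. Listwise deletion: drop every row that has any missing value
complete <- na.omit(kept)

# 3. Imputation: replace missing values with the column mean
imputed <- kept
imputed$sleep_total[is.na(imputed$sleep_total)] <-
  mean(kept$sleep_total, na.rm = TRUE)
```

Here brainwt is dropped (75% missing), listwise deletion leaves 3 rows, and the missing sleep_total is filled with 10, the mean of 12, 8, and 10.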

3. Introduction to ggplot2:

This section provides a brief overview of how the ggplot2 package
works. It introduces the central concepts used to develop an informative
graph by exploring the relationships contained in the insurance dataset.

3.1 Worked example:

The functions in the ggplot2 package build up a graph in layers. We’ll
build a complex graph by starting with a simple graph and adding additional
elements, one at a time.
The example explores the relationship between smoking, obesity, age, and
medical costs using data from the Medical Insurance Costs dataset.

# load the data

url <- "[Link]"
insurance <- read.csv(url)

# create an obesity variable

insurance$obese <- ifelse(insurance$bmi >= 30,
                          "obese", "not obese")

3.1.1 ggplot:
The first function in building a graph is the ggplot function. It
specifies the data frame to be used and the mapping of the variables
to the visual properties of the graph. The mappings are placed within
the aes function, which stands for aesthetics. Let’s start by looking at
the relationship between age and medical expenses.

# specify dataset and mapping

library(ggplot2)
ggplot(data = insurance,
mapping = aes(x = age, y = expenses))

3.1.2 Geoms:
Geoms are the geometric objects (points, lines, bars, etc.) that can be
placed on a graph. They are added using functions that start
with geom_. In this example, we’ll add points using
the geom_point function, creating a scatterplot.
In ggplot2 graphs, functions are chained together using the + sign to
build a final plot.

# add points
ggplot(data = insurance,
mapping = aes(x = age, y = expenses)) +
geom_point()
A number of parameters (options) can be specified in a geom_ function.
Options for the geom_point function include color, size, and alpha. These
control the point color, size, and transparency, respectively. Transparency
ranges from 0 (completely transparent) to 1 (completely opaque). Adding a
degree of transparency can help visualize overlapping points.

# make points blue, larger, and semi-transparent

ggplot(data = insurance,
mapping = aes(x = age, y = expenses)) +
geom_point(color = "cornflowerblue",
alpha = .7,
size = 2)
3.1.3 Grouping:

In addition to mapping variables to the x and y axes, variables can be
mapped to the color, shape, size, transparency, and other visual
characteristics of geometric objects. This allows groups of
observations to be superimposed in a single graph.

# indicate smoking status using color

ggplot(data = insurance,
mapping = aes(x = age,
y = expenses,
color = smoker)) +
geom_point(alpha = .5,
size = 2) +
geom_smooth(method = "lm",
se = FALSE,
size = 1.5)
The color = smoker option is placed in the aes function, because we are
mapping a variable to an aesthetic (a visual characteristic of the graph). The
geom_smooth option (se = FALSE) was added to suppress the confidence
intervals.

3.1.4 Scales:

Scales control how variables are mapped to the visual characteristics
of the plot. Scale functions (which start with scale_) allow you to
modify this mapping. In the next plot, we’ll change the x and y axis
scaling, and the colors employed.

# modify the x and y axes and specify the colors to be used

ggplot(data = insurance,
mapping = aes(x = age,
y = expenses,
color = smoker)) +
geom_point(alpha = .5,
size = 2) +
geom_smooth(method = "lm",
se = FALSE,
size = 1.5) +
scale_x_continuous(breaks = seq(0, 70, 10)) +
scale_y_continuous(breaks = seq(0, 60000, 20000),
label = scales::dollar) +
scale_color_manual(values = c("indianred3",
"cornflowerblue"))

3.1.5 Facets:

Facets reproduce a graph for each level of a given variable (or pair of
variables). Facets are created using functions that start with facet_.
Here, facets will be defined by the two levels of the obese variable.

# reproduce the plot for obese and non-obese individuals

ggplot(data = insurance,
mapping = aes(x = age,
y = expenses,
color = smoker)) +
geom_point(alpha = .5) +
geom_smooth(method = "lm",
se = FALSE) +
scale_x_continuous(breaks = seq(0, 70, 10)) +
scale_y_continuous(breaks = seq(0, 60000, 20000),
label = scales::dollar) +
scale_color_manual(values = c("indianred3",
"cornflowerblue")) +
facet_wrap(~obese)
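
For the "pair of variables" case, facet_grid lays the panels out as rows by columns. The sketch below is an illustration on a made-up data frame that borrows the variable names of the insurance example; the values are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data with the same variable names as the insurance example
toy <- data.frame(
  age      = c(20, 30, 40, 50, 25, 35, 45, 55),
  expenses = c(2000, 3000, 5000, 8000, 15000, 20000, 30000, 40000),
  smoker   = rep(c("no", "yes"), each = 4),
  obese    = rep(c("not obese", "obese"), times = 4)
)

# One panel per combination of obese (rows) and smoker (columns)
p <- ggplot(toy, aes(x = age, y = expenses)) +
  geom_point() +
  facet_grid(obese ~ smoker)
p
```

facet_wrap suits a single variable with many levels, while facet_grid tends to read better when two variables are crossed.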

3.1.6 Labels:
Graphs should be easy to interpret and informative labels are a key element in
achieving this goal. The labs function provides customized labels for the axes
and legends. Additionally, a custom title, subtitle, and caption can be added.
# add informative labels
ggplot(data = insurance,
mapping = aes(x = age,
y = expenses,
color = smoker)) +
geom_point(alpha = .5) +
geom_smooth(method = "lm",
se = FALSE) +
scale_x_continuous(breaks = seq(0, 70, 10)) +
scale_y_continuous(breaks = seq(0, 60000, 20000),
label = scales::dollar) +
scale_color_manual(values = c("indianred3",
"cornflowerblue")) +
facet_wrap(~obese) +
  labs(title = "Relationship between patient demographics and medical costs",
       subtitle = "US Census Bureau 2013",
       caption = "source: [Link]",
       x = "Age (years)",
       y = "Annual expenses",
       color = "Smoker?")

3.1.7 Themes:
Finally, we can fine-tune the appearance of the graph using themes. Theme
functions (which start with theme_) control background colors, fonts,
gridlines, legend placement, and other non-data related features of the
graph. Let’s use a cleaner theme.
# use a minimalist theme
ggplot(data = insurance,
mapping = aes(x = age,
y = expenses,
color = smoker)) +
geom_point(alpha = .5) +
geom_smooth(method = "lm",
se = FALSE) +
scale_x_continuous(breaks = seq(0, 70, 10)) +
scale_y_continuous(breaks = seq(0, 60000, 20000),
label = scales::dollar) +
scale_color_manual(values = c("indianred3",
"cornflowerblue")) +
facet_wrap(~obese) +
  labs(title = "Relationship between age and medical expenses",
       subtitle = "US Census Data 2013",
       caption = "source: [Link]",
       x = "Age (years)",
       y = "Medical Expenses",
       color = "Smoker?") +
theme_minimal()

3.2 Placing the data and mapping options:

Plots created with ggplot2 always start with the ggplot function. In the
examples above, the data and mapping options were placed in this function.
In this case they apply to each geom_ function that follows. You can also
place these options directly within a geom. In that case, they apply
only to that specific geom.

# placing color mapping in the ggplot function

ggplot(insurance,
aes(x = age,
y = expenses,
color = smoker)) +
geom_point(alpha = .5,
size = 2) +
geom_smooth(method = "lm",
se = FALSE,
size = 1.5)

Since the mapping of the variable smoker to color appears in
the ggplot function, it applies to both geom_point and geom_smooth. The
point color indicates the smoker status, and a separate colored trend line is
produced for smokers and non-smokers. Compare this to:
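# placing color mapping in the geom_point function
ggplot(insurance,
       aes(x = age,
           y = expenses)) +
  geom_point(aes(color = smoker),
             alpha = .5,
             size = 2) +
  geom_smooth(method = "lm",
              se = FALSE,
              size = 1.5)

Here the mapping of smoker to color appears only in the geom_point function,
so it is used only there: the points are colored by smoker status, but a
single trend line is drawn for all observations.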

3.3 Graphs as Objects:

A ggplot2 graph can be saved as a named R object (like a data frame),
manipulated further, and then printed or saved to disk.

# create scatterplot and save it

myplot <- ggplot(data = insurance,
aes(x = age, y = expenses)) +
geom_point()

# plot the graph

myplot
# make the points larger and blue
# then print the graph
myplot <- myplot + geom_point(size = 2, color = "blue")
myplot

# print the graph with a title and line of best fit

# but don't save those changes
myplot + geom_smooth(method = "lm") +
labs(title = "Mildly interesting graph")

# print the graph with a black and white theme

# but don't save those changes
myplot + theme_bw()

This can be a real time saver (and help you avoid carpal tunnel syndrome). It
is also handy when saving graphs programmatically.
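
As a sketch of what saving graphs programmatically might look like, the loop below writes two themed variants of one stored plot using ggsave from ggplot2. The toy data, the file names, and the choice of PNG output in a temporary directory are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the insurance data frame
toy <- data.frame(age = c(20, 30, 40, 50),
                  expenses = c(2000, 3000, 5000, 8000))

myplot <- ggplot(toy, aes(x = age, y = expenses)) +
  geom_point()

# Save two variants of the same stored plot to temporary PNG files
themes <- list(minimal = theme_minimal(), bw = theme_bw())
paths <- character(0)
for (nm in names(themes)) {
  out <- file.path(tempdir(), paste0("myplot_", nm, ".png"))
  ggsave(out, plot = myplot + themes[[nm]],
         width = 6, height = 4, dpi = 150)
  paths <- c(paths, out)
}
file.exists(paths)
```

Because myplot itself is never reassigned inside the loop, each file starts from the same base graph.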
