Coursera Notes

Uploaded by

christineanne.28

MODULE 1

INTRODUCTION TO DATA ANALYSIS WITH R
Value derived from data depends on
● Accuracy of data
● Accessibility of data when we need it
Data Asset eXchange (DAX)
● Curated free and open datasets under open data licenses
● Provides real-world data
● Vetted data
● Ready to use in the enterprise
● Part of developer.ibm.com
● ibm.biz/data-exchange
WHY DATA ANALYSIS
● Data is everywhere
● Helps us answer questions from data
● Plays an important role in
○ Discovering useful information
○ Answering questions
○ Predicting the future or the unknown
The Problem
● Can you predict the likelihood of a flight delay?
● Data Analysis using R libraries for
○ Data Cleaning
○ Exploratory Analysis
○ Model Development
○ Model Evaluation

R PACKAGES FOR DATA SCIENCE
● An R package bundles together code, data, documentation, and tests
● Tidyverse Library: collection of essential R packages for data science
Four Steps of Data Analysis
● Data Wrangling and Transformation
○ Includes: dplyr and tidyr packages
○ Combine different functions using the pipe operator
● Data Import and Management
○ Includes: readr package
○ Solves the problem of parsing a flat file (.csv file)
● Functional Programming
○ Includes: purrr package
○ Provides statistics for the dataset (calculating mean value for each column)
● Data Visualization and Exploration
○ Includes: ggplot2 package
○ Produces charts and visualizations (box plots, density plots, violin plots, tile plots, and time series plots)
Basic Syntax of the dplyr package
● select(): select variables by their names
● filter(): filter observations based on values
● summarize(): compute summary statistics
● arrange(): reorder the rows
● mutate(): create new variables
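The five dplyr verbs listed above can be chained with the pipe operator. The following is a minimal sketch, not the course's own code: the `flights` data frame, its column names, and the thresholds are all hypothetical values made up for illustration.

```r
library(dplyr)

# Hypothetical flight records, invented for illustration only
flights <- data.frame(
  carrier   = c("AA", "AA", "DL", "DL", "UA"),
  distance  = c(500, 1200, 800, 300, 1500),
  arr_delay = c(10, -5, 30, 0, 45)
)

# Chain the verbs with the pipe operator
long_flights <- flights %>%
  select(carrier, distance, arr_delay) %>%  # keep variables by name
  filter(distance > 400) %>%                # keep rows matching a condition
  mutate(late = arr_delay > 15) %>%         # create a new variable
  arrange(desc(arr_delay))                  # reorder rows, largest delay first

# Collapse the remaining rows into summary statistics
overall <- long_flights %>%
  summarize(mean_delay = mean(arr_delay), n_late = sum(late))

print(long_flights)
print(overall)
```

Each verb takes a data frame and returns a data frame, which is why the steps compose cleanly through `%>%`.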

UNDERSTANDING THE DATA
● Dataset: Airline Performance
○ From Data Asset eXchange
○ In (.csv) format
● Variables in the Dataset
○ Performing statistical analysis on selected columns from the original data set
Download and Extract Data
● Each row is one data point (observation)
● Many properties associated with each point
● .csv Data Format: properties are separated from each other by commas
Load the Package in R
● Install the package
● Load the tidyverse library
○ Automatically loads the readr package
Import .csv Files (readr package)
● Includes the read_csv() function
● Tibble: the tidyverse data frame that read_csv() returns when it reads a .csv file
● Pass the location of the data you want to use to the read_csv() function as a filename
Help Page
● Add a question mark before the function name
● Documentation: includes arguments for the function and examples of their use
Read the Dataset from a URL
● Define a variable that contains the URL path to the file
● Download the file locally using the download.file() function
○ First Argument: URL variable
○ Second Argument: local name for the downloaded file
● Unzip the content using the untar() function
● Read the data from the local file using the read_csv() function
Print the Data in R
● head() function: shows the first 6 rows of the data frame
● tail() function: shows the bottom 6 rows of the data frame
● Export it to a new .csv file (optional)
○ Use the write_csv() function

IMPORTING & EXPORTING DATA IN R
Importing Data
● Process of loading and reading data into R from various resources
● Important Factors:
○ Format of the File (.csv, .json, .xlsx, .hdf)
○ File Path of the Dataset (computer/online)
Export to Different Formats in R

ANALYZING DATA IN R
Basic Insights from the Data
● Understand your data before you begin any analysis
● Check:
○ Variable Data Types
○ Data Distribution
● Identify potential issues with data
Basic Insights of a Dataset
● Known data types in tidyverse
○ Character, date, double, integer, and logical
● glimpse() function: determines types of variables in your dataset
● Shows number of rows and columns in the dataset
● Importance of Checking Data Types
○ Potential information and type mismatch
○ Compatibility with tidyverse functions
dplyr::summarize(), group_by()
● Return a statistical summary of the data
○ Statistical Metrics: reveal numerical issues (extreme outliers and large deviations)
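The import, inspect, summarize, and export steps in these notes can be sketched end to end. This is a rough illustration under stated assumptions: the sample rows, column names, and temporary file paths are hypothetical, and the download.file()/untar() step is left out because the notes do not give the dataset URL.

```r
library(readr)
library(dplyr)

# Hypothetical sample data written to a temporary file so the sketch is
# self-contained; a real run would point read_csv() at the extracted dataset.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "carrier,distance,arr_delay",
  "AA,500,10",
  "AA,1200,-5",
  "DL,800,30",
  "DL,300,",
  "UA,1500,45"
), csv_path)

# read_csv() parses the flat file into a tibble (?read_csv opens its help page)
flights <- read_csv(csv_path)

head(flights, 3)   # first rows (6 by default)
tail(flights, 3)   # last rows (6 by default)
glimpse(flights)   # row and column counts plus each column's type

# Statistical summary per group; na.rm = TRUE skips the empty arr_delay field
by_carrier <- flights %>%
  group_by(carrier) %>%
  summarize(mean_delay = mean(arr_delay, na.rm = TRUE), n_flights = n())

# Optional export of the result to a new .csv file
out_path <- tempfile(fileext = ".csv")
write_csv(by_carrier, out_path)
```

Checking `glimpse()` before summarizing helps catch a column parsed as the wrong type, which is the mismatch the notes warn about.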
MODULE 2
DATA WRANGLING

PRE-PROCESSING DATA IN R
Data Pre-Processing
● Converting or mapping data from the initial raw form into another format
● Data cleaning or data wrangling
Simple Data Frame Operations
● Perform data frame operations along columns, with each row of the column representing a sample

DEALING WITH MISSING VALUES IN R
Missing Values
● Missing values occur when no data value is stored for a variable in an observation
● Represented as “?”, “N/A”, 0 or just a blank cell
How to Deal with Missing Data
● Check with the data collection source
● Drop the missing values
○ Drop the variable
○ Drop the data entry
● Replace the missing values
○ Replace it with an average (similar data points)
○ Replace it with zero (frequency)
○ Replace it based on other functions
● Leave it as missing data
How to Check Missing Values in R
● Use is.na() to flag missing values; summing the result counts them in each column
Drop Rows
● Hyphen “-” operator: takes the complement of a set of indices in R, so the listed rows are dropped
How to Drop Missing Values in R
● Use drop_na()
● Specify column names that contain missing values that you want to drop
How to Replace Missing Values in R
● Use replace_na()

DATA FORMATTING IN R
Data Formatting
● Data collected from different places and stored in different formats
● Bringing data into a common standard of expression to make meaningful comparisons
● Coherence: in statistics, an indication of the quality of the information within a single dataset
Reformat an Entire Column
● -
Incorrect Data Types
● -

DATA NORMALIZATION IN R
Data Normalization
● -
Methods of Normalizing Data
● -
Simple Feature Scaling in R
● -
Min-Max in R
● -
Z-score in R
● -

BINNING IN R
Data Normalization
● -
Methods of Normalizing Data
● -
Simple Feature Scaling in R
● -
Min-Max in R
● -
Z-score in R
● -
Simple Feature Scaling in R
● -
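For the missing-value functions named in these notes (is.na(), drop_na(), replace_na(), and the hyphen operator), a minimal sketch follows. The data frame and its column names are hypothetical, and replacing with the column mean is just one of the replacement strategies the notes list.

```r
library(dplyr)
library(tidyr)

# Hypothetical records with gaps, invented for illustration only
flights <- data.frame(
  carrier   = c("AA", "DL", "UA", "AA"),
  arr_delay = c(10, NA, 45, NA),
  dep_delay = c(5, 20, NA, 0)
)

# is.na() flags each missing cell; colSums() counts the flags per column
na_counts <- colSums(is.na(flights))

# drop_na() removes rows that are missing a value in the named column
dropped <- flights %>% drop_na(arr_delay)

# replace_na() fills the gaps, here with the mean of the observed values
mean_delay <- mean(flights$arr_delay, na.rm = TRUE)
filled <- flights %>% replace_na(list(arr_delay = mean_delay))

# A negative index keeps every row except the one listed
without_second <- flights[-2, ]

print(na_counts)
print(filled)
```

Calling `drop_na()` with no column names would instead drop every row that has a missing value in any column.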
