Introduction to R
Dr. Manisha Verma
Resources
Data Science and Big Data Analytics: Discovering, Analyzing,
Visualizing and Presenting Data, EMC Education Services, John
Wiley & Sons, 2015
Chapter 3
• Data
• The annual sales in U.S. dollars for 10,000 retail customers in a CSV file
• Display of the first six records of sales
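A minimal sketch of how the first six records might be inspected in R. The data frame below is a hypothetical stand-in for the sales CSV file: the column names cust_id and sales_total and all values are assumptions made up for illustration.

```r
# Hypothetical stand-in for the sales data; values are illustrative only.
sales <- data.frame(
  cust_id     = 1:8,
  sales_total = c(100.00, 250.50, 75.25, 410.00, 60.75, 320.10, 95.00, 180.40)
)

# head() returns the first six rows of a data frame by default.
first_six <- head(sales)
print(first_six)
```

Passing n, as in head(sales, n = 3), changes how many rows are returned.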
• Mean (algebraic measure) (sample vs. population):
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (sample), $\mu = \frac{\sum x}{N}$ (population)
Note: n is sample size and N is population size.
• Median:
• Middle value of the sorted data if there is an odd number of values,
or the average of the middle two values otherwise
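The mean and median definitions above can be checked against R's built-in mean() and median(). This is a small sketch on made-up vectors; the helper manual_median() is a hypothetical name introduced only for this example.

```r
# Illustrative values only.
x_odd  <- c(13, 2, 4, 7, 4)   # odd number of values
x_even <- c(8, 2, 6, 4)       # even number of values

# Sample mean: the sum of the values divided by the sample size n.
manual_mean <- sum(x_odd) / length(x_odd)

# Median: sort first, then take the middle value (odd n)
# or the average of the two middle values (even n).
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) s[(n + 1) / 2] else (s[n / 2] + s[n / 2 + 1]) / 2
}

builtin_mean   <- mean(x_odd)
median_odd     <- manual_median(x_odd)
median_even    <- manual_median(x_even)
```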
• Quartiles, outliers and boxplots
• Quartiles: Q1 (25th percentile), Q3 (75th percentile)
• Inter-quartile range: IQR = Q3 – Q1
• Five number summary: min, Q1, median, Q3, max
• Outlier: usually, a value more than 1.5 × IQR above Q3 or below Q1
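A short sketch of these summaries in R, using a made-up vector in which the value 100 is deliberately extreme. Note that quantile() (default method) and fivenum() (which uses hinges) can give slightly different quartile values on small samples.

```r
# Illustrative values only; 100 is deliberately extreme.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

q   <- quantile(x, probs = c(0.25, 0.75))  # Q1 and Q3 (default quantile type)
iqr <- IQR(x)                              # Q3 - Q1
five <- fivenum(x)                         # min, lower hinge, median, upper hinge, max

# Fences for the 1.5 x IQR rule.
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers <- x[x < lower_fence | x > upper_fence]
```

Calling boxplot(x) draws the corresponding box-and-whisker plot, with flagged points shown individually.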
• Generic Functions in R
• A group of functions sharing the same name but behaving differently
depending on the number and the type of arguments they receive.
print(5)
print("Hello")
• Both use print() but behave differently.
• R automatically selects how to print based on the class of the argument
(here, a number or a string)
• plot(): the kind of plot produced is determined by the variables passed
• summary(): the output likewise depends on the class of the argument
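A small sketch of generic dispatch. summary() is called on a numeric vector and on a factor, and a print method is defined for a made-up class. The class name "temperature" and the method print.temperature are hypothetical, introduced only for this example.

```r
# The same generic, summary(), gives different results for different classes.
x <- c(2.5, 3.1, 4.8)
f <- factor(c("a", "b", "a"))

num_summary <- summary(x)  # numeric vector: min, quartiles, mean, max
fac_summary <- summary(f)  # factor: counts per level

# Hypothetical class to show how a generic finds a class-specific method.
temp <- structure(list(value = 21.5), class = "temperature")

print.temperature <- function(x, ...) {
  cat("Temperature:", x$value, "degrees\n")
  invisible(x)
}

# print() dispatches to print.temperature() because of the object's class.
print(temp)
```

methods(print) lists the print methods available in the current session.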
• Help in R
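A brief sketch of common ways to look up help from the console. The function names queried here (mean, read.csv) are just examples.

```r
# help() looks up the documentation page for a function.
# Typing ?mean at the console is the usual shorthand.
help_page <- help("mean")

# apropos() finds objects whose names match a pattern.
matches <- apropos("^read\\.csv")

# formals() shows a function's arguments and their default values.
csv_arg_names <- names(formals(read.csv))

# Other helpers for interactive use:
#   help.search("quantile")   # search the help system by keyword (same as ??quantile)
#   example(mean)             # run the examples from a help page
```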
R Graphical User Interfaces
• R software uses a command-line interface (CLI)
• Popular GUIs include R Commander, Rattle, and RStudio
• RStudio
• Scripts
• Workspace
• Plots
• Console
Data Import and Export
• Read from CSV
• Set path
• Read from other files such as TXT
• Import function default values
• Writing file
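The import and export steps above might look like the following sketch. It writes to a temporary directory so that it is self-contained; the file names sales_demo.csv and sales_demo.txt and the data values are hypothetical.

```r
# Hypothetical demo data and file locations.
work_dir <- tempdir()
demo <- data.frame(cust_id = 1:3, sales_total = c(120.5, 80, 45.25))
csv_path <- file.path(work_dir, "sales_demo.csv")
txt_path <- file.path(work_dir, "sales_demo.txt")

# Writing files: write.csv() for CSV, write.table() for other delimiters.
write.csv(demo, csv_path, row.names = FALSE)
write.table(demo, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read from CSV using a full path.
# read.csv() defaults: header = TRUE, sep = ","
sales_csv <- read.csv(csv_path)

# Set the working directory, then read with a relative file name.
old_wd <- setwd(work_dir)
sales_rel <- read.csv("sales_demo.csv")
setwd(old_wd)  # restore the previous working directory

# Read a tab-delimited TXT file.
# read.table() defaults to header = FALSE, so the header is requested here.
sales_txt <- read.table(txt_path, header = TRUE, sep = "\t")
```

read.delim() is a variant of read.table() whose defaults (header = TRUE, sep = "\t") suit tab-delimited files.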
Attribute and Data Types
• NOIR: nominal, ordinal, interval, and ratio attribute types
• Numeric, Character, and Logical Data Types
• Functions to examine the characteristics of a variable
• class(): What kind of object it is in R (its abstract type or how R will treat
it).
• typeof(): How the object is stored in memory (its internal storage type).
• Test the type of a variable and coerce it to another type
• length(): finds the number of elements in a vector, list, or other R
object.
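A short sketch of these inspection functions on a few made-up variables. The is.*() family tests a type and the as.*() family coerces to a type.

```r
# Illustrative variables of different types.
i    <- 1L        # integer
x    <- 3.14      # double
s    <- "hello"   # character
flag <- TRUE      # logical

# class(): how R treats the object; typeof(): how it is stored.
class_x  <- class(x)    # "numeric"
typeof_x <- typeof(x)   # "double"
class_i  <- class(i)    # "integer"
typeof_i <- typeof(i)   # "integer"

# Test variables.
is_int_x <- is.integer(x)     # FALSE: x is stored as a double
is_num_i <- is.numeric(i)     # TRUE: integers count as numeric
is_chr_s <- is.character(s)   # TRUE

# Coerce variables.
as_int <- as.integer("42")    # character to integer
as_chr <- as.character(x)     # double to character
as_num <- as.numeric(flag)    # logical TRUE to 1

# length(): number of elements.
len <- length(c(10, 20, 30))
```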
• Vectors
Vectors are a basic building block for data in R. Simple R variables are
actually vectors. A vector can only consist of values in the same class.
• Create vectors using the combine function c() or the colon operator :
• Initialize a vector of a given length
• A vector has no dimension attribute; dim() returns NULL for it
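The vector operations above, sketched on made-up values:

```r
# Create vectors with c() and with the colon operator.
u <- c(1, 5, 9)
v <- 1:5

# A simple variable is itself a vector of length 1.
single_is_vector <- is.vector(5)

# Initialize a numeric vector of length 3 (filled with zeros).
w <- vector(mode = "numeric", length = 3)

# Mixing classes in c() coerces every value to one common class.
mixed <- c(1, "a", TRUE)      # all elements become character
mixed_class <- class(mixed)

# A vector has a length but no dimension attribute.
v_length <- length(v)
v_dim    <- dim(v)            # NULL
```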
• Arrays
• array(): Creates or tests for arrays.
• Matrix: matrix() creates a matrix from the given set of values.
• Matrix operations: +, -, * (elementwise multiplication), %*% (matrix
multiplication), t() (transpose), solve() (inverse), sum() (sum of all elements)
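A minimal sketch of these array and matrix operations on small made-up matrices. Note that matrix() fills values column by column unless byrow = TRUE is given.

```r
# A three-dimensional array with dimensions 2 x 3 x 4.
arr <- array(1:24, dim = c(2, 3, 4))
arr_is_array <- is.array(arr)

# Two 2 x 2 matrices, filled column by column.
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # rows: (2, 1) and (1, 3)
B <- matrix(1:4, nrow = 2)             # rows: (1, 3) and (2, 4)

add_AB   <- A + B       # elementwise addition
sub_AB   <- A - B       # elementwise subtraction
elem_AB  <- A * B       # elementwise multiplication
prod_AB  <- A %*% B     # matrix multiplication
B_t      <- t(B)        # transpose
A_inv    <- solve(A)    # inverse (A must be square and non-singular)
total_A  <- sum(A)      # sum of all elements
```

Multiplying A_inv %*% A should give the identity matrix up to floating-point rounding.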
• Data Frames: provide a structure for storing and accessing several
variables of possibly different data types.
• data.frame() creates data frames
• $: accesses the variables (columns) in a data frame
• Structure of a data frame
• Subsetting operator to extract part of a data frame
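A short sketch of these data frame operations. The data frame and its column names (name, age, member) are hypothetical and exist only for illustration.

```r
# Hypothetical data frame with columns of different types.
people <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  age    = c(28, 35, 42),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# $ accesses a single variable (column) as a vector.
ages <- people$age

# str() prints a compact description of the structure.
str(people)

# The subsetting operator [rows, columns] extracts part of the data frame.
second_row   <- people[2, ]                   # one row, all columns
name_and_age <- people[, c("name", "age")]    # all rows, two columns
over_30      <- people[people$age > 30, ]     # rows matching a condition
```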