Chapter 3 (Part 2)
Comments:-
• Comments are like helping text in your R program; they are
ignored by the interpreter while executing the actual program.
• A single-line comment is written using the hash symbol (#) at the
beginning of the statement, as follows:
# my first program in R programming
• R does not support multiline comments, but you can use a trick
which works as follows: put the text inside a string within a block
that is never executed.
if (FALSE) {
"This is a demo for multiline comments and it should be put inside either
single or double quotes"
}
myString <- "hi welcome!"
print(myString)
[1] "hi welcome!"
Essentials of R programming
Objects of R:-
• R has five basic or atomic classes of objects.
• Everything you see or create in R is an object.
• A vector, matrix, data frame, even a variable is an object.
• R treats them all that way, and it has five basic classes of objects,
which are as follows (a short check with class() follows the list):
1) character
2) numeric (real numbers)
3) integer (whole numbers)
4) complex
5) logical (TRUE/FALSE)
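A short sketch (not part of the original slides) checking one value of each atomic class with class():
print(class("a"))      # "character"
print(class(2.5))      # "numeric"
print(class(2L))       # "integer"  (the L suffix asks for an integer)
print(class(2+3i))     # "complex"
print(class(TRUE))     # "logical"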
Attributes:-
These classes have attributes. Think of attributes as their identifiers: a
name or number which appropriately identifies them.
An object can have the following attributes (see the small sketch after the list):
1) names, dimension names
2) dimensions
3) class
4) length
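As a quick illustration (the names used here are made up for the example), these attributes can be inspected with names(), dim(), class() and length():
v <- c(a = 1, b = 2, c = 3)            # a named numeric vector
print(names(v))                        # "a" "b" "c"
print(class(v))                        # "numeric"
print(length(v))                       # 3
m <- matrix(1:6, nrow = 2, ncol = 3)   # a 2 x 3 matrix
print(dim(m))                          # 2 3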
Examples of R data types:
Numeric (example values: 12.3, 5, 999)
v <- 23.5
print(class(v))
It produces the following result −
[1] "numeric"
Character (example values: 'a', "good", "TRUE", '23.4')
v <- "TRUE"
print(class(v))
It produces the following result −
[1] "character"
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
It produces the following result −
[1] "red"    "green"  "yellow"
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
print(list1)
It produces the following result −
[[1]]
[1] 2 5 3
[[2]]
[1] 21.3
[[3]]
function (x)  .Primitive("sin")
Matrices
# Create a matrix.
M = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
When we execute the above code, it produces the
following result −
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
R also has reserved words, such as if, else, repeat, NaN, NA and NA_integer_, which cannot be used as variable names.
R supports the following types of operators:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
Arithmetic Operators
S. No  Operator  Description and example
1.  +    Adds the corresponding elements of two vectors.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a+b)
         It will give us the following output: [1] 13.0  8.3  7.0
2.  -    Subtracts the elements of the second vector from the first.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a-b)
         It will give us the following output: [1] -9.0 -1.7  1.0
3.  *    Multiplies the corresponding elements of two vectors.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a*b)
         It will give us the following output: [1] 22.0 16.5 12.0
4.  /    Divides the elements of the first vector by those of the second.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a/b)
         It will give us the following output: [1] 0.1818182 0.6600000 1.3333333
5.  %%   Gives the remainder of dividing the first vector by the second.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a%%b)
         It will give us the following output: [1] 2.0 3.3 1.0
6.  %/%  Gives the integer quotient of dividing the first vector by the second.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a%/%b)
         It will give us the following output: [1] 0 0 1
7.  ^    Raises the first vector to the power of the second vector.
         a <- c(2, 3.3, 4); b <- c(11, 5, 3); print(a^b)
         It will give us the following output: [1] 2048.0000  391.3539   64.0000
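One detail worth noting (not covered in the table above): when the two vectors have different lengths, R recycles the shorter one, with a warning if the longer length is not a multiple of the shorter. A small sketch:
print(c(1, 2, 3, 4) + c(10, 20))   # [1] 11 22 13 24  (10, 20 is reused)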
Relational Operators
A relational operator is a symbol which defines
some kind of relation between two entities.
These include numerical equalities and inequalities.
S. No  Operator  Description and example
2.  <    Compares each element of the first vector with the corresponding element of
         the second vector and returns TRUE where it is less than that element.
         a <- c(1, 9, 5); b <- c(2, 4, 6); print(a<b)
         It will give us the following output: [1]  TRUE FALSE  TRUE
3.  <=   Returns TRUE where each element of the first vector is less than or equal to
         the corresponding element of the second vector.
         a <- c(1, 3, 5); b <- c(2, 3, 6); print(a<=b)
         It will give us the following output: [1] TRUE TRUE TRUE
4.  >=   Returns TRUE where each element of the first vector is greater than or equal
         to the corresponding element of the second vector.
         a <- c(1, 3, 5); b <- c(2, 3, 6); print(a>=b)
         It will give us the following output: [1] FALSE  TRUE FALSE
5.  ==   Returns TRUE where each element of the first vector is equal to the
         corresponding element of the second vector.
         a <- c(1, 3, 5); b <- c(2, 3, 6); print(a==b)
         It will give us the following output: [1] FALSE  TRUE FALSE
6.  !=   Returns TRUE where each element of the first vector is not equal to the
         corresponding element of the second vector.
         a <- c(1, 3, 5); b <- c(2, 3, 6); print(a!=b)
         It will give us the following output: [1]  TRUE FALSE  TRUE
Logical Operators
The logical operators allow a program to make a
decision on the basis of multiple conditions.
In the program, each operand is considered a
condition which can be evaluated to a TRUE or FALSE
value.
The values of the conditions are used to determine
the overall value of op1 operator op2.
Logical operators are applicable to vectors
whose type is logical, numeric, or complex.
The logical operator compares each element of the
first vector with the corresponding element of the
second vector.
The following logical operators are supported by R:
S. No  Operator  Description and example
1.  &    The Logical AND operator. It compares each element of the first vector with the
         corresponding element of the second vector and gives TRUE only when both are
         TRUE, otherwise FALSE.
         a <- c(3, 0, TRUE, 2+2i); b <- c(2, 4, TRUE, 2+3i); print(a&b)
         It will give us the following output: [1]  TRUE FALSE  TRUE  TRUE
2.  |    The Logical OR operator. It compares each element of the first vector with the
         corresponding element of the second vector and gives FALSE only when both are
         FALSE, otherwise TRUE.
         a <- c(3, 0, TRUE, 2+2i); b <- c(2, 4, TRUE, 2+3i); print(a|b)
         It will give us the following output: [1] TRUE TRUE TRUE TRUE
3.  !    The Logical NOT operator. It takes each element of the vector and gives the
         opposite logical value as a result.
         a <- c(3, 0, TRUE, 2+2i); print(!a)
         It will give us the following output: [1] FALSE  TRUE FALSE FALSE
4.  &&   This operator takes the first element of both vectors and gives TRUE as a
         result only if both are TRUE.
         a <- c(3, 0, TRUE, 2+2i); b <- c(2, 4, TRUE, 2+3i); print(a&&b)
         It will give us the following output: [1] TRUE
5.  ||   This operator takes the first element of both vectors and gives TRUE if at
         least one of them is TRUE.
         a <- c(3, 0, TRUE, 2+2i); b <- c(2, 4, TRUE, 2+3i); print(a||b)
         It will give us the following output: [1] TRUE
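A small sketch contrasting the element-wise & with the single-value &&. Note that in newer versions of R (4.3.0 and later), && and || signal an error if given vectors longer than one element, so the print(a&&b) example above only runs on older versions:
x <- c(TRUE, FALSE)
y <- c(TRUE, TRUE)
print(x & y)           # [1]  TRUE FALSE   (element-wise)
print(x[1] && y[1])    # [1] TRUE          (single logical values only)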
Truth table
p  q  p↑q (NAND)  p↓q (NOR)
T  T      F           F
T  F      T           F
F  T      T           F
F  F      T           T
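The NAND (p↑q) and NOR (p↓q) columns above can be reproduced in R by negating & and | (a small sketch, not from the original slides):
p <- c(TRUE, TRUE, FALSE, FALSE)
q <- c(TRUE, FALSE, TRUE, FALSE)
print(!(p & q))   # NAND: FALSE  TRUE  TRUE  TRUE
print(!(p | q))   # NOR:  FALSE FALSE FALSE  TRUE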
Assignment Operators
An assignment operator is used to assign a new
value to a variable.
In R, these operators are used to assign values to
vectors.
There are the following types of assignment operators:
S. No  Operator         Description and example
1.  <- or = or <<-   These operators are known as left assignment operators.
                     a <- c(3, 0, TRUE, 2+2i); b <<- c(2, 4, TRUE, 2+3i); d = c(1, 2, TRUE, 2+3i)
                     print(a); print(b); print(d)
                     It will give us the following output:
                     [1] 3+0i 0+0i 1+0i 2+2i
                     [1] 2+0i 4+0i 1+0i 2+3i
                     [1] 1+0i 2+0i 1+0i 2+3i
2.  -> or ->>        These operators are known as right assignment operators.
                     c(3, 0, TRUE, 2+2i) -> a; c(2, 4, TRUE, 2+3i) ->> b
                     print(a); print(b)
                     It will give us the following output:
                     [1] 3+0i 0+0i 1+0i 2+2i
                     [1] 2+0i 4+0i 1+0i 2+3i
Miscellaneous Operators
Miscellaneous operators are used for a special and
specific purpose.
These operators are not used for general
mathematical or logical computation.
The following miscellaneous operators are supported in R:
S. No  Operator  Description and example
1.  :      The colon operator creates a sequence of numbers for a vector.
           v <- 1:8; print(v)
           It will give us the following output: [1] 1 2 3 4 5 6 7 8
2.  %in%   This operator is used to check whether an element belongs to a vector.
           a1 <- 8; a2 <- 12; d <- 1:10; print(a1 %in% d); print(a2 %in% d)
           It will give us the following output: [1] TRUE  [1] FALSE
3.  %*%    This operator multiplies a matrix with its transpose.
           M <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3, byrow = TRUE)
           t1 <- M %*% t(M); print(t1)
           It will give us the following output:
                [,1] [,2]
           [1,]   14   32
           [2,]   32   77
R - Variables
var.1 = c(0,1,2,3)        # assignment using the equal operator
var.2 <- c("learn","R")   # assignment using the leftward operator
c(TRUE,1) -> var.3        # assignment using the rightward operator
print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")
When we execute the above code, it produces the
following result −
[1] 0 1 2 3
var.1 is 0 1 2 3
var.2 is learn R
var.3 is 1 1
Types of supervised learning (a short regression sketch follows this list):-
•Regression
•Logistic Regression
•Classification
•Naïve Bayes Classifiers
•Decision Trees
•Support Vector Machine
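As a minimal sketch of the first type in the list, here is a simple linear regression in base R on the built-in mtcars data (the variables mpg and wt are just a convenient illustration, not part of the slides):
model <- lm(mpg ~ wt, data = mtcars)                   # learn mpg from car weight
print(summary(model)$coefficients)                     # fitted intercept and slope
print(predict(model, newdata = data.frame(wt = 3)))    # predicted mpg for a new car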
Advantages:-
• Supervised learning allows collecting data and producing data
output from previous experience.
• It helps to optimize performance criteria with the help of
experience.
• Supervised machine learning helps to solve various
types of real-world computation problems.
Disadvantages:-
• Classifying big data can be challenging.
• Training a supervised learning model needs a lot of
computation time, so it requires a lot of time.
Unsupervised learning
• Unsupervised learning is the training of a machine using
information that is neither classified nor labeled, and
allowing the algorithm to act on that information
without guidance.
Clustering Types (a k-means sketch follows this list):-
1.Hierarchical clustering
2.K-means clustering
3.K-NN (k nearest neighbours)
4.Principal Component Analysis
5.Singular Value Decomposition
6.Independent Component Analysis
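A minimal unsupervised-learning sketch (not from the slides), running k-means on the four numeric columns of the built-in iris data and comparing the clusters with the actual species:
set.seed(42)                                   # for a reproducible clustering
km <- kmeans(iris[, 1:4], centers = 3)         # ask for 3 clusters
print(table(Cluster = km$cluster, Species = iris$Species))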
Reinforcement learning
• Reinforcement learning is an area of Machine Learning in which an
agent learns by taking actions in an environment and receiving
rewards or penalties for those actions.
A common illustration shows a robot, a diamond, and fire. The
goal of the robot is to get the reward, that is, the diamond, and
avoid the hurdles, which are fire. The robot learns by trying all
the possible paths and then choosing the path which gives
it the reward with the fewest hurdles. Each right step
gives the robot a reward and each wrong step subtracts from
the robot's reward. The total reward is calculated
when it reaches the final reward, that is, the diamond.
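A toy sketch of this idea (purely illustrative, not part of the slides): score two candidate paths, where the diamond is worth +10, each fire hurdle costs 5 and each step costs 1, and keep the higher-scoring path:
paths <- list(A = c("step", "fire", "step", "diamond"),
              B = c("step", "step", "step", "step", "diamond"))
reward_of <- function(cell) switch(cell, diamond = 10, fire = -5, step = -1)
totals <- sapply(paths, function(p) sum(sapply(p, reward_of)))
print(totals)                    #   A   B  ->  3   6
print(names(which.max(totals)))  # "B": the path that avoids the fire wins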
Main points in Reinforcement learning –
Computational complexity: reinforcement learning is very complex in
computation, compared with the lower computational complexity of
supervised learning.
Performing K-Nearest Neighbors on a Dataset
Using the K-Nearest Neighbor algorithm on the built-in iris dataset,
which includes 150 observations and 5 variables (four numeric
measurements and the Species label).
# Structure of the dataset
str(iris)
# Installing Packages
install.packages("e1071")
install.packages("caTools")
install.packages("class")
# Loading package
library(e1071)
library(caTools)
library(class)
# Loading data
data(iris)
head(iris)
# Splitting data into train
# and test data (split on the class labels for a stratified split)
split <- sample.split(iris$Species, SplitRatio = 0.7)
train_cl <- subset(iris, split == TRUE)
test_cl <- subset(iris, split == FALSE)
# Feature Scaling
train_scale <- scale(train_cl[, 1:4])
test_scale <- scale(test_cl[, 1:4])
# K = 7
classifier_knn <- knn(train = train_scale,
test = test_scale,
cl = train_cl$Species,
k = 7)
misClassError <- mean(classifier_knn != test_cl$Species)
print(paste('Accuracy =', 1-misClassError))
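To see where the K = 7 model goes wrong per class, a confusion matrix can be added (a small extra step, not in the original code):
# Confusion matrix: actual vs predicted species for the K = 7 model
cm <- table(Actual = test_cl$Species, Predicted = classifier_knn)
print(cm)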
# K = 15
classifier_knn <- knn(train = train_scale,
test = test_scale,
cl = train_cl$Species,
k = 15)
misClassError <- mean(classifier_knn != test_cl$Species)
print(paste('Accuracy =', 1-misClassError))
# K = 19
classifier_knn <- knn(train = train_scale,
test = test_scale,
cl = train_cl$Species,
k = 19)
misClassError <- mean(classifier_knn != test_cl$Species)
print(paste('Accuracy =', 1-misClassError))
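Rather than repeating the block for each value of K, the same comparison can be written as a loop (a sketch that reuses the objects created above):
for (k in c(3, 5, 7, 15, 19)) {
  pred <- knn(train = train_scale, test = test_scale,
              cl = train_cl$Species, k = k)
  cat("K =", k, "Accuracy =", mean(pred == test_cl$Species), "\n")
}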
Naïve Bayes Classifier Algorithm
• Naïve Bayes algorithm is a supervised learning algorithm,
which is based on Bayes theorem and used for solving
classification problems.
• It is mainly used in text classification that includes a
high-dimensional training dataset.
• Naïve Bayes Classifier is one of the simplest and most
effective classification algorithms; it helps in building
fast machine learning models that can make quick
predictions.
• It is a probabilistic classifier, which means it
predicts on the basis of the probability of an object.
• Some popular examples of the Naïve Bayes algorithm
are spam filtration, sentiment analysis, and
classifying articles.
Why is it called Naïve Bayes?
• The Naïve Bayes algorithm is made up of two
words, Naïve and Bayes, which can be described
as:
• Naïve: It is called Naïve because it assumes
that the occurrence of a certain feature is
independent of the occurrence of other features.
For example, if a fruit is identified on the basis of
color, shape, and taste, then a red, spherical, and
sweet fruit is recognized as an apple. Hence
each feature individually contributes to identifying
it as an apple, without depending on the
others.
• Bayes: It is called Bayes because it depends on
the principle of Bayes' Theorem.
Bayes' Theorem:
• Bayes' theorem is also known as Bayes'
Rule or Bayes' law, which is used to determine
the probability of a hypothesis with prior
knowledge. It depends on the conditional
probability.
• The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of
hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of
the evidence given that the hypothesis is true.
P(A) is Prior Probability: Probability of
hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of
Evidence.
Working of Naïve Bayes' Classifier:
• Working of Naïve Bayes' Classifier can be understood with the
help of the below example:
• Suppose we have a dataset of weather conditions and a
corresponding target variable "Play". Using this dataset, we
need to decide whether we should play or not on a
particular day according to the weather conditions. To solve
this problem, we need to follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
The dataset of weather conditions:
No.  Weather    Play
0    Rainy      Yes
1    Sunny      Yes
2    Overcast   Yes
3    Overcast   Yes
4    Sunny      No
5    Rainy      Yes
6    Sunny      Yes
7    Overcast   Yes
8    Rainy      No
9    Sunny      No
10   Sunny      Yes
11   Rainy      No
12   Overcast   Yes
13   Overcast   Yes
Frequency table for the Weather Conditions:
Weather    Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total      10     4
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
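Plugging in the numbers from the frequency table above: P(Sunny|Yes) = 3/10, P(Yes) = 10/14 and P(Sunny) = 5/14, so P(Yes|Sunny) = (3/10 * 10/14) / (5/14) = 3/5 = 0.6. Likewise P(No|Sunny) = (2/4 * 4/14) / (5/14) = 2/5 = 0.4, so on a sunny day the classifier predicts Play = Yes.
The same result can be reproduced with naiveBayes() from the e1071 package used earlier (a minimal sketch; the data frame name weather is chosen here just for illustration):
library(e1071)
weather <- data.frame(
  Outlook = factor(c("Rainy","Sunny","Overcast","Overcast","Sunny","Rainy","Sunny",
                     "Overcast","Rainy","Sunny","Sunny","Rainy","Overcast","Overcast")),
  Play    = factor(c("Yes","Yes","Yes","Yes","No","Yes","Yes","Yes","No","No",
                     "Yes","No","Yes","Yes"))
)
nb_model <- naiveBayes(Play ~ Outlook, data = weather)
new_day <- data.frame(Outlook = factor("Sunny", levels = levels(weather$Outlook)))
print(predict(nb_model, new_day))                 # predicted class: Yes
print(predict(nb_model, new_day, type = "raw"))   # probabilities, about 0.4 (No) and 0.6 (Yes)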
Support Vector Machine (SVM)
Possible hyperplanes
• To separate the two classes of data points, there are many possible
hyperplanes that could be chosen. Our objective is to find a plane that
has the maximum margin, i.e. the maximum distance between data
points of both classes. Maximizing the margin distance provides some
reinforcement so that future data points can be classified with more
confidence.
Hyperplanes and Support Vectors
Hyperplanes are decision boundaries that help classify the data points. Data
points falling on either side of the hyperplane can be attributed to different
classes. Also, the dimension of the hyperplane depends upon the number of
features. If the number of input features is 2, then the hyperplane is just a
line. If the number of input features is 3, then the hyperplane becomes a
two-dimensional plane. It becomes difficult to imagine when the number of
features exceeds 3.
Support vectors are data points that are closer to the hyperplane and
influence the position and orientation of the hyperplane. Using these support
vectors, we maximize the margin of the classifier. Deleting the support
vectors will change the position of the hyperplane. These are the points that
help us build our SVM.
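A minimal sketch (not part of the slides) that fits a linear SVM with the e1071 package on two of the iris species, so there is a single separating hyperplane, and reports the support vectors it found:
library(e1071)
iris2 <- droplevels(subset(iris, Species != "setosa"))    # keep two classes
svm_fit <- svm(Species ~ Petal.Length + Petal.Width, data = iris2,
               kernel = "linear", cost = 1)
print(svm_fit$tot.nSV)                                    # total number of support vectors
print(table(Predicted = fitted(svm_fit), Actual = iris2$Species))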
Advantages:
1. SVM works relatively well when there is a clear margin
of separation between classes.
2. SVM is more effective in high dimensional spaces.
3. SVM is effective in cases where the number of
dimensions is greater than the number of samples.
4. SVM is relatively memory efficient
Disadvantages:
1. SVM algorithm is not suitable for large data sets.
2. SVM does not perform very well when the data set has
more noise i.e. target classes are overlapping.
3. In cases where the number of features for each data
point exceeds the number of training data samples, the
SVM will underperform.
4. As the support vector classifier works by putting data
points above and below the classifying hyperplane,
there is no probabilistic explanation for the
classification.