Machine Learning 2
We can train a model using machine learning to perform various tasks such as prediction, analysis and more.
By-AVI PRUTHI
Important terms used:
• 1. Classification: the model draws a conclusion from observed values,
i.e. it finds the decision boundary.
Like creating a model to check whether a mail is spam or not; yes/no
problems and speech recognition come under the category of
classification.
• 2. Regression: the model tries to estimate the best-fit line, i.e. it finds
the relation among variables.
Eg. weather prediction, house price prediction, etc.
1.Linear Regression
• Is used to predict a dependent variable value based on a given
independent value.
• Eg: for a single variable ->
Y = mX + c (Y <- dependent, X <- independent)
• Eg: for multiple variables ->
Y = m1X1 + m2X2 + ….. + mnXn + c
• Used for predicting continuous data
Working of this Algorithm-
• Importation-
from sklearn.linear_model import LinearRegression
• Initialization, training and predicting score:
reg = LinearRegression().fit(x_train, y_train)
reg.score(x_test, y_test)
(For classification problems, LogisticRegression from the same
sklearn.linear_model module is used in the same way.)
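A minimal end-to-end sketch with synthetic data (the slope 3 and intercept 2 below are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data following y = 3x + 2 with a little noise (illustrative).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 3 * x[:, 0] + 2 + rng.normal(0, 0.1, size=50)

reg = LinearRegression().fit(x, y)
print(reg.coef_[0], reg.intercept_)  # recovered slope and intercept, close to 3 and 2
print(reg.score(x, y))               # R^2 close to 1 on near-linear data
```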
3. Decision Tree
• Trains on the dataset with if-then-else conditions, i.e. it is mostly used
where there are dependencies on previous/other variables.
• It can be used for classification as well as regression problems.
• Works on the concept of reducing entropy (randomness) at each split.
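The entropy that each split tries to reduce can be computed directly; a minimal sketch (the helper name and sample labels are illustrative):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

# A 50/50 mix is maximally random (1 bit); a pure node has zero entropy.
print(entropy(["spam", "ham", "spam", "ham"]))   # 1.0
print(entropy(["spam", "spam", "spam", "spam"])) # 0.0
```

A good split moves the data from high-entropy mixtures toward low-entropy (pure) groups.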
Working
• Importation:
from sklearn import tree
• Initialization, training and predicting score:
clf = tree.DecisionTreeClassifier().fit(x_train, y_train)
clf.score(x_test, y_test)
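A minimal runnable sketch (the toy data below is illustrative, not from the slides) showing that a tree can recover a simple if-then-else rule such as logical AND:

```python
from sklearn import tree

# Toy data (illustrative): the label is 1 only when both features are 1.
x_train = [[0, 0], [1, 0], [0, 1], [1, 1]]
y_train = [0, 0, 0, 1]

clf = tree.DecisionTreeClassifier().fit(x_train, y_train)
print(clf.predict([[1, 1], [0, 1]]))  # the learned if-else splits recover the AND rule
print(clf.score(x_train, y_train))    # perfect fit on this tiny training set
```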
4.Random Forest Classifier
• Is a set of decision trees built from randomly selected subsets of the
training set; it then collects the votes from the different decision trees
to decide the final prediction.
• The main parameter used here is ‘n_estimators’, which basically
determines the number of decision trees to be formed; on the basis of
their predictions the final Random Forest model is formed.
Working
• Importation:
from sklearn.ensemble import RandomForestClassifier
• Initialization, training and predicting score
clf = RandomForestClassifier(n_estimators=10).fit(x_train,y_train)
clf.score(x_test,y_test)
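A minimal runnable sketch (the synthetic data is illustrative only) showing that n_estimators controls how many trees cast votes:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (illustrative).
x, y = make_classification(n_samples=200, n_features=4, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(x_train, y_train)
print(len(clf.estimators_))       # 10 individual decision trees that vote
print(clf.score(x_test, y_test))  # accuracy of the combined vote
```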
5.Support Vector Machine(SVM)
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes, so that
we can easily put a new data point in the correct category in future.
• The aim is to create a boundary called a hyperplane, which is
chosen so as to obtain the maximum margin.
• It is used for both classification and regression.
Working
• Importation:
from sklearn.svm import SVC
• Initialization, training and predicting score
clf =SVC().fit(x_train,y_train)
clf.score(x_test,y_test)
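A minimal runnable sketch (the two well-separated blobs are synthetic, illustrative data) showing a linear maximum-margin boundary; only the support vectors on the margin define it:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (illustrative data).
x, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear").fit(x, y)
print(clf.score(x, y))                # separable data, so accuracy is high
print(clf.support_vectors_.shape[0])  # only a few margin-defining points
```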
Types of Machine Learning
• Supervised Learning:
• works with labelled data, i.e. data in which the inputs and
target variables are known.
• Techniques used in supervised learning are-
• Classification
• Regression
• Algorithms based on them have already been discussed:
linear regression, logistic regression, etc.
• Unsupervised Learning:
• works with unlabelled data in which the inputs and
target classes are not known and we try to find various
relationships between different features.
• Techniques used here are-
• Clustering: involves partitioning the dataset into groups
called clusters. The goal is to group similar data points into
the same cluster.
• Association: involves finding the dependency of one data
item on another, so as to generate maximum profit.
• Examples: K-means Clustering Algorithm, Principal
Component Analysis, etc. Eg. in scikit-learn:
from sklearn.cluster import KMeans
labels = KMeans(n_clusters=3).fit_predict(x)
• Semi-Supervised Learning:
• uses a small amount of labelled data together with a large
amount of unlabelled data during training.
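A minimal runnable K-Means sketch (the three synthetic groups of points are illustrative only); note that, being unsupervised, it takes only the features x and no target y:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three tight, well-separated synthetic groups (illustrative data).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(c, 0.2, size=(20, 2)) for c in (0.0, 5.0, 10.0)])

# Unsupervised: only the features are passed, no target labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(x)
print(len(set(labels.tolist())))  # 3 distinct clusters recovered
```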
7.Naïve Bayes Classifier Algorithm
• Is a statistical classification technique based on Bayes' theorem. It is
one of the simplest supervised learning algorithms.
• It consists of 2 words:
• Naïve, which means every feature is assumed to be independent of
every other feature.
• Bayes, which means it uses Bayes' theorem.
• Basically, Naïve Bayes is a family of related algorithms:
1.Gaussian Naïve Bayes: is used when predictor values are
continuous and are expected to follow Gaussian distribution.
2.Bernoulli Naïve Bayes: is used when predictors are Boolean in nature
and are expected to follow Bernoulli Distribution.
3.Multinomial Naïve Bayes- makes use of Multinomial distribution and
is often used to solve issues involving document or text classification.
Implementations
• from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
• from sklearn.naive_bayes import BernoulliNB
clf = BernoulliNB( )
• from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
• clf.fit(x_train, y_train)
• clf.score(x_test, y_test)
• clf.predict(x_test)
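A minimal end-to-end sketch of the Gaussian variant (the Iris dataset is chosen here purely for illustration, because its predictors are continuous):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has continuous predictors, so the Gaussian variant applies.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

clf = GaussianNB().fit(x_train, y_train)
print(clf.score(x_test, y_test))  # accuracy on the held-out data
print(clf.predict(x_test[:5]))    # predicted classes for the first 5 test rows
```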