Tools of Machine Learning

Basic Explorations
 Data Munging – imputing missing values, identifying outliers, querying the data and
asking it all sorts of questions. This is a simple tool, very similar to SQL querying (a
short sketch follows this list).
 Visualizations – ordinary charts (bar, pie, histogram, time series, box plot), contour
charts and heat maps, text clouds (combination and separation clouds), and
multivariate charts. All of these tools are about getting quick insight into the data
from a picture.
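A minimal sketch of the munging steps above using pandas; the toy "spend" column and its values are invented for illustration.

```python
import pandas as pd
import numpy as np

# Made-up customer spending data with one missing value and one outlier
df = pd.DataFrame({"spend": [120.0, 95.0, np.nan, 88.0, 2500.0, 101.0]})

# Impute the missing value with the column median
df["spend"] = df["spend"].fillna(df["spend"].median())

# Flag outliers with the common 1.5 * IQR rule of thumb
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["spend"] < q1 - 1.5 * iqr) | (df["spend"] > q3 + 1.5 * iqr)

# Query the data, much like a SQL WHERE clause
print(df.query("spend > 100 and not outlier"))
```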
Unsupervised Learning – mining exploratory patterns from the data
 K-means clustering – in this method, we try to identify possible means (centres)
around which to group our data into clusters. Imagine we have customer spending
data from a supermarket. If we can find clusters of customers based on their spending
amount or quantity, we can analyse each cluster and gain some interesting insight
from it (a sketch of K-means and PCA follows this list).
 Hierarchical clustering – similar in spirit to K-means, but instead of fixing the
number of clusters up front, the observations are grouped into a hierarchy (a tree) of
nested clusters, which can be cut at any level to obtain a clustering.
 Principal Component Analysis (PCA) and Dimensionality Reduction – at times, we
get data on certain observations with many variables (many fields), not all of which
may be important. PCA checks whether the variation in some of these variables is
largely explained by (combinations of) the others. In this way we can discard some
variables, reducing the dimensions of the data and making the whole data set more
focused.
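A minimal sketch of K-means clustering and PCA with scikit-learn; the synthetic two-segment "customer spending" data is invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two made-up customer segments: low spenders and high spenders,
# each described by four spending variables
X = np.vstack([rng.normal(50, 5, (100, 4)),
               rng.normal(200, 20, (100, 4))])

# K-means: group customers around k = 2 centres
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# PCA: keep just enough components to explain 95% of the variance
X_reduced = PCA(n_components=0.95).fit_transform(X)
print(labels[:5], X_reduced.shape)
```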
Supervised Learning – predictions for a target variable
 KNN (K nearest neighbours) classification – in classification, we try to use data to
assign predetermined labels to observations based on a few properties. Imagine we
are trying to identify who will default on a loan. The labels will be “default” or
“not default”, and the aim of classification is to use the data to decide whether a
certain borrower should be classified as a defaulter or not. In KNN, we look at the K
most similar borrowers to decide whether a borrower will default. It is simple and
requires little processing power, yet is quite effective (a sketch of several of these
classifiers follows this list).
 Naïve Bayes – also a classification method. Here we use Bayes’ rule, combining
prior probabilities with evidence from the data to compute posterior probabilities for
the labels of observations. Again: simple, intuitive, and effective.
 Decision Trees – again, a form of classification. Based on the data, the algorithm
repeatedly splits the observations on different variables and variable groups so as to
separate the labels. For the borrower example, a decision tree might predict that if
the borrower earns less than 40,000 a month, has more than 4 family members, and
is the sole breadwinner, then he is more likely to default.
 Regression Trees – similar to decision trees, but here we try to predict a number:
expected consumer basket value, future sales, etc. The same principle applies of
dividing and sub-dividing the range of the numerical target variable based on the
other features.
 Random Forests – again similar to decision and regression trees. The RF algorithm
builds many trees, each from a random sample of the data using random subsets of
the variables, and then combines their predictions (by voting or averaging). This
requires a good deal of processing power and is something of a black-box model:
good at predictions, but not so good if we want to know which factors are playing a
key role in those predictions.
 Neural Networks – one of the peaks of predictive power. The idea was formed long
ago, but for a long time we did not have the processing power to compute NNs. It is
inspired by how the neurons in our bodies handle impulses and signals travelling
from the brain to our muscles and vice versa: the information is transformed one or
more times in different neurons while being transmitted. The same happens with the
data – the predictor variables are transformed multiple times, in multiple layers of
neurons, by different functions, in order to best match the target variable. NNs
deliver the prediction performance, but at the same time they are the most black-box
of these models, as people can seldom tell what is happening in each neuron.
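A minimal sketch of three of the classifiers above (KNN, naïve Bayes and a decision tree) on invented "loan default" data; the features, values and the toy default rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Made-up features: [monthly income, family size]; label: 1 = default
X = np.column_stack([rng.normal(50_000, 15_000, 200),
                     rng.integers(1, 7, 200)])
y = (X[:, 0] < 40_000).astype(int)  # toy rule: low earners default

for model in (KNeighborsClassifier(n_neighbors=5),
              GaussianNB(),
              DecisionTreeClassifier(max_depth=3)):
    model.fit(X, y)
    # Classify a hypothetical borrower earning 35,000 with 5 family members
    print(type(model).__name__, model.predict([[35_000, 5]]))
```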
Performance Improvement Methods in Machine Learning
 Train/Test Split – with a large enough data set, we can divide it into two chunks:
use one to fit the best model with all of these methods, and then try the fitted models
on the other chunk. The model that is best at predicting the values in the held-out
chunk can be expected to be good at predicting future data as well! (A sketch follows
this list.)
 X-fold Cross Validation – why stop at 2 chunks of a large dataset? If the data set is
large enough, we can divide it into 5/10/20 chunks and iterate the whole process
5/10/20 times: in each round, hold one chunk out, fit the model on the remaining
chunks, and check its performance on the held-out chunk. We then choose the model
that predicts best across all these rounds. This is a pure application of the brute
processing power we have in our computers!
 Ensemble Models – so far, in the previous subsection, we have seen many models.
But which one is best? Counter-question: why do we need to pre-decide which one is
best? Why not create a model that automatically tries out all these models and makes
a prediction based on all of them together? The idea is: say we are interested in using
five models; an ensemble model will find out the accuracy of all five using cross-
validation, and then make a weighted prediction using all five. It is like a voting
system, where the most accurate model has the most votes (most weightage) and the
least accurate model has the fewest votes (least weightage). We thus get a weighted-
average prediction, which can often beat all the individual models’ predictions.
Netflix recommendations use one such ensemble model.
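A minimal sketch of a train/test split, 5-fold cross-validation and a simple voting ensemble with scikit-learn (here with equal weights, a simplification of the accuracy-weighted scheme described above); the synthetic data and the choice of component models are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # made-up binary target

# Train/test split: fit on one chunk, judge on the held-out chunk
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A soft-voting ensemble averages the predicted probabilities of its members
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("rf", RandomForestClassifier(n_estimators=100,
                                              random_state=0))],
    voting="soft")

# 5-fold cross-validation on the training chunk
print("CV accuracy:", cross_val_score(ensemble, X_train, y_train, cv=5).mean())

# Final check on the held-out chunk
print("Test accuracy:", ensemble.fit(X_train, y_train).score(X_test, y_test))
```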
Text Analytics
 Keyword Identification – taking in large chunks of text, identifying keywords and
frequently appearing words, and finding patterns in how certain words appear (a
sketch follows this list).
 Topic Modelling – identifying themes in texts, clustering texts into either pre-
determined themes or themes discovered inside the text itself. A very simple
illustration is clustering hotel reviews by whether they are about “Distance”,
“Cleanliness”, “Staff”, etc.
 Sentiment Analysis – figuring out the emotion or sentiment in text data. This can
take the form of scoring product reviews as positive or negative, or of identifying
whether a reviewer is angry/scared/surprised/happy, etc.
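A minimal sketch of keyword counting and a crude lexicon-based sentiment score; the reviews and the tiny positive/negative word lists are invented for illustration (real systems use trained models or large lexicons).

```python
import re
from collections import Counter

reviews = [
    "The staff were friendly and the room was clean",
    "Terrible location, far from everything, and dirty",
]

# Keyword identification: word frequencies across all reviews
words = re.findall(r"[a-z']+", " ".join(reviews).lower())
print(Counter(words).most_common(5))

# Sentiment analysis: score each review against tiny made-up lexicons
positive, negative = {"friendly", "clean"}, {"terrible", "dirty", "far"}
for review in reviews:
    tokens = set(re.findall(r"[a-z']+", review.lower()))
    score = len(tokens & positive) - len(tokens & negative)
    print(score, "->", review)
```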
