Tools of Machine Learning

Basic Explorations
 Data Munging – imputing missing values, identifying outliers, querying the data and
asking it all sorts of questions. This is a simple tool, very similar to SQL querying (a
short sketch follows this list).
 Visualizations – ordinary charts (bar, pie, histogram, time series, box plot), contour
charts and heat maps, text clouds (combination and separation clouds), and
multivariate charts. All of these tools are about getting quick insight into the data
from a picture.
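A minimal sketch of the munging steps above using pandas; the toy "spend" column and its values are invented for illustration.

```python
import pandas as pd
import numpy as np

# Made-up customer spending data with one missing value and one outlier
df = pd.DataFrame({"spend": [120.0, 95.0, np.nan, 88.0, 2500.0, 101.0]})

# Impute the missing value with the column median
df["spend"] = df["spend"].fillna(df["spend"].median())

# Flag outliers with the common 1.5 * IQR rule of thumb
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
df["outlier"] = (df["spend"] < q1 - 1.5 * iqr) | (df["spend"] > q3 + 1.5 * iqr)

# Query the data, much like a SQL WHERE clause
print(df.query("spend > 100 and not outlier"))
```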
Unsupervised Learning – mining exploratory patterns from the data
 K-means clustering – in this method, we try to identify possible means (centres)
around which to group our data into clusters. Imagine we have customer spending
data from a supermarket. If we can find clusters of customers based on their spending
amount or quantity, we can analyse each cluster and gain some interesting insight
from it (a sketch of K-means and PCA follows this list).
 Hierarchical clustering – similar in spirit to K-means, but instead of fixing the
number of clusters up front, the observations are grouped into a hierarchy (a tree) of
nested clusters, which can be cut at any level to obtain a clustering.
 Principal Component Analysis (PCA) and Dimensionality Reduction – at times, we
get data on certain observations with many variables (many fields), not all of which
may be important. PCA checks whether the variation in some of these variables is
largely explained by (combinations of) the others. In this way we can discard some
variables, reducing the dimensions of the data and making the whole data set more
focused.
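A minimal sketch of K-means clustering and PCA with scikit-learn; the synthetic two-segment "customer spending" data is invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two made-up customer segments: low spenders and high spenders,
# each described by four spending variables
X = np.vstack([rng.normal(50, 5, (100, 4)),
               rng.normal(200, 20, (100, 4))])

# K-means: group customers around k = 2 centres
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# PCA: keep just enough components to explain 95% of the variance
X_reduced = PCA(n_components=0.95).fit_transform(X)
print(labels[:5], X_reduced.shape)
```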
Supervised Learning – predictions for a target variable
 KNN (K nearest neighbours) classification – in classification, we try to use data to
assign predetermined labels to observations based on a few properties. Imagine we
are trying to identify who will default on a loan. The labels will be “default” or
“not default”, and the aim of classification is to use the data to decide whether a
certain borrower should be classified as a defaulter or not. In KNN, we look at the K
most similar borrowers to decide whether a borrower will default. It is simple and
requires little processing power, yet is quite effective (a sketch of several of these
classifiers follows this list).
 Naïve Bayes – also a classification method. Here we use Bayes’ rule, combining
prior probabilities with evidence from the data to compute posterior probabilities for
the labels of observations. Again: simple, intuitive, and effective.
 Decision Trees – again, a form of classification. Based on the data, the algorithm
repeatedly splits the observations on different variables and variable groups so as to
separate the labels. For the borrower example, a decision tree might predict that if
the borrower earns less than 40,000 a month, has more than 4 family members, and
is the sole breadwinner, then he is more likely to default.
 Regression Trees – similar to decision trees, but here we try to predict a number:
expected consumer basket value, future sales, etc. The same principle applies of
dividing and sub-dividing the range of the numerical target variable based on the
other features.
 Random Forests – again similar to decision and regression trees. The RF algorithm
builds many trees, each from a random sample of the data using random subsets of
the variables, and then combines their predictions (by voting or averaging). This
requires a good deal of processing power and is something of a black-box model:
good at predictions, but not so good if we want to know which factors are playing a
key role in those predictions.
 Neural Networks – one of the peaks of predictive power. The idea was formed long
ago, but for a long time we did not have the processing power to compute NNs. It is
inspired by how the neurons in our bodies handle impulses and signals travelling
from the brain to our muscles and vice versa: the information is transformed one or
more times in different neurons while being transmitted. The same happens with the
data – the predictor variables are transformed multiple times, in multiple layers of
neurons, by different functions, in order to best match the target variable. NNs
deliver the prediction performance, but at the same time they are the most black-box
of these models, as people can seldom tell what is happening in each neuron.
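A minimal sketch of three of the classifiers above (KNN, naïve Bayes and a decision tree) on invented "loan default" data; the features, values and the toy default rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Made-up features: [monthly income, family size]; label: 1 = default
X = np.column_stack([rng.normal(50_000, 15_000, 200),
                     rng.integers(1, 7, 200)])
y = (X[:, 0] < 40_000).astype(int)  # toy rule: low earners default

for model in (KNeighborsClassifier(n_neighbors=5),
              GaussianNB(),
              DecisionTreeClassifier(max_depth=3)):
    model.fit(X, y)
    # Classify a hypothetical borrower earning 35,000 with 5 family members
    print(type(model).__name__, model.predict([[35_000, 5]]))
```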
Performance Improvement Methods in Machine Learning
 Train/Test Split – with a large enough data set, we can divide it into two chunks:
use one to fit the best model with all of these methods, and then try the fitted models
on the other chunk. The model that is best at predicting the values in the held-out
chunk can be expected to be good at predicting future data as well! (A sketch follows
this list.)
 X-fold Cross Validation – why stop at 2 chunks of a large dataset? If the data set is
large enough, we can divide it into 5/10/20 chunks and iterate the whole process
5/10/20 times: in each round, hold one chunk out, fit the model on the remaining
chunks, and check its performance on the held-out chunk. We then choose the model
that predicts best across all these rounds. This is a pure application of the brute
processing power we have in our computers!
 Ensemble Models – so far, in the previous subsection, we have seen many models.
But which one is best? Counter-question: why do we need to pre-decide which one is
best? Why not create a model that automatically tries out all these models and makes
a prediction based on all of them together? The idea is: say we are interested in using
five models; an ensemble model will find out the accuracy of all five using cross-
validation, and then make a weighted prediction using all five. It is like a voting
system, where the most accurate model has the most votes (most weightage) and the
least accurate model has the fewest votes (least weightage). We thus get a weighted-
average prediction, which can often beat all the individual models’ predictions.
Netflix recommendations use one such ensemble model.
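A minimal sketch of a train/test split, 5-fold cross-validation and a simple voting ensemble with scikit-learn (here with equal weights, a simplification of the accuracy-weighted scheme described above); the synthetic data and the choice of component models are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # made-up binary target

# Train/test split: fit on one chunk, judge on the held-out chunk
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A soft-voting ensemble averages the predicted probabilities of its members
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("rf", RandomForestClassifier(n_estimators=100,
                                              random_state=0))],
    voting="soft")

# 5-fold cross-validation on the training chunk
print("CV accuracy:", cross_val_score(ensemble, X_train, y_train, cv=5).mean())

# Final check on the held-out chunk
print("Test accuracy:", ensemble.fit(X_train, y_train).score(X_test, y_test))
```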
Text Analytics
 Keyword Identification – taking in large chunks of text, identifying keywords and
frequently appearing words, and finding patterns in how certain words appear (a
sketch follows this list).
 Topic Modelling – identifying themes in texts, clustering texts into either pre-
determined themes or themes discovered inside the text itself. A very simple
illustration is clustering hotel reviews by whether they are about “Distance”,
“Cleanliness”, “Staff”, etc.
 Sentiment Analysis – figuring out the emotion or sentiment in text data. This can
take the form of scoring product reviews as positive or negative, or of identifying
whether a reviewer is angry/scared/surprised/happy, etc.
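A minimal sketch of keyword counting and a crude lexicon-based sentiment score; the reviews and the tiny positive/negative word lists are invented for illustration (real systems use trained models or large lexicons).

```python
import re
from collections import Counter

reviews = [
    "The staff were friendly and the room was clean",
    "Terrible location, far from everything, and dirty",
]

# Keyword identification: word frequencies across all reviews
words = re.findall(r"[a-z']+", " ".join(reviews).lower())
print(Counter(words).most_common(5))

# Sentiment analysis: score each review against tiny made-up lexicons
positive, negative = {"friendly", "clean"}, {"terrible", "dirty", "far"}
for review in reviews:
    tokens = set(re.findall(r"[a-z']+", review.lower()))
    score = len(tokens & positive) - len(tokens & negative)
    print(score, "->", review)
```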
