
Agenda: - Introduction - Basics - Classification - Clustering - Regression - Use-Cases

The document provides an overview of machine learning including definitions, categories of techniques, and examples of real-world applications. It discusses supervised and unsupervised learning, classification, clustering, and regression. Specific techniques covered include naive bayes classification, k-means clustering, hierarchical clustering, and logistic regression. Use cases presented involve spam detection, machine translation, image search, product recommendations, sentiment analysis, and more.

Uploaded by

Ameer Muhammad
Copyright
© All Rights Reserved

Agenda

• Introduction
• Basics
• Classification
• Clustering
• Regression
• Use-Cases
Quick Questionnaire

How many people have heard about Machine Learning?

How many people know about Machine Learning?

How many people are using Machine Learning?


About
• A subfield of Artificial Intelligence (AI).
• The name is derived from the concept that it deals with the
“construction and study of systems that can learn from data”.
• Can be seen as building blocks to make computers learn to
behave more intelligently.
• It is a theoretical concept. There are various techniques with
various implementations.
• http://en.wikipedia.org/wiki/Machine_learning
In other words…

“A computer program is said to learn from
experience (E) with some class of tasks (T) and a
performance measure (P) if its performance at tasks
in T as measured by P improves with E”
Terminology
• Features
– The distinct traits that can be used to describe
each item in a quantitative manner.
• Samples
– A sample is an item to process (e.g. classify). It can be a document, a
picture, a sound, a video, a row in a database or CSV file, or whatever
you can describe with a fixed set of quantitative traits.
• Feature vector
– An n-dimensional vector of numerical features that represents some
object.
• Feature extraction
– Preparation of the feature vector.
– Transforms the data in a high-dimensional space to a space of
fewer dimensions.
• Training/Evaluation set
– Set of data used to discover potentially predictive relationships.
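A minimal sketch of feature extraction, assuming text samples and a tiny hand-picked vocabulary. The vocabulary, the sample sentences and the helper name `extract_features` are illustrative assumptions, not part of the original slides. Each sample becomes a fixed-length feature vector of word counts.

```python
from collections import Counter

# Hypothetical vocabulary: each word is one feature (one vector dimension).
VOCABULARY = ["free", "win", "meeting", "report"]

def extract_features(text, vocabulary=VOCABULARY):
    """Map a raw text sample to an n-dimensional vector of word counts."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

samples = [
    "Win a free phone free",
    "Meeting report attached",
]
feature_vectors = [extract_features(s) for s in samples]
print(feature_vectors)  # [[2, 1, 0, 0], [0, 0, 1, 1]]
```

Every sample ends up with the same number of dimensions, which is what lets a learning algorithm compare samples with each other.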
Let’s dig deep into it…

What do you mean by

Apple
Learning (Training)

Features:                Features:        Features:
1. Color: Reddish/Red    1. Sky Blue      1. Yellow
2. Type: Fruit           2. Logo          2. Fruit
3. Shape                 3. Shape         3. Shape
etc…                     etc…             etc…
Workflow
Categories

• Supervised Learning

• Unsupervised Learning

• Semi-Supervised Learning

• Reinforcement Learning
Supervised Learning
• The correct classes (labels) of the training data are
known.

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Unsupervised Learning
• The correct classes (labels) of the training data are not
known.

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Semi-Supervised Learning
• A mix of supervised and unsupervised learning: the correct classes
are known for only part of the training data.

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Reinforcement Learning
• Allows the machine or software agent to learn its
behavior based on feedback from the environment.
• This behavior can be learnt once and for all, or keep on
adapting as time goes by.

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
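Learning from feedback can be sketched with an epsilon-greedy agent, one simple strategy among many and not something the slides prescribe. The environment (`REWARDS`), the action names and the parameter values below are hypothetical. The agent tries actions, receives a reward as feedback, and keeps adapting its estimate of each action's value.

```python
import random

# Hypothetical environment: each action always pays a fixed reward.
REWARDS = {"left": 0.2, "stay": 0.5, "right": 0.9}

def run_agent(steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    estimates = {a: 0.0 for a in actions}  # learned value of each action
    counts = {a: 0 for a in actions}       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)              # explore
        else:
            action = max(actions, key=estimates.get)  # exploit
        reward = REWARDS[action]  # feedback from the environment
        counts[action] += 1
        # Incremental average: the estimate keeps adapting as feedback arrives.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = run_agent()
best_action = max(estimates, key=estimates.get)
print(best_action)
```

Setting `epsilon` to 0 after training corresponds to behavior that is learnt once and for all. Keeping it above 0 lets the agent keep adapting.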
Machine Learning Techniques
Techniques
• classification: predict class from observations

• clustering: group observations into “meaningful” groups

• regression (prediction): predict value from observations
Classification
• Classify a document into a predefined category.
• Documents can be text or images.
• A popular technique is the Naive Bayes classifier.
• Steps:
– Step 1: Train the program (build a model) using a
training set in which each document has a category, e.g. sports, cricket, news.
– For each word, the classifier computes the
probability that it makes a document belong to each of
the considered categories.
– Step 2: Test the model against a test data set.
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
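The two steps above can be sketched as a small multinomial Naive Bayes classifier in plain Python. This is an illustrative sketch, not the implementation used in the slides. The training sentences, the categories and the function names `train` and `classify` are assumptions, and add-one (Laplace) smoothing is one common choice for handling unseen words.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Step 1: build the model (word counts per category) from a labeled training set."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of documents
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Step 2: pick the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # prior
        for word in text.lower().split():
            # Add-one smoothing avoids a zero probability for unseen words.
            likelihood = (word_counts[category][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[category] = score
    return max(scores, key=scores.get)

# Hypothetical training set: (document, category) pairs.
training_set = [
    ("the team won the cricket match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the new budget", "news"),
    ("the minister announced an election", "news"),
]
model = train(training_set)
print(classify("cricket team scored", model))        # sports
print(classify("election budget announced", model))  # news
```

Working with log-probabilities instead of multiplying raw probabilities avoids numeric underflow on longer documents.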
Clustering
• Clustering is the task of grouping a set of objects in
such a way that objects in the same group (called
a cluster) are more similar to each other than to
objects in other groups.
• The groups are not predefined.
• For example, the keywords
– “man’s shoe”
– “women’s shoe”
– “women’s t-shirt”
– “man’s t-shirt”
can be clustered into 2 categories: “shoe” and “t-shirt”, or
“man” and “women”.
• Popular techniques are K-means clustering and Hierarchical
clustering.
K-means Clustering
• Partitions n observations into k clusters in which each observation belongs
to the cluster with the nearest mean, which serves as a prototype of the cluster.
• http://en.wikipedia.org/wiki/K-means_clustering

http://pypr.sourceforge.net/kmeans.html
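A minimal sketch of k-means using the standard iterative (Lloyd's) procedure in plain Python. The sample points, the function names and the naive choice of the first k points as starting means are assumptions made for illustration. Real libraries use better initialisation.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=100):
    """Assign each point to the nearest mean, then recompute the means, until stable."""
    centroids = list(points[:k])  # assumption: naive initialisation from the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: the means no longer move
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, clusters = k_means(points, k=2)
print(clusters)
# [[(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)], [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]]
```

The result depends on the starting means, so in practice the algorithm is usually run several times with different initialisations.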
Hierarchical clustering
• A method of cluster analysis which seeks to build
a hierarchy of clusters.
• There are two strategies:
– Agglomerative:
• This is a "bottom up" approach: each observation starts in its own
cluster, and pairs of clusters are merged as one moves up the
hierarchy.
• Time complexity is O(n^3) for the standard algorithm.
– Divisive:
• This is a "top down" approach: all observations start in one cluster,
and splits are performed recursively as one moves down the
hierarchy.
• Time complexity is O(2^n) with an exhaustive search over possible splits.

• http://en.wikipedia.org/wiki/Hierarchical_clustering
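The agglomerative ("bottom up") strategy can be sketched as follows. This is a naive illustration. The sample points, the function names and the choice of single linkage (distance between the closest members of two clusters) are assumptions. Other linkage criteria are equally valid.

```python
import math

def single_linkage(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, target_clusters):
    """Start with one cluster per point and repeatedly merge the closest pair."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        # Naive search over all cluster pairs at every merge step (no cached distances).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerative(points, target_clusters=3))
# [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)], [(10.0, 0.0)]]
```

Recording the order of the merges, instead of stopping at a target count, would give the full hierarchy (a dendrogram).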
Regression
• Regression is a measure of the relation between
the mean value of one variable (e.g.
output) and corresponding values of
other variables (e.g. time and cost).
• Regression analysis is a statistical
process for estimating the
relationships among variables.
• In machine learning, regression means
predicting a numeric output value from training data.
• A popular one is Logistic regression
(binary regression), which predicts the probability
of a binary outcome and is therefore mostly used
for classification.
• http://en.wikipedia.org/wiki/Logistic_regression
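A minimal sketch of logistic regression with one input variable, fitted by batch gradient descent in plain Python. The data (hours studied versus passing an exam), the learning rate and the number of epochs are hypothetical choices for illustration. The model outputs a probability between 0 and 1 for a binary outcome.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical training data: hours studied -> passed the exam (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)

print(sigmoid(w * 1.0 + b) < 0.5)  # True: 1 hour is predicted as "not passed"
print(sigmoid(w * 4.0 + b) > 0.5)  # True: 4 hours is predicted as "passed"
```

Thresholding the predicted probability at 0.5 turns the model into a binary classifier.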
Classification vs Regression
Classification:
• Classification means to group the output into a class.
• E.g. classification to predict the type of tumor, i.e. harmful or not harmful,
using training data.
• If the output is a discrete/categorical variable, then it is a classification problem.

Regression:
• Regression means to predict the output value using training data.
• E.g. regression to predict the house price from training data.
• If the output is a real number/continuous, then it is a regression problem.
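The house-price example can be sketched with ordinary least squares on a single feature. The areas, the prices and the function names are made-up assumptions for illustration. The point is that the predicted output is a continuous number rather than a class.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical training data: area in square metres -> price in thousands.
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(areas, prices)

def predict_price(area):
    """Regression output: a continuous value, not a category."""
    return slope * area + intercept

print(predict_price(90))  # 270.0
```

The same question framed as classification would instead ask for a label, for example whether the house falls into a "cheap" or an "expensive" class.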
Let’s see the usage in Real life
Use-Cases
• Spam Email Detection
• Machine Translation (Language Translation)
• Image Search (Similarity)
• Clustering (KMeans) : Amazon
Recommendations
• Classification : Google News

continued…
Use-Cases (contd.)
• Text Summarization - Google News
• Rating a Review/Comment: Yelp
• Fraud detection : Credit card Providers
• Decision Making : e.g. Bank/Insurance sector
• Sentiment Analysis
• Speech Understanding – iPhone with Siri
• Face Detection – Facebook’s Photo tagging
Classification in Action
isn’t it easy?
it’s not (Snapshot of Spam folder, with two messages marked “Not a Spam”)
NER (Named Entity Recognition)

http://nlp.stanford.edu:8080/ner/process
Similar/Duplicate Images
Remember features? (Feature extraction)
Features can be:
• Width
• Height
• Contrast
• Brightness
• Position
• Hue
• Colors

Check this:
LIRE (Lucene Image REtrieval) library -
https://code.google.com/p/lire/

Credit: https://www.google.co.in/
Recommendations

http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
Popular Frameworks/Tools
• Weka
• Carrot2
• GATE
• OpenNLP
• LingPipe
• Stanford NLP
• Mallet – Topic Modelling
• Gensim – Topic Modelling (Python)
• Apache Mahout
• MLlib – Apache Spark
• scikit-learn - Python
• LIBSVM : Support Vector Machines
• and many more…
