Agenda
• Introduction
• Basics
• Classification
• Clustering
• Regression
• Use-Cases
Quick Questionnaire
Apple
Learning (Training)
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
• Reinforcement Learning
Supervised Learning
• The correct classes (labels) of the training data are known.
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Unsupervised Learning
• The correct classes (labels) of the training data are not known.
Semi-Supervised Learning
• A mix of supervised and unsupervised learning: the correct classes are known for only part of the training data.
Reinforcement Learning
• Allows a machine or software agent to learn its behavior from feedback given by the environment.
• This behavior can be learned once and for all, or keep adapting as time goes by.
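The feedback loop described for reinforcement learning can be sketched with a toy example. This is only an illustration: the epsilon-greedy strategy, the function name `run_bandit`, and the reward probabilities are assumptions chosen for the sketch, not something taken from the slides.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (hypothetical environment)."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running estimate of each action's mean reward
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=values.__getitem__)
        # Feedback from the environment: reward 1 with the action's probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
print("estimated values:", values)
print("best action:", best)
```

The agent is never told which action is correct; it only sees rewards, and its estimates keep adapting for as long as the loop runs.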
Machine Learning Techniques
Techniques
• Classification: predict the class from observations
http://pypr.sourceforge.net/kmeans.html
Hierarchical clustering
• A method of cluster analysis that seeks to build a hierarchy of clusters.
• There are two strategies:
– Agglomerative:
• This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
• Time complexity of the standard algorithm is O(n^3)
– Divisive:
• This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
• Time complexity with an exhaustive search is O(2^n)
• http://en.wikipedia.org/wiki/Hierarchical_clustering
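The agglomerative ("bottom up") strategy can be sketched in a few lines. This is a naive version that assumes single linkage (distance between the closest members of two clusters) and Euclidean distance; the function name `agglomerative` and the sample points are made up for illustration, and no attempt is made to be efficient.

```python
import math


def agglomerative(points, num_clusters):
    """Naive single-linkage agglomerative clustering (illustrative sketch)."""
    # Each observation starts in its own cluster.
    clusters = [[p] for p in points]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(a, b) for a in c1 for b in c2)

    while len(clusters) > num_clusters:
        # Find the pair of clusters that are closest to each other.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        # Merge that pair; this is one step "up" the hierarchy.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.3, 4.8), (9.0, 1.0)]
result = agglomerative(points, 3)
print(result)
```

Stopping at `num_clusters` cuts the hierarchy at one level; letting the loop run down to a single cluster would build the full hierarchy.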
Regression
• Regression is a measure of the relation between the mean value of one variable (e.g. output) and the corresponding values of other variables (e.g. time and cost).
• Regression analysis is a statistical process for estimating the relationships among variables.
• In machine learning, regression means predicting a continuous output value using training data.
• A popular one is logistic regression (binary regression); despite its name, it predicts the probability of a binary outcome and is therefore mostly used for classification.
• http://en.wikipedia.org/wiki/Logistic_regression
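As a minimal sketch of "predict the output value using training data", the example below fits a straight line with ordinary least squares. Linear regression is not named on the slide; it is used here only as the simplest instance, and the function name `fit_line` and the data are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (illustrative sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for an unseen input
print(slope, intercept, predicted)
```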
Classification vs Regression
• Classification
– Classification means grouping the output into a class.
– Example: predicting the type of a tumor, i.e. harmful or not harmful, using training data.
– If the output is a discrete/categorical variable, then it is a classification problem.
• Regression
– Regression means predicting the output value using training data.
– Example: predicting the house price from training data.
– If the output is a real number/continuous, then it is a regression problem.
Let’s see the usage in real life
Use-Cases
• Spam Email Detection
• Machine Translation (Language Translation)
• Image Search (Similarity)
• Clustering (KMeans) : Amazon Recommendations
• Classification : Google News
continued…
Use-Cases (contd.)
• Text Summarization - Google News
• Rating a Review/Comment: Yelp
• Fraud detection : Credit card Providers
• Decision Making : e.g. Bank/Insurance sector
• Sentiment Analysis
• Speech Understanding – iPhone with Siri
• Face Detection – Facebook’s Photo tagging
Classification in Action
Isn’t it easy?
It’s not. (Snapshot of a spam folder; two messages in it are marked "Not a Spam".)
NER (Named Entity Recognition)
http://nlp.stanford.edu:8080/ner/process
Similar/Duplicate Images
Remember features? (Feature extraction)
Features can be:
• Width
• Height
• Contrast
• Brightness
• Position
• Hue
• Colors
Check this: LIRE (Lucene Image REtrieval) library - https://code.google.com/p/lire/
Credit: https://www.google.co.in/
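A few of the features listed above can be computed directly from pixel values. The sketch below assumes a grayscale image given as a list of rows with values from 0 to 255, takes brightness as the mean intensity and contrast as the standard deviation of intensities; the function name `extract_features` and these definitions are assumptions for illustration and are not how LIRE works.

```python
import math


def extract_features(image):
    """Compute simple features from a grayscale image (list of pixel rows)."""
    height = len(image)
    width = len(image[0])
    pixels = [p for row in image for p in row]
    # Brightness: mean pixel intensity.
    brightness = sum(pixels) / len(pixels)
    # Contrast: standard deviation of the pixel intensities.
    variance = sum((p - brightness) ** 2 for p in pixels) / len(pixels)
    contrast = math.sqrt(variance)
    return {
        "width": width,
        "height": height,
        "brightness": brightness,
        "contrast": contrast,
    }


# Tiny hypothetical image: left half black, right half white.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
features = extract_features(image)
print(features)
```

Two images can then be compared by the distance between their feature vectors, which is the basic idea behind similar/duplicate image search.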
Recommendations
http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
Popular Frameworks/Tools
• Weka
• Carrot2
• GATE
• OpenNLP
• LingPipe
• Stanford NLP
• Mallet – Topic Modelling
• Gensim – Topic Modelling (Python)
• Apache Mahout
• MLlib – Apache Spark
• scikit-learn - Python
• LIBSVM : Support Vector Machines
• and many more…