Established under Section 2(f) of the UGC Act, 1956
Approved by AICTE, COA and BCI, New Delhi
Machine Learning (B21EP0502)
Dept. of Electronics and Computer Engineering
Dr. Vidyasagar K N
COURSE OBJECTIVES:
This course will enable the students to:
1. Discuss the basic theory underlying machine learning.
2. Explain machine learning algorithms to solve problems of moderate
    complexity for data analysis.
3. Illustrate the concept of Genetic Programming and Artificial Neural
   Network.
4. Discuss the implementation of Machine learning algorithms and modules.
COURSE OUTCOMES:
After studying this course, students will be able to:
CO1: Comprehend statistical methods as the basis of the machine learning domain.
CO2: Apply a variety of learning algorithms to appropriate applications.
CO3: Implement machine learning techniques to solve problems in applicable
domains.
CO4: Evaluate and compare algorithms based on different metrics and
parameters.
CO5: Design applications using machine learning techniques.
CO6: Apply dimensionality reduction techniques.
UNIT-1:
INTRODUCTION TO MACHINE LEARNING
▪    Introduction to Machine Learning
▪    Types of Machine Learning, Issues in Machine Learning
▪    Application of Machine Learning
▪    Steps in developing a Machine Learning Application
▪    Importance of Data Visualization
▪    Basics of Supervised and Unsupervised Learning
UNIT-2:
REGRESSION TECHNIQUES
▪   Linear Regression
▪   Logistic Regression
▪   Learning with Trees: Decision Trees
▪   Constructing Decision Trees using Gini Index
▪   Classification and Regression Trees (CART)
▪   Hyperparameter tuning
▪   Loss Functions
▪   Evaluation Measures for Regression Techniques
UNIT-3:
CLASSIFICATION AND CLUSTERING
• Classification: Rule based classification, classification by Bayesian Belief
  networks, Hidden Markov Models. Support Vector Machine: Maximum
  Margin Linear Separators, Quadratic Programming solution to finding
  maximum margin separators, Kernels for learning non-linear functions.
• Clustering: K-means Algorithm, Supervised learning after clustering,
  Radial Basis Functions. Dimensionality Reduction Techniques: Principal
  Component Analysis.
UNIT-4:
ARTIFICIAL NEURAL NETWORKS:
▪ Biological Neurons and Biological Neural Networks
▪ Perceptron Learning
▪ Activation Functions
▪ Multilayer Perceptrons
▪ Back-propagation Neural Networks
▪ Competitive Neural Networks
TEXT BOOK
• Tom Mitchell, Machine Learning, McGraw Hill, 2013
• Ethem Alpaydin, Introduction to Machine Learning, The MIT Press, 2014
Who is Alan Turing?
What is Machine Learning?
Machine learning is a branch of artificial intelligence (AI) and computer
science that focuses on using data and algorithms to imitate the way
humans learn, gradually improving its accuracy.
Machine learning is an important component of the growing field of data science.
Through the use of statistical methods, algorithms are trained to make classifications or predictions,
uncovering key insights within data mining projects.
These insights subsequently drive decision making within applications and businesses, ideally impacting key
growth metrics.
As big data continues to grow, the market demand for data scientists will increase; they will be
needed to help identify the most relevant business questions and, subsequently, the data to
answer them.
• Traditional Programming: Data + Program → Computer → Output
• Machine Learning: Data + Output → Computer → Program
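The contrast above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available: the "traditional" function is a rule a person writes, while the "learned" rule is recovered from example inputs and outputs by a least-squares line fit (`numpy.polyfit`). The temperature data and function names are hypothetical, chosen only because the true rule is known.

```python
import numpy as np

# Traditional programming: a person writes the rule (the program).
def celsius_to_fahrenheit(c):
    return 1.8 * c + 32

# Machine learning: we supply inputs and outputs, and the computer
# produces the "program", here the slope and intercept of a fitted line.
celsius = np.array([-40.0, 0.0, 10.0, 25.0, 37.0, 100.0])
fahrenheit = celsius_to_fahrenheit(celsius)  # example outputs (generated for illustration)

slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)  # highest degree first

def learned_program(c):
    return slope * c + intercept

print(round(slope, 3), round(intercept, 3))  # 1.8 32.0
print(round(learned_program(50.0), 1))       # 122.0
```

With real data the outputs would come from observations rather than from a known formula, and the fitted line would only approximate them.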
Machine Learning: Real Examples
If you have used Netflix, you will know that it recommends movies or shows
to watch based on what you have watched earlier. Machine learning is used
to make these recommendations and to select the content that matches your
choice, using your earlier viewing data.
When you upload a photo on Facebook, it can recognize a person in that
photo and suggest mutual friends. ML is used for these predictions: it
uses data such as your friend list and the photos available, and makes
predictions based on that.
Software that shows how you will look when you get older relies on image
processing, which also uses machine learning.
Types of Machine Learning
TYPES OF MACHINE LEARNING
SUPERVISED LEARNING
•   Supervised learning is when the model is trained on a labelled
    dataset. A labelled dataset is one that has both input and output
    parameters. In this type of learning, both the training and validation
    datasets are labelled, as shown in the figures below.
SUPERVISED LEARNING
• Training the system: While training the model, data is usually split in a
  ratio such as 80:20, i.e. 80% as training data and the rest as testing data.
• For the training data, we feed both the input and the output for 80% of the data.
• The model learns from the training data only. We use different machine
  learning algorithms (discussed in detail in later units) to build our model.
• Learning means that the model builds some logic of its own.
  Once the model is ready, it can be tested.
• At test time, the input is fed from the remaining 20% of the data, which
  the model has never seen before; the model predicts some value, and we
  compare it with the actual output to calculate the accuracy.
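A minimal, dependency-free sketch of the 80:20 workflow described above. The dataset (hours studied vs. pass/fail), the random seed, and the "midpoint between class means" rule are all hypothetical stand-ins for a real dataset and a real learning algorithm; the point is only the order of operations: shuffle, split, learn on the 80%, measure accuracy on the unseen 20%.

```python
import random

# Hypothetical labelled dataset: (hours studied, passed) with passed = 1 or 0.
random.seed(0)
data = [(h, 1 if h >= 5 else 0) for h in (random.uniform(0, 10) for _ in range(100))]

# 80:20 split: shuffle, then keep 80% for training and 20% for testing.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Learning" from the training data only: a threshold halfway between
# the average input of each class (a deliberately simple stand-in model).
passed = [h for h, y in train if y == 1]
failed = [h for h, y in train if y == 0]
threshold = (sum(passed) / len(passed) + sum(failed) / len(failed)) / 2

def predict(hours):
    return 1 if hours >= threshold else 0

# Testing: predict on unseen inputs and compare with the actual outputs.
correct = sum(predict(h) == y for h, y in test)
accuracy = correct / len(test)
print(len(train), len(test), f"accuracy = {accuracy:.2f}")
```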
CLASSIFICATION
• Classification: A supervised learning task where the output has
  defined labels (discrete values).
• For example, in Figure A above, the output Purchased has defined labels, i.e. 0
  or 1; 1 means the customer will purchase, and 0 means that the customer
  won’t purchase.
• The goal here is to predict discrete values belonging to a particular class
  and evaluate them based on accuracy. It can be either binary or multi-class
  classification.
• In binary classification, the model predicts either 0 or 1 (yes or no), but in
  multi-class classification, the model predicts one of more than two
  classes. Example: Gmail classifies mails into several classes, such as social,
  promotions, updates, and forums.
REGRESSION
• Regression: A supervised learning task where the output has a
  continuous value.
• For example, in Figure B above, the output Wind Speed does not take
  discrete values but is continuous within a particular range.
• The goal here is to predict a value as close to the actual output value
  as possible; evaluation is done by calculating the error.
• The smaller the error, the greater the accuracy of our regression model.
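The error-based evaluation above can be made concrete with three common regression error measures: mean absolute error (MAE), mean squared error (MSE) and its square root (RMSE). The wind-speed numbers below are made up for illustration.

```python
# Hypothetical wind-speed values (km/h): actual vs. predicted by some model.
actual    = [12.0, 15.5, 9.0, 20.0]
predicted = [11.0, 16.5, 10.0, 18.0]

errors = [a - p for a, p in zip(actual, predicted)]
mae  = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse  = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = mse ** 0.5                                  # root mean squared error
print(mae, mse, round(rmse, 3))                    # 1.25 1.75 1.323
```

MSE and RMSE penalise large errors more heavily than MAE, because each error is squared before averaging.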
EXAMPLES OF SUPERVISED LEARNING ALGORITHMS
• Linear Regression
• Logistic Regression
• Nearest Neighbor
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest
UNSUPERVISED LEARNING
• Unsupervised machine learning analyzes and clusters unlabeled datasets
  using machine learning algorithms. These algorithms find hidden patterns
  or data groupings without the need for human intervention.
• That is, we don’t give the output to our model. The model is trained only on
  input parameter values and discovers the groups or patterns on its own.
• The dataset in Figure A is mall data containing information about the mall’s
  customers who have subscribed. Once subscribed, they are given a membership
  card, and the mall has complete information about each customer and his/her
  every purchase.
• Using this data and unsupervised learning techniques, the mall can
  easily group customers based on the parameters we feed in.
UNSUPERVISED LEARNING
The input to unsupervised learning models is as follows:
• Unstructured data: May contain noisy (meaningless) data,
  missing values, or unknown data.
• Unlabeled data: Data contains values only for the input
  parameters; there is no target value (output). It is easier to
  collect than the labeled data used in the supervised
  approach.
TYPES OF UNSUPERVISED LEARNING
• Clustering: Broadly, this technique groups data based on the patterns,
  such as similarities or differences, that our machine model finds. These
  algorithms are used to process raw, unclassified data objects into groups.
• For example, in the figure above, we have not given output parameter values,
  so this technique would be used to group clients based on the input
  parameters provided by our data.
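A minimal k-means sketch of the clustering idea above, in plain Python. The customer data, the choice k = 2, and seeding the centroids with the first k points are assumptions made for reproducibility; library implementations typically use better initialisation (for example k-means++).

```python
import math

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, iterations=100):
    """Minimal k-means: returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]   # assumed seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:              # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical mall customers: (annual income in thousands, spending score).
customers = [(15, 80), (90, 15), (18, 85), (85, 20), (20, 78), (95, 10)]
centroids, labels = kmeans(customers, k=2)
print(labels)   # [0, 1, 0, 1, 0, 1]
```

No output labels were supplied; the two groups (low income and high spending vs. high income and low spending) emerge from the inputs alone.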
TYPES OF UNSUPERVISED LEARNING
• Association: This is a rule-based ML technique that finds useful
  relations between parameters of a large dataset. It is mainly used for
  market basket analysis, which helps to better understand the
  relationship between different products.
• For example, shopping stores use algorithms based on this technique to find
  the relationship between the sales of one product and another product’s
  sales, based on customer behavior.
• For instance, if a customer buys milk, they may also buy bread, eggs, or butter.
  Once trained well, such models can be used to increase sales by
  planning different offers.
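Association rules are usually scored with support (the fraction of baskets containing all the items) and confidence (how often the rule holds when its left-hand side appears). A small sketch for the rule milk → bread, using hypothetical baskets:

```python
# Hypothetical shopping baskets (one set of items per customer visit).
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]

def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def support(itemset):
    return count(itemset) / len(baskets)

def confidence(antecedent, consequent):
    """Fraction of baskets with the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"milk", "bread"}))        # 0.6  (3 of 5 baskets)
print(confidence({"milk"}, {"bread"}))   # 0.75 (3 of the 4 milk baskets)
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds instead of scoring one rule by hand.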
TYPES OF UNSUPERVISED LEARNING
Some algorithms:
•   K-Means Clustering
•   DBSCAN – Density-Based Spatial Clustering of Applications with Noise
•   BIRCH – Balanced Iterative Reducing and Clustering using Hierarchies
•   Hierarchical Clustering
SEMI-SUPERVISED LEARNING
As the name suggests, it lies between supervised and unsupervised
techniques. We use these techniques when only a small part of the data is
labeled and the larger portion is unlabeled. We can use
unsupervised techniques to predict labels and then feed these labels to
supervised techniques. This approach is especially useful for image
datasets, where usually not all images are labeled.
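One simple way to realise "predict labels with unsupervised techniques, then feed them to supervised techniques" is cluster-then-label. The sketch below is an illustrative assumption, not the only semi-supervised method: it clusters hypothetical 1-D values with 2-means (seeded at the two labelled points) and gives every point in a cluster the label of the labelled point inside it.

```python
# Hypothetical 1-D feature values: only 2 of the 8 points carry a label.
points = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.8, 5.1]
known_labels = {0: "small", 4: "large"}      # index -> label

# Step 1 (unsupervised): 2-means clustering seeded with the labelled points.
centres = [points[0], points[4]]
for _ in range(10):
    assign = [min(range(2), key=lambda c: abs(x - centres[c])) for x in points]
    centres = [sum(x for x, a in zip(points, assign) if a == c) /
               max(1, sum(1 for a in assign if a == c)) for c in range(2)]

# Step 2: each cluster takes the label of the labelled point it contains.
cluster_label = {assign[i]: label for i, label in known_labels.items()}
pseudo_labels = [cluster_label[a] for a in assign]
print(pseudo_labels)
```

The resulting `pseudo_labels` could then be used as training targets for any supervised classifier.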
REINFORCEMENT LEARNING
• In this technique, the model keeps improving its performance using
  reward feedback to learn the behavior or pattern.
• These algorithms are specific to a particular problem, e.g. Google’s
  self-driving car, or AlphaGo, where a bot competes with humans and even
  with itself to keep getting better at the game of Go.
• Each time we feed in data, they learn and add the data to their knowledge,
  which is the training data. So, the more they learn, the better trained and
  hence the more experienced they become.
REINFORCEMENT LEARNING
•   Agents observe input.
•   An agent performs an action by making some decisions.
•   After acting, the agent receives a reward, reinforces its behavior
    accordingly, and the model stores this as state-action pair information.
Some algorithms:
•   Temporal Difference (TD)
•   Q-Learning
•   Deep Adversarial Networks
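The observe, act, receive reward, store state-action value cycle above is what tabular Q-learning (a temporal-difference method) does. A minimal sketch on a hypothetical five-cell corridor where only reaching the right end is rewarded; the learning rate, discount factor, seed and purely random exploration are assumptions chosen to keep the example short (epsilon-greedy exploration is more common in practice).

```python
import random

N_STATES, GOAL = 5, 4            # corridor cells 0..4; entering cell 4 ends the episode
ACTIONS = (-1, +1)               # move left or move right
alpha, gamma = 0.5, 0.9          # learning rate and discount factor (assumed values)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # state-action values

random.seed(0)
for episode in range(500):
    state = 0
    while state != GOAL:
        action = random.choice(ACTIONS)                          # agent acts (random exploration)
        next_state = min(max(state + action, 0), N_STATES - 1)   # stay inside the corridor
        reward = 1.0 if next_state == GOAL else 0.0              # reward feedback
        # Temporal-difference target: reward plus discounted best value of the next state.
        best_next = 0.0 if next_state == GOAL else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# Greedy policy: in each non-goal cell, pick the action with the highest learned value.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(policy)  # should print [1, 1, 1, 1]: always move right
```

The dictionary `Q` is the stored state-action information; the agent never receives the correct action, only rewards.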
Examples
TO BETTER FILTER EMAILS AS SPAM OR NOT
•   Task – Classifying emails as spam or not
•   Performance Measure – The fraction of emails accurately classified as
    spam or not spam
•   Experience – Observing you label emails as spam or not spam
A CHECKERS LEARNING PROBLEM
•   Task – Playing checkers game
•   Performance Measure – percent of games won against opponents
•   Experience – playing practice games against itself
HANDWRITING RECOGNITION PROBLEM
•   Task – recognizing and classifying handwritten words within images
•   Performance Measure – percent of words correctly classified
•   Experience – a database of handwritten words with given classifications
A ROBOT DRIVING PROBLEM
•   Task – driving on public four-lane highways using vision sensors
•   Performance Measure – average distance travelled before an error
•   Experience – a sequence of images and steering commands recorded while
    observing a human driver
FRUIT PREDICTION PROBLEM
•   Task – recognizing different fruits
•   Performance Measure – ability to correctly predict the widest variety of fruits
•   Experience – training the machine with a large dataset of fruit images
FACE RECOGNITION PROBLEM
•   Task – recognizing different types of faces
•   Performance Measure – ability to correctly predict the largest number of face types
•   Experience – training the machine with a large dataset of different face
    images
AUTOMATIC TRANSLATION OF DOCUMENTS
•   Task – translating a document from one language to another
•   Performance Measure – ability to convert one language to another accurately
    and efficiently
•   Experience – training the machine with a large dataset of text in different
    languages
Designing a Learning System in Machine Learning
“Machine Learning enables a machine to automatically learn from data,
improve performance from experience and predict things without being
explicitly programmed.”
When we feed the training data to a machine learning algorithm, the algorithm
produces a mathematical model, and with the help of this mathematical
model, the machine makes predictions and takes decisions without being
explicitly programmed.
Also, the more the machine works with the training data, the more experience
it gains, and the more experience it gains, the better the results it
produces.
EXAMPLE:
In a driverless car, the training data fed to the algorithm covers how to drive
on highways and on busy and narrow streets, with factors like speed limits,
parking, stopping at signals, etc.
A logical and mathematical model is then created from that data, and
the car operates according to this model.
Also, the more data is fed, the better the output produced.
Steps for Designing a Learning System:
STEP 1: CHOOSING THE TRAINING EXPERIENCE
• The first and most important task is to choose the training data or training
  experience that will be fed to the machine learning algorithm.
• It is important to note that the data or experience we feed to the
  algorithm has a significant impact on the success or failure of the
  model.
• So the training data or experience should be chosen wisely.
ATTRIBUTES OF TRAINING EXPERIENCE THAT IMPACT SUCCESS AND FAILURE
•   The first attribute is whether the training experience provides direct or
    indirect feedback regarding the choices made.
•   For example, while playing chess, the training data can provide feedback
    such as: if this move had been chosen instead, the chances of success
    would have increased.
ATTRIBUTES OF TRAINING EXPERIENCE THAT IMPACT SUCCESS AND FAILURE
•   The second important attribute is the degree to which the learner controls
    the sequence of training examples.
•   For example, when training data is first fed to the machine, its accuracy
    is very low, but as it gains experience by playing again and again against
    itself or an opponent, the algorithm gets feedback and controls the chess
    game accordingly.
ATTRIBUTES OF TRAINING EXPERIENCE THAT IMPACT SUCCESS AND FAILURE
The third important attribute is how well the training experience represents
the distribution of examples over which final performance will be measured.
For example, a machine learning algorithm gains experience by going
through a number of different cases and examples.
Thus, the algorithm gains more and more experience by passing through
more and more examples, and hence its performance increases.
STEP 2: CHOOSING THE TARGET FUNCTION
• Based on the knowledge fed to the algorithm, the learning system chooses a
  NextMove function that describes which legal moves should be taken.
• For example, while playing chess, when the opponent moves, the machine
  learning algorithm decides which of the possible legal moves to take in
  order to succeed.
STEP 3: CHOOSING A REPRESENTATION FOR THE TARGET FUNCTION
• Once the machine algorithm knows all the possible legal moves, the
  next step is to choose the optimal move using some representation, e.g.
  linear equations, a hierarchical graph representation, tabular form,
  etc.
• The NextMove function then selects, out of these moves, the target move
  that provides the highest success rate.
• For example, if the machine has 4 possible moves while playing chess, it
  will choose the optimal move, i.e. the one most likely to lead to success.
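As an illustration of the "linear equations" representation, the target function can be written as a weighted sum of board features, V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3, and NextMove picks the move whose resulting board scores highest. The features, weights and move names below are hypothetical; in a real system the weights would be learned rather than fixed by hand.

```python
# Hypothetical features (x1, x2, x3) of the board reached by each of 4 legal moves:
# x1 = own pieces, x2 = opponent pieces, x3 = own pieces under threat.
candidate_boards = {
    "move_a": [12, 11, 2],
    "move_b": [12, 10, 1],
    "move_c": [11, 10, 0],
    "move_d": [12, 12, 3],
}
weights = [0.5, 1.0, -1.0, -0.5]   # w0 (bias), w1, w2, w3: assumed, not learned here

def evaluate(features):
    """Linear representation of the target function V(b)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# NextMove: choose the legal move whose resulting board has the highest value.
best_move = max(candidate_boards, key=lambda m: evaluate(candidate_boards[m]))
print(best_move)   # move_b (scores: a = 0.5, b = 2.0, c = 1.5, d = -1.0)
```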
STEP 4: CHOOSING A FUNCTION APPROXIMATION ALGORITHM
• An optimal move cannot be chosen from the training data alone.
• The training data has to go through a set of examples; from these
  examples, the system approximates which steps should be chosen, and
  the machine then receives feedback on them.
• For example, when training data for playing chess is fed to the algorithm,
  it is not known in advance whether the algorithm will fail or succeed; from
  each failure or success, it estimates which step should be chosen at the
  next move and what its success rate is.
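One classic function approximation algorithm for this setting, used for the checkers example in Mitchell's textbook listed above, is the least mean squares (LMS) weight update: w_i ← w_i + η (V_train(b) − V̂(b)) x_i. A sketch assuming a linear evaluation function; the board features, training value and learning rate η are hypothetical.

```python
def predict(weights, features):
    """Linear evaluation V_hat(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, eta=0.05):
    """One LMS step: w_i <- w_i + eta * (V_train(b) - V_hat(b)) * x_i, with x_0 = 1."""
    error = v_train - predict(weights, features)
    xs = [1.0] + list(features)
    return [w + eta * error * x for w, x in zip(weights, xs)]

# Hypothetical training example: two board features and a training value
# V_train(b), e.g. +1 for a board taken from a won game.
weights = [0.0, 0.0, 0.0]
board, v_train = [2.0, 3.0], 1.0
for _ in range(100):
    weights = lms_update(weights, board, v_train)   # error shrinks on each pass
print(round(predict(weights, board), 4))            # 1.0
```

Each update nudges the weights in the direction that reduces the squared error on that example, which is the feedback loop Step 4 describes.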
STEP 5: FINAL DESIGN
The final design is created once the system has gone through a number of
examples, failures and successes, and correct and incorrect decisions, and
has learned what the next step should be.
Example: Deep Blue, IBM’s chess computer, won a match against world chess
champion Garry Kasparov in 1997, becoming the first computer to defeat a
reigning world champion in a match.
ISSUES IN MACHINE LEARNING
• What algorithms exist for learning general target functions from specific
  training examples?
• In what settings will algorithms converge to the desired function, given
  sufficient training data?
• Which algorithms perform best for which types of problems and
  representations?
ISSUES IN MACHINE LEARNING
• How much training data is sufficient?
• What general bounds can be found to relate the confidence in learned
  hypotheses to the amount of training experience and the character of the
  learner's hypothesis space?
ISSUES IN MACHINE LEARNING
• When and how can prior knowledge held by the learner guide the process
   of generalizing from examples?
• Can prior knowledge be helpful even when it is only approximately
  correct?
ISSUES IN MACHINE LEARNING
•   What is the best strategy for choosing a useful next training experience,
    and how does the choice of this strategy alter the complexity of the
    learning problem?
ISSUES IN MACHINE LEARNING
• What is the best way to reduce the learning task to one or more function
  approximation problems?
• Put another way, what specific functions should the system attempt to
  learn? Can this process itself be automated?
ISSUES IN MACHINE LEARNING
How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?
Why Data Preprocessing?
WHY DATA PREPROCESSING?
➢ Data in the real world is dirty
 ✓ incomplete: lacking attribute values, lacking certain attributes of interest,
   or containing only aggregate data
 ✓ noisy: containing errors or outliers
 ✓ inconsistent: containing discrepancies in codes or names
➢ No quality data, no quality mining results!
 ✓ Quality decisions must be based on quality data
 ✓ Data warehouse needs consistent integration of quality data
➢ A multi-dimensional measure of data quality:
 ✓ A well-accepted multi-dimensional view:
MAJOR TASKS IN DATA PREPROCESSING
• Data cleaning
 •   Fill in missing values, smooth noisy data, identify or remove outliers, and resolve
     inconsistencies
• Data integration
 •   Integration of multiple databases, data cubes, files, or notes
• Data transformation
 •   Normalization (scaling to a specific range)
 •   Aggregation
• Data reduction
 •   Obtains reduced representation in volume but produces the same or similar analytical
     results
 •   Data discretization: part of data reduction, of particular importance for numerical data
 •   Data aggregation, dimensionality reduction, data compression, generalization
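Min-max normalization, listed under data transformation, rescales each value v as v' = (v − min) / (max − min) · (new_max − new_min) + new_min. A plain-Python sketch with hypothetical income values:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:                          # constant attribute: map everything to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

incomes = [12000, 30000, 54000, 73600, 98000]   # hypothetical attribute values
scaled = min_max_normalize(incomes)
print(scaled)   # smallest income maps to 0.0, largest to 1.0
```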
Forms of data preprocessing
DATA CLEANING
•    Data cleaning tasks
    ✓ Fill in missing values
    ✓ Identify outliers and smooth out noisy data
    ✓ Correct inconsistent data
MISSING DATA
•       Data is not always available
    ✓    E.g., many tuples have no recorded value for several attributes, such as customer income
         in sales data
•       Missing data may be due to
    ✓    equipment malfunction
    ✓    inconsistent with other recorded data and thus deleted
    ✓    data not entered due to misunderstanding
    ✓    certain data may not be considered important at the time of entry
    ✓    history or changes of the data not registered
•       Missing data may need to be inferred
HOW TO HANDLE MISSING DATA?
•   Ignore the tuple: usually done when class label is missing (assuming the task is
    classification—not effective in certain cases)
•   Fill in the missing value manually: tedious and often infeasible
•   Use a global constant to fill in the missing value, e.g., “unknown” (which may effectively form a new class)
•   Use the attribute mean to fill in the missing value
•   Use the attribute mean for all samples of the same class to fill in the
    missing value: smarter
•   Use the most probable value to fill in the missing value: inference-based such
    as regression, Bayesian formula, decision tree
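The two mean-based strategies listed above can be sketched as follows. The records and class labels are hypothetical; `None` marks a missing income.

```python
# Hypothetical sales records: (class label, income); None marks a missing value.
records = [("A", 30.0), ("A", None), ("A", 34.0), ("B", 70.0), ("B", 74.0), ("B", None)]

def mean(xs):
    return sum(xs) / len(xs)

# Strategy 1: fill with the overall attribute mean.
overall = mean([v for _, v in records if v is not None])
filled_global = [(c, overall if v is None else v) for c, v in records]

# Strategy 2 ("smarter"): fill with the mean of the samples in the same class.
class_means = {c: mean([v for k, v in records if k == c and v is not None])
               for c, _ in records}
filled_by_class = [(c, class_means[c] if v is None else v) for c, v in records]

print(filled_global)     # both gaps become 52.0
print(filled_by_class)   # class A gap becomes 32.0, class B gap becomes 72.0
```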
NOISY DATA
 Q: What is noise?
 A: Random error in a measured variable.
 • Incorrect attribute values may be due to
   ✓ faulty data collection instruments
   ✓ data entry problems
   ✓ data transmission problems
   ✓ technology limitation
   ✓ inconsistency in naming convention
 • Other data problems that require data cleaning
   ✓ duplicate records
   ✓ incomplete data
   ✓ inconsistent data
HOW TO HANDLE NOISY DATA?
 • Binning method:
  ✓ first sort data and partition into (equi-depth) bins
  ✓ then one can smooth by bin means, smooth by bin medians, smooth by bin
    boundaries, etc.
  ✓ used also for discretization
• Clustering
  ✓ detect and remove outliers
• Semi-automated method: combined computer and human
  inspection
  ✓ detect suspicious values and check manually
• Regression
  ✓ smooth by fitting the data into regression functions
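A sketch of the binning method: equi-depth binning followed by smoothing by bin means and by bin boundaries, on a small hypothetical price list. Ties in boundary smoothing go to the lower boundary here, which is an arbitrary choice.

```python
def equi_depth_bins(values, n_bins):
    """Sort the data and split it into n_bins bins with (roughly) equal counts."""
    data = sorted(values)
    size = len(data) // n_bins
    return [data[i * size:(i + 1) * size] if i < n_bins - 1 else data[i * size:]
            for i in range(n_bins)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace each value by the closer bin boundary (bin minimum or maximum)."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

prices = [21, 4, 28, 8, 21, 34, 15, 24, 25]     # hypothetical, unsorted data
bins = equi_depth_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```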
DATA VISUALIZATION
 • Data visualization is an important skill to possess for anyone trying to
   extract and communicate insights from data.
 • Great business narratives and presentations often stem from brilliant
   visualizations that convey the key ideas in a concise and aesthetic manner.
 • In the field of machine learning, visualization plays a key role throughout
   the entire process of analysis - to obtain relationships, observe trends and
   portray the results as well.
NECESSITY OF DATA VISUALIZATION
• It is difficult for the human eye to decipher patterns from raw numbers
  only.
• Sometimes, even the statistical information summarized from the data
  may mislead you to wrong conclusions.
• Therefore, you should visualize the data often to understand how different
  features are behaving.
DATA VISUALIZATION-RETAIL STORE SALES EXAMPLE
• Each of the branches had employed a different strategy to calculate its
  discount rate, and the sales numbers were also quite different across all of
  them.
• It is difficult to draw this type of insight and understand the difference
  between each of the branches using raw numbers alone; therefore, we
  should utilize an appropriate visualization technique to ‘look’ at the data.
FACTS AND DIMENSIONS
• Graphics and visuals, when used intelligently and innovatively, can convey
  a lot more than what raw data alone can.
• Matplotlib provides many functions for building graphs from data
  stored in lists, arrays, etc.
There are two types of variables, which are as follows:
•   Facts
•   Dimensions
FACTS AND DIMENSIONS
• Facts and dimensions are different types of variables that help you
  interpret data better.
• Facts are numerical data, and dimensions are metadata.
• Metadata explains the additional information associated with the factual
  variable.
• Both facts and dimensions are equally important for generating actionable
  insights from a given data set.
• For example, in a data set about the height of students in a class, the
  height of the students would be a fact variable, whereas the gender of the
  students would be a dimensional variable.
• You can use dimensions to slice data for easier analysis. In this case, the
  distribution of height based on the gender of a student can be studied.
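Slicing a fact by a dimension, as in the height-by-gender example, amounts to grouping the fact values by the dimension and summarising each group. A plain-Python sketch with made-up heights:

```python
from collections import defaultdict

# Hypothetical class data: gender (dimension) and height in cm (fact) per student.
students = [("F", 158.0), ("M", 172.0), ("F", 162.0), ("M", 168.0), ("F", 160.0)]

# Slice the fact variable by the dimension variable.
heights_by_gender = defaultdict(list)
for gender, height in students:
    heights_by_gender[gender].append(height)

average_height = {g: sum(h) / len(h) for g, h in heights_by_gender.items()}
print(average_height)   # {'F': 160.0, 'M': 170.0}
```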
QUESTION
Consider a bank having thousands of ATMs across India. In every
transaction, the following variables are recorded:
1. Withdrawal amount
2. Account balance after withdrawal
3. Transaction charge amount
4. Customer ID
5. ATM ID
6. Date of withdrawal
Which among the following are fact variables?
DIMENSIONAL MODELLING
What are the benefits of having dimension variables apart from facts?
• Performing various types of analyses, such as sector-wise, country-wise
  or funding type-wise analyses.
• Extracting specific, useful information such as the total investment made
  in the automobile sector in India between 2014 and 2015.
 BAR GRAPH
• Plots are used to convey different ideas.
• For example, you can use certain plots to visualize the spread of data
  across two variables and other plots to gauge the frequency of a label.
• Depending on the objective of your visualization task, you can choose an
  appropriate plot.
• A bar graph is helpful when you need to visualize a numeric feature (fact)
  across multiple categories.
• import matplotlib.pyplot as plt
• plt.bar(x_component, y_component): Used to draw a bar graph
• plt.show(): Explicit command required to display the plot object
 BAR GRAPH
• plt.xlabel(), plt.ylabel(): Specify labels for the x and y axes
• plt.title(): Add a title to the plot object.
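Putting the bar-graph calls above together, a minimal sketch; the branch names and sales figures are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical sales (fact) for each branch (dimension).
branches = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]

bars = plt.bar(branches, sales)   # x: categories, y: numeric values
plt.xlabel("Branch")
plt.ylabel("Sales (units)")
plt.title("Sales by Branch")
plt.show()                        # explicitly display the plot object
```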
 SCATTER PLOT
• Scatter plot, as the name suggests, displays how the variables are spread
  across the range considered. It can be used to identify a relationship or
  pattern between two quantitative variables and the presence of outliers
  within them.
• plt.scatter(x_axis, y_axis)
• plt.scatter(x_axis, y_axis, c = colors, label = label)     # c: color(s) of the points; label: legend name for this set of points
• Another feature of a scatter plot allows you to use labels to further
  distinguish points over another dimension variable.
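A sketch of using a dimension variable to distinguish points in a scatter plot: one `plt.scatter` call per category, each with its own color and `label`, followed by `plt.legend()`. The data and region names are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data: advertising spend vs. sales, with a region dimension.
spend  = [10, 20, 30, 40, 15, 25, 35, 45]
sales  = [25, 41, 58, 80, 20, 33, 49, 60]
region = ["North"] * 4 + ["South"] * 4

groups = []
for name, color in [("North", "tab:blue"), ("South", "tab:orange")]:
    xs = [x for x, r in zip(spend, region) if r == name]
    ys = [y for y, r in zip(sales, region) if r == name]
    groups.append(plt.scatter(xs, ys, c=color, label=name))  # one series per region

plt.xlabel("Advertising spend")
plt.ylabel("Sales")
plt.legend()      # shows one entry per label
plt.show()
```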
 SCATTER PLOT-QUESTION
Select the cases where a scatterplot would be helpful in generating insights.
1. To check whether a relationship exists between the age of a person and
   their income.
2. To check whether there are any irregular entries in the data range.
3. To check whether stock prices are positively related to the profit of a
   company.
4. To understand the distribution of the salaries of the employees in a
   company.
 LINE GRAPH AND HISTOGRAM
• A line graph is used to present continuous time-dependent data. It accurately
  depicts the trend of a variable over a specified time period.
• A line chart or line plot or line graph or curve chart is a type of chart which
  displays information as a series of data points called 'markers' connected by
  straight line segments.
• A line graph can be helpful when you want to identify the trend of a particular
  variable. Some key industries and services that rely on line graphs include
  financial markets and weather forecasting.
• plt.plot(x_axis, y_axis)
• plt.yticks(rotation = number)     # the same works for plt.xticks()
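A minimal line-graph sketch using the calls above, with hypothetical monthly prices; `marker="o"` draws the markers that the straight segments connect.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly closing prices (a time-dependent variable).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
price  = [101.5, 103.2, 99.8, 104.6, 107.1, 106.3]

line, = plt.plot(months, price, marker="o")   # markers joined by straight segments
plt.xticks(rotation=45)                       # rotate tick labels if they overlap
plt.ylabel("Closing price")
plt.title("Monthly trend")
plt.show()
```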
 LINE GRAPH AND HISTOGRAM
• A histogram is a frequency chart that records the number of occurrences of
  an entry or an element in a data set. It can be useful when you want to
  understand the distribution of a given series.
• A histogram is a plot that lets you discover, and show, the underlying
  frequency distribution (shape) of a set of continuous data.
• plt.hist(profit, bins = 100, edgecolor = 'orange', color = 'cyan')
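A self-contained version of the `plt.hist` call above. The `profit` values here are randomly generated stand-ins (normally distributed, seeded for reproducibility), not real data.

```python
import random
import matplotlib.pyplot as plt

# Hypothetical profit values for 1,000 orders.
random.seed(1)
profit = [random.gauss(50, 15) for _ in range(1000)]

# plt.hist returns the bin counts, the bin edges and the drawn bars.
counts, edges, patches = plt.hist(profit, bins=100, edgecolor="orange", color="cyan")
plt.xlabel("Profit")
plt.ylabel("Number of orders")
plt.show()
```

With 100 bins there are 101 bin edges, and the counts add up to the number of values plotted.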
BOX PLOT
• Box plots are quite effective in summarizing the spread of a large data set
  into a visual representation. They use percentiles to divide the data range.
• The percentile value gives the proportion of the data that falls below
  a chosen data point when all the data points are arranged in
  ascending order.
• For example, if a data point with a value of 700 has a percentile value of
  99% in a data set, then it means that 99% of the values in the data set are
  less than 700.
 BOX PLOT
• A Box and Whisker Plot (or Box Plot) is a convenient way of visually
  displaying the data distribution through their quartiles.
• The lines extending parallel from the boxes are known as the “whiskers”,
  which are used to indicate variability outside the upper and lower
  quartiles.
• Outliers are sometimes plotted as individual dots that are in-line with
  whiskers. Box Plots can be drawn either vertically or horizontally.
• plt.boxplot([ list_1, list_2])
    BOX PLOT
Box plots divide the data range into three important categories, which are as
follows:
•    Median value: This is the value that divides the data range into two equal
     halves, i.e., the 50th percentile.
•    Interquartile range (IQR): These data points range between the 25th and
     75th percentile values.
•    Outliers: These are data points that differ significantly from other
     observations and lie beyond the whiskers.
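The quantities a box plot draws can be computed directly. A sketch with hypothetical salaries, using NumPy's default (linear interpolation) percentiles and the 1.5 × IQR rule, which is also the default whisker length (`whis=1.5`) in `plt.boxplot`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical salaries (in thousands), including one extreme value.
salaries = np.array([30, 32, 35, 36, 38, 40, 41, 43, 45, 120])

q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1                                     # the box spans Q1 to Q3
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = salaries[(salaries < low_fence) | (salaries > high_fence)]
print(q1, median, q3, iqr, outliers)              # 35.25 39.0 42.5 7.25 [120]

plt.boxplot([salaries])   # 120 appears as an individual point beyond the upper whisker
plt.show()
```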
 CHOOSING PLOT TYPES
• Each plot type is good at communicating a specific type of
  information. This means that in certain situations, certain plot types are
  preferred over others.
• So, how do you select the best possible plot type in a given situation?
• To answer this question, you need to first define the objective of creating a
  plot.
• A good visualization, along with the right type of graph, presents the
  relationship between different variables effectively and allows you to
  analyze them at a quick glance.
 CHOOSING PLOT TYPES- COMPARISON
• These charts can be used when you want to
  compare one set of values with other sets of
  values.
• The objective is to differentiate one particular
  set of values from the other sets, for example,
  quarterly sales of competing phones in the
  market.
• The following two types of charts are used to
  show a comparison:
    1. Column chart
    2. Bar chart
    CHOOSING PLOT TYPES- COMPOSITION
• You would need to use a composition chart to display
  how the various elements make up the complete data.
• Composition charts can be static, which shows the
  composition at a particular instance of time, or
  dynamic, which shows the changes in the
  composition over a period of time.
• Two of the popular composition charts are as follows:
  • Pie/ Doughnut Chart
  • Stacked Column chart
• The pie chart is by far the most common way to
  represent static composition, while the stacked
  column chart can be used to show the variation of
  composition over a period of time.
    CHOOSING PLOT TYPES- RELATIONSHIP
• A relationship chart helps in visualizing the correlation between variables.
• It can help in answering questions such as ‘Is there a correlation between the
  amount spent on marketing and the sales revenue?’ and ‘How does the gross
  profit vary with the change in offers?’.
• Two of the most common types of charts used to visualize relationships
  between variables are as follows:
•      Scatter plot
•      Bubble Plot
 CHOOSING PLOT TYPES- RELATIONSHIP
• A scatter plot can help correlate two variables, whereas a bubble chart
  adds one more dimension, i.e., the size of the bubble (usually indicative of
  the frequency of occurrence of that particular data point).
CHOOSING PLOT TYPES - DISTRIBUTION
• A distribution chart tries to answer the question ‘How is the data
  distributed?’
• For example, if a survey asks everyone their age, a distribution chart
  will help you visualize the distribution of ages in the data set.
• The distribution can be over a variable, or it can be over a period of
  time.
• Two of the most used charts for visualizing distribution are as follows:
     • Histogram
     • Scatter plot
• Histograms are quite good at displaying the distribution of data over
  intervals, whereas scatter plots are good at visualizing the distribution of
  data over two different variables.
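A histogram is built by counting how many values fall into each interval. The sketch below does this for a small set of hypothetical survey ages using 10-year bins. It assumes integer values and an integer bin width, and it keeps empty bins as zero so that gaps remain visible. `histogram` is an illustrative helper rather than a plotting-library call.

```python
def histogram(values, bin_width):
    """Count values per interval [edge, edge + bin_width); empty bins are kept as 0."""
    first = min(values) // bin_width * bin_width
    last = max(values) // bin_width * bin_width
    counts = {edge: 0 for edge in range(first, last + bin_width, bin_width)}
    for v in values:
        counts[v // bin_width * bin_width] += 1
    return counts


width = 10
ages = [23, 25, 31, 35, 38, 41, 44, 47, 52, 67]  # hypothetical survey responses
for edge, n in histogram(ages, width).items():
    print(f"{edge}-{edge + width - 1}: {'#' * n}")
```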
Supervised and Unsupervised Learning
Supervised Learning vs Unsupervised Learning
[Figure: in supervised learning each input x comes with a label y (x → y), here images labelled cat, dog or bear, and the task is classification; in unsupervised learning only the inputs x are given and the task is clustering.]
Supervised Learning Examples
• Classification (e.g., an image labelled ‘cat’)
• Face Detection
• Language Parsing
• Structured Prediction
In each case the output is a learned function of the input, for example ‘cat’ = 𝑓(image).
Supervised Learning – k-Nearest Neighbors
[Figure: training images of cats, dogs and bears. With k=3, a query whose three nearest neighbors are cat, cat, dog is assigned the majority label, cat; a second query whose three nearest neighbors are bear, dog, dog is assigned dog.]
• How do we choose the right K?
• How do we choose the right features?
• How do we choose the right distance metric?
  Answer: Just choose the combination that works best!
  BUT not on the test data.
  Instead, split the training data into a “Training set” and
  a “Validation set” (also called a “Development set”).
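The sketch below shows k-nearest-neighbor classification by majority vote. It chooses k by accuracy on a validation set held out from the training data, never on the test data. It assumes Euclidean distance (`math.dist`, Python 3.8+) on small hypothetical 2-D feature vectors. `knn_predict` and `choose_k` are illustrative names.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def choose_k(train, val, candidates):
    """Return the candidate k with the highest accuracy on the validation set."""
    def accuracy(k):
        return sum(knn_predict(train, x, k) == y for x, y in val) / len(val)
    return max(candidates, key=accuracy)


# Hypothetical 2-D features for images of cats, dogs and bears.
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((0.9, 1.1), "cat"),
         ((4.0, 4.0), "dog"), ((4.2, 3.9), "dog"), ((3.8, 4.1), "dog"),
         ((8.0, 1.0), "bear"), ((8.1, 1.2), "bear"), ((7.9, 0.9), "bear")]
val = [((1.0, 0.9), "cat"), ((4.1, 4.0), "dog"), ((8.0, 1.1), "bear")]

best_k = choose_k(train, val, candidates=[1, 3, 5])
print("chosen k:", best_k)
print("prediction:", knn_predict(train, (1.1, 1.0), best_k))
```

The toy data are well separated, so all three candidates reach the same validation accuracy here. In that case `max` keeps the first candidate, k=1. On real data the candidates usually score differently.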
Unsupervised Learning – k-means clustering (k=3)
1. Initially assign all images to a random cluster.
2. Compute the mean image (in feature space) for each cluster.
3. Reassign images to clusters based on similarity to cluster means.
4. Keep repeating this process until convergence.
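Below is a minimal sketch of the four steps above. It assumes points are tuples of numbers and that similarity is measured by Euclidean distance. Initialization follows step 1, giving every point a random cluster. If a cluster ends up empty, the sketch uses a randomly chosen point as its mean; this is one common workaround and an assumption here. `kmeans` is an illustrative helper, not a library function.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` into k groups; returns (assignments, cluster means)."""
    rng = random.Random(seed)
    # Step 1: assign every point to a random cluster.
    assign = [rng.randrange(k) for _ in points]
    for _ in range(max_iters):
        # Step 2: compute the mean of each cluster in feature space.
        means = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if not members:  # empty cluster: fall back to a random point
                members = [rng.choice(points)]
            means.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        # Step 3: reassign each point to the cluster with the nearest mean.
        new_assign = [min(range(k), key=lambda c: math.dist(p, means[c]))
                      for p in points]
        # Step 4: repeat until no assignment changes (convergence).
        if new_assign == assign:
            break
        assign = new_assign
    return assign, means


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
          (1.0, 9.0), (1.5, 8.5), (0.5, 9.0)]
labels, centres = kmeans(points, k=3, seed=0)
print(labels)
```

Running the sketch with different `seed` values is a quick way to see how sensitive the result is to the random initial assignment.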
• How do we choose the right K?
• How do we choose the right features?
• How do we choose the right distance metric?
• How sensitive is this method to the random initial assignment of clusters?
  Answer: Just choose the combination that works best!
  BUT not on the test data.
  Instead, split the training data into a “Training set” and
  a “Validation set” (also called a “Development set”).
Supervised Learning - Classification
[Figure: training data (labelled images of cats, dogs and bears) and test data.]
Training Data
x1 = [ image ]   y1 = [cat]
x2 = [ image ]   y2 = [dog]
x3 = [ image ]   y3 = [cat]
   ⋮
xn = [ image ]   yn = [bear]
Training Data
inputs                    targets / labels / ground truth    predictions
x1 = [x11 x12 x13 x14]    y1 = 1                             ŷ1 = 1
x2 = [x21 x22 x23 x24]    y2 = 2                             ŷ2 = 2
x3 = [x31 x32 x33 x34]    y3 = 1                             ŷ3 = 2
   ⋮
xn = [xn1 xn2 xn3 xn4]    yn = 3                             ŷn = 1
We need to find a function that maps x to y for every example:
$$\hat{y}_i = f(x_i; \theta)$$
How do we "learn" the parameters θ of this function? We choose the ones that make the following quantity small:
$$\sum_{i=1}^{n} \mathrm{Cost}(\hat{y}_i, y_i)$$
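The sketch below makes the learning objective concrete. It evaluates the sum of Cost(ŷi, yi) over the four rows shown in the table (rows 4 to n-1 are elided there) using the 0-1 cost: 1 for a wrong class, 0 for a correct one. The slide does not fix a cost function, so this choice is an assumption for illustration, and the helper names are hypothetical.

```python
def total_cost(predictions, targets, cost):
    """Sum of cost(y_hat_i, y_i) over all examples."""
    return sum(cost(y_hat, y) for y_hat, y in zip(predictions, targets))


def zero_one_cost(y_hat, y):
    """Illustrative cost: 0 if the predicted class is correct, 1 otherwise."""
    return 0 if y_hat == y else 1


# Labels and predictions for rows 1, 2, 3 and n of the table.
y = [1, 2, 1, 3]
y_hat = [1, 2, 2, 1]
print(total_cost(y_hat, y, zero_one_cost))  # 2: rows 3 and n are misclassified
```

The 0-1 cost is not differentiable with respect to θ. For that reason, classifiers are usually trained with a smooth cost such as cross-entropy instead.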
Supervised Learning – Linear Softmax
Training Data (inputs, targets / labels / ground truth, predictions). Each label is one-hot encoded (1 → [1 0 0], 2 → [0 1 0], 3 → [0 0 1]), and each prediction gives a predicted probability for each class:
x1 = [x11 x12 x13 x14]   y1 = 1 → [1 0 0]   ŷ1 = [0.85 0.10 0.05]
x2 = [x21 x22 x23 x24]   y2 = 2 → [0 1 0]   ŷ2 = [0.20 0.70 0.10]
x3 = [x31 x32 x33 x34]   y3 = 1 → [1 0 0]   ŷ3 = [0.40 0.45 0.05]
   ⋮
xn = [xn1 xn2 xn3 xn4]   yn = 3 → [0 0 1]   ŷn = [0.40 0.25 0.35]
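With one-hot targets and probability predictions, a common choice of Cost is the cross-entropy, −Σk yk log ŷk. For a one-hot target this reduces to −log of the probability assigned to the true class. The slides do not name a specific cost, so this is an assumption. The sketch evaluates it on the four rows shown in the table.

```python
import math


def cross_entropy(target, pred):
    """-sum_k target_k * log(pred_k); for a one-hot target this is -log(p_true)."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


# Rows 1, 2, 3 and n of the table: one-hot targets and predicted probabilities.
targets = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
preds = [[0.85, 0.10, 0.05], [0.20, 0.70, 0.10],
         [0.40, 0.45, 0.05], [0.40, 0.25, 0.35]]
losses = [cross_entropy(t, p) for t, p in zip(targets, preds)]
for row, loss in zip(["1", "2", "3", "n"], losses):
    print(f"row {row}: cost = {loss:.3f}")
print(f"sum = {sum(losses):.3f}")
```

A confident, correct prediction (row 1) gets a small cost. Rows 3 and n cost the most, because the true class receives only 0.40 and 0.35 respectively.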
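For a single example with label yi = [1 0 0], each class gets a linear score g, and the softmax turns the scores into the predictions ŷi = [fc fd fb] (the subscripts c, d and b denote the cat, dog and bear classes):
$$
\begin{aligned}
x_i &= [x_{i1}\; x_{i2}\; x_{i3}\; x_{i4}], \qquad y_i = [1\; 0\; 0], \qquad \hat{y}_i = [f_c\; f_d\; f_b] \\
g_c &= w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c \\
g_d &= w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d \\
g_b &= w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b \\
f_c &= \frac{e^{g_c}}{e^{g_c} + e^{g_d} + e^{g_b}} \\
f_d &= \frac{e^{g_d}}{e^{g_c} + e^{g_d} + e^{g_b}} \\
f_b &= \frac{e^{g_b}}{e^{g_c} + e^{g_d} + e^{g_b}}
\end{aligned}
$$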
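The linear scores g and the softmax outputs f can be computed directly. In the sketch below, the weights, biases and input are hypothetical numbers for three classes and four features. The classes are assumed to be ordered cat, dog, bear, matching fc, fd, fb. Subtracting max(g) before exponentiating is a standard trick to avoid overflow. It does not change the result, because it multiplies the numerator and denominator by the same factor.

```python
import math


def linear_scores(x, W, b):
    """g_k = sum_j W[k][j] * x[j] + b[k] for each class k."""
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]


def softmax(g):
    """f_k = exp(g_k) / sum_m exp(g_m), computed stably by subtracting max(g)."""
    m = max(g)
    exps = [math.exp(gk - m) for gk in g]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights and biases; rows are the classes (cat, dog, bear).
W = [[0.5, -0.2, 0.1, 0.0],
     [-0.3, 0.4, 0.0, 0.2],
     [0.0, 0.1, -0.5, 0.3]]
b = [0.1, 0.0, -0.1]
x = [1.0, 2.0, 0.5, 1.5]  # hypothetical features x_i1 .. x_i4

g = linear_scores(x, W, b)
f = softmax(g)
print([round(p, 3) for p in f])
```

Here fd is the largest output because gd is the largest score. Softmax preserves the ordering of the scores while making the outputs positive and summing to one.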