Unit - II
Data Analysis
1. Regression modeling:
Regression Analysis
In statistics, regression analysis is one of the most commonly used techniques for estimating relationships among variables. Its main focus is the relationship between a dependent variable and one or more independent variables. Several techniques within this family are used for modeling and analyzing multiple variables. Regression analysis helps you see how the value of the dependent variable changes when any one of the independent variables varies while the others are held fixed. In simple terms, it estimates the conditional expectation, that is, the average value of the dependent variable given the independent variables. In all cases, the target of estimation is a function of one or more independent variables, called the regression function. The main goal of regression analysis is to estimate the values of the parameters so that the resulting function fits the observed data as well as possible.
Common uses of Regression Analysis
Most widely, this technique is used for prediction and forecasting, where its uses overlap with the field of machine learning. The technique also helps you determine the form and strength of the relationship between a dependent variable and the independent variables, and it can be used to investigate causal relationships between them.
A variety of techniques within data analytics are employed to carry out regression analysis. Some of the best known are linear regression, logistic regression, and ordinary least squares. Linear regression and ordinary least squares are parametric methods: in both, the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data.
How well a regression method performs depends on the nature of the data-generating process. Every regression model incorporates the following variables:
The dependent variable, Y
The independent variables, X
The unknown parameters, β, represented as a scalar or a vector
What is Regression Analysis?
Let's take a simple example: suppose your manager asks you to predict annual sales. There can be hundreds of factors (drivers) that affect sales. In this case, sales is your dependent variable, and the factors affecting sales are independent variables. Regression analysis helps you solve this problem.
1. Linear Regression
Linear regression is one of the primary statistical tools used to study or analyze the relationship between two variables, X and Y. The purpose of this widely used tool is to analyze how a response variable varies with an explanatory variable. The method is also employed to forecast future values from past values. In economics and finance, it is often used to determine when and how prices move beyond expected limits. The technique applies the method of least squares, in which a straight line is drawn through the observed values so as to minimize the distance between the observations and the fitted trendline.
How Does Linear Regression Work?
Linear regression rests on a simple intuition. For example, if you want to predict tomorrow's price or demand for a particular security or commodity, a reasonable guess is something close to today's price or demand. If prices or demands have been steadily rising or falling, that guess should carry an upward or downward bias respectively. The line drawn through the data using the method of least squares is commonly known as a 'linear regression trendline'; it passes through the middle of the observed prices or demands.
The foremost purpose of linear regression is to estimate the slope and intercept that define the line predicting Y from X as well as possible. Narrowed down, the goal of the technique is to minimize the sum of the squares of the vertical distances of the observations from the prediction line.
Plotted as a figure, a straight-line correlation between two variables X and Y rests on the assumption of a linear relationship. Although the exact value of Y cannot be determined for every value of X, linear regression lets you estimate the statistical correlation between the two variables.
Statistical software such as SAS, SPSS, and R is commonly used to carry out linear regression analyses.
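As a minimal sketch in R (one of the tools named above), the following fits a least-squares trendline; the data x and y are simulated purely for illustration:
# Illustrative data: a linear relationship plus random noise
set.seed(1)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)           # least-squares fit of y on x
summary(fit)               # estimated intercept, slope and R-squared
plot(x, y)
abline(fit, col = "red")   # the fitted regression trendline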
2. Polynomial Regression
Is a form of linear regression in which the relationship between the independent variable x and
dependent variable y is modeled as an nth degree polynomial. Polynomial regression fits a nonlinear
relationship between the value of x and the corresponding conditional mean of y, denoted E(y |x)
Why Polynomial Regression:
Some relationships are hypothesized by the researcher to be curvilinear. Such cases clearly call for a polynomial term.
Inspection of residuals can reveal the need for one. If we try to fit a linear model to curved data, a scatter plot of the residuals (Y axis) against the predictor (X axis) will show patches of many positive residuals in the middle; in such a situation a straight-line model is not appropriate.
An assumption of the usual multiple linear regression analysis is that all the independent variables are independent. In a polynomial regression model, this assumption is not satisfied, since the polynomial terms are functions of the same variable.
Uses of Polynomial Regression:
These are basically used to define or describe non-linear phenomenon such as:
Growth rate of tissues.
Progression of disease epidemics
Distribution of carbon isotopes in lake sediments
The basic goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable x. In simple regression, we used the following equation:
y = a + bx + e
Here y is the dependent variable, a is the y-intercept, b is the slope, and e is the error term. Polynomial regression extends this to
y = a + b1x + b2x^2 + … + bnx^n + e
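As a brief sketch in R, under simulated data assumed for illustration, a degree-2 polynomial can be fit by ordinary least squares:
# Illustrative curvilinear data
set.seed(2)
x <- seq(0, 10, length.out = 100)
y <- 1 + 2 * x - 0.3 * x^2 + rnorm(100)
fit <- lm(y ~ poly(x, 2, raw = TRUE))   # fits y = a + b1*x + b2*x^2 + e
coef(fit)                               # estimated a, b1, b2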
3. Logistic Regression
In logistic regression, the dependent variable is binary in nature (having two categories). Independent
variables can be continuous or binary. In multinomial logistic regression, you can have more than two
categories in your dependent variable.
The logistic regression model is
log(p / (1 − p)) = β0 + β1X1 + β2X2 + … + βpXp
where p is the probability that the dependent variable equals 1, and the left-hand side is the log-odds (logit) of that probability.
Why don't we use linear regression in this case?
The homoscedasticity assumption is violated.
The errors are not normally distributed.
y follows a binomial distribution and hence is not normal.
Examples
HR Analytics: IT firms recruit large numbers of people, but one problem they encounter is that many candidates do not join after accepting the job offer. This results in cost overruns, because the firm has to repeat the entire recruitment process. When you receive an application, can you predict whether that applicant is likely to join the organization (a binary outcome: join / not join)?
Elections: Suppose we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1): win or lose. The predictor variables of interest are the amount of money spent on the campaign and the amount of time spent campaigning negatively.
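As a hedged sketch of the HR example in R: assuming a hypothetical data frame offers with a binary column join and two illustrative predictors, the model can be fit with glm:
# 'offers' is an assumed data frame: join (0/1), salary_hike, notice_period
fit <- glm(join ~ salary_hike + notice_period,
           data = offers, family = binomial)            # logit link by default
summary(fit)                                            # coefficients on the log-odds scale
predict(fit, newdata = offers[1, ], type = "response")  # P(join) for one applicant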
4. Lasso Regression
Lasso stands for Least Absolute Shrinkage and Selection Operator. It uses the L1 regularization technique in the objective function. The objective function in lasso regression becomes
minimize Σ (yi − β0 − β1xi1 − … − βpxip)^2 + λ (|β1| + |β2| + … + |βp|)
where λ is the regularization parameter; the intercept term is not regularized. We do not assume that the error terms are normally distributed. There is no specific closed-form formula for the estimates, but we can obtain them using statistical software.
Note that lasso regression also needs standardization.
Advantage of lasso over ridge regression
Lasso regression performs built-in variable selection as well as parameter shrinkage. With ridge regression, you may end up keeping all the variables, only with shrunken parameters.
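A minimal sketch in R using the glmnet package (an assumed choice; any lasso-capable software would do), with simulated predictors of which only two truly matter:
library(glmnet)
set.seed(3)
x <- matrix(rnorm(100 * 10), nrow = 100)   # 10 illustrative predictors
y <- x[, 1] - 2 * x[, 2] + rnorm(100)      # only the first two affect y
cv <- cv.glmnet(x, y, alpha = 1)           # alpha = 1 selects the L1 (lasso) penalty
coef(cv, s = "lambda.min")                 # most coefficients shrunk exactly to zero
Note that glmnet standardizes the predictors internally by default, in line with the standardization requirement above.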
5. Support Vector Regression
As in classification, support vector regression (SVR) is characterized by the use of kernels, a sparse solution, and VC control of the margin and of the number of support vectors. Although less popular than SVM for classification, SVR has proven to be an effective tool in real-valued function estimation. Support vector regression can handle both linear and non-linear models; for non-linear models it uses kernel functions (such as the polynomial kernel) to find the optimal solution.
The main idea of SVR is to minimize the error while individualizing the hyperplane that maximizes the margin.
library(e1071)
# Fit a support vector regression model of Y on X (data has columns X and Y)
model <- svm(Y ~ X, data)
pred <- predict(model, data)
plot(data$X, data$Y)                        # the observations
points(data$X, pred, col = "red", pch = 4)  # fitted values overlaid
MULTIVARIATE ANALYSIS
Introduction
Multivariate means involving multiple variables that together determine one outcome. Most problems in the real world are multivariate: for example, we cannot predict the weather on a given day from the season alone, because multiple factors such as pollution, humidity, and precipitation also play a role. Here, we will introduce you to multivariate analysis, its history, and its application in different fields.
Multivariate analysis: An overview
Suppose you have been assigned a project to predict the company's sales. You cannot simply say that 'X' is the single factor that will affect sales.
We know that multiple aspects or variables will impact sales. Which variables impact it most can only be found with multivariate analysis, and in most cases it will not be just one variable.
As we know, sales will depend on the category of product, production capacity, geographical location, marketing effort, brand presence in the market, competitor analysis, cost of the product, and many other variables. Sales is just one example; this kind of study can be applied in almost any field.
Multivariate analysis is used widely in many industries, such as healthcare. In the recent event of COVID-19, a team of data scientists predicted that Delhi would have more than 5 lakh (500,000) COVID-19 patients by the end of July 2020. This analysis was based on multiple variables such as government decisions, public behavior, population, occupation, public transport, healthcare services, and the overall immunity of the community.
Multivariate analysis is part of exploratory data analysis. With MVA, we can visualize and gain deeper insight into multiple variables at once.
There are more than 20 different methods for performing multivariate analysis; which one is best depends on the type of data and the problem you are trying to solve.
Multivariate analysis (MVA) is a statistical procedure for analyzing data involving more than one type of measurement or observation. It may also mean solving problems where more than one dependent variable is analyzed simultaneously with other variables.
Advantages and Disadvantages of Multivariate Analysis
Advantages
The main advantage of multivariate analysis is that, because it considers more than one independent variable influencing the variability of the dependent variables, the conclusions drawn are more accurate.
The conclusions are more realistic and closer to the real-life situation.
Disadvantages
The main disadvantage of MVA is that it requires rather complex computations to arrive at a satisfactory conclusion.
Observations on a large number of variables need to be collected and tabulated, which is a rather time-consuming process.
Classification Chart of Multivariate Techniques
Selection of the appropriate multivariate technique depends upon:
a) Are the variables divided into independent and dependent classifications?
b) If yes, how many variables are treated as dependent in a single analysis?
c) How are the variables, both dependent and independent, measured?
Multivariate analysis techniques can be classified into two broad categories. The classification depends on the question: are the involved variables dependent on each other or not?
If the answer is yes, we have dependence methods.
If the answer is no, we have interdependence methods.
Dependence technique: Dependence Techniques are types of multivariate analysis techniques
that are used when one or more of the variables can be identified as dependent variables and
the remaining variables can be identified as independent.
Multiple Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target, or criterion variable). Multiple regression uses several independent 'x' variables for each observation; the first observation, for example, is ((x1)1, (x2)1, (x3)1, Y1).
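As an illustrative sketch in R (the data frame df and its column names are assumptions, not part of the text above), predicting sales from several drivers:
# 'df' is an assumed data frame with sales and three illustrative drivers
fit <- lm(sales ~ price + marketing_spend + competitors, data = df)
summary(fit)   # one estimated coefficient per independent variable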
Conjoint analysis
'Conjoint analysis' is a survey-based statistical technique used in market research that helps determine how people value the different attributes (features, functions, benefits) that make up an individual product or service. The objective of conjoint analysis is to determine the choices or decisions of the end user, which drive the policy, product, or service. Today it is used in many fields including marketing, product management, and operations research.
It is used frequently in testing consumer response to new products, in the acceptance of advertisements, and in service design. Conjoint analysis techniques may also be referred to as multi-attribute compositional modeling, discrete choice modeling, or stated preference research, and they form part of a broader set of trade-off analysis tools used for systematic analysis of decisions.
There are multiple conjoint techniques; a few of them are CBC (choice-based conjoint) and ACBC (adaptive CBC).
Multiple Discriminant Analysis
The objective of discriminant analysis is to determine group membership of samples from a group of predictors by finding linear combinations of the variables that maximize the differences between the groups being studied, in order to establish a model that sorts objects into their appropriate populations with minimal error.
Discriminant analysis derives an equation as a linear combination of the independent variables
that will discriminate best between the groups in the dependent variable. This linear
combination is known as the discriminant function. The weights assigned to each independent
variable are corrected for the interrelationships among all the variables. The weights are
referred to as discriminant coefficients.
The discriminant equation:
F = β0 + β1X1 + β2X2 + … + βpXp + ε
where F is a latent variable formed by the linear combination of the independent variables; X1, X2, …, Xp are the p independent variables; ε is the error term; and β0, β1, β2, …, βp are the discriminant coefficients.
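A brief sketch in R using MASS::lda with the built-in iris data (an illustrative choice, not part of the text above):
library(MASS)
# Discriminate the three iris species from four measured variables
fit <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = iris)
fit$scaling                  # the discriminant coefficients (weights)
pred <- predict(fit)$class   # sort each observation into a group
table(iris$Species, pred)    # how well objects land in their populations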
A linear probability model
A linear probability model (LPM) is a regression model where the outcome variable is binary,
and one or more explanatory variables are used to predict the outcome. Explanatory variables
can themselves be binary or be continuous. If the classification involves a binary dependent
variable and the independent variables include non-metric ones, it is better to apply linear
probability models.
Binary outcomes are everywhere: whether a person died or not, broke a hip, has hypertension
or diabetes, etc.
We typically want to understand what the probability of the binary outcome is given
explanatory variables.
We can actually use our linear model to do so, and it is simple to understand why. If Y is an indicator or dummy variable, then E[Y | X] is the proportion of 1s given X, which we interpret as the probability of Y given X.
We can then interpret the parameters as the change in the probability of Y when X changes by one unit, or for a small change in X. For example, if we model P(death = 1 | age) = β0 + β1·age, we could interpret β1 as the change in the probability of death for an additional year of age.
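As a minimal sketch in R (the data frame health and its columns died and age are hypothetical), an LPM is simply ordinary least squares on a binary outcome:
# 'health' is an assumed data frame with died (0/1) and age in years
fit <- lm(died ~ age, data = health)
coef(fit)["age"]   # change in P(died = 1) per additional year of age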
Multivariate Analysis of Variance and Covariance
Multivariate analysis of variance (MANOVA) is an extension of the common analysis of variance (ANOVA). In ANOVA, differences among group means on a single response variable are studied. In MANOVA, the number of response variables is increased to two or more, and the hypothesis concerns a comparison of vectors of group means. A MANOVA has one or more factors (each with two or more levels) and two or more dependent variables. The calculations are extensions of the general linear model approach used for ANOVA.
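A short sketch in R using the built-in manova() function on the iris data (an illustrative choice):
# Two response variables compared jointly across the species groups
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
summary(fit, test = "Pillai")   # multivariate test on the vectors of group means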
Canonical Correlation Analysis
Canonical correlation analysis is the study of the linear relations between two sets of variables.
It is the multivariate extension of correlation analysis.
CCA is used for two typical purposes:
Data reduction
Data interpretation
You could compute all correlations between the variables of the first set (p variables) and the variables of the second set (q variables); however, interpretation is difficult when p × q is large.
Canonical Correlation Analysis allows us to summarize the relationships into a lesser number
of statistics while preserving the main facets of the relationships. In a way, the motivation for
canonical correlation is very similar to principal component analysis.
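As a minimal sketch in R with base R's cancor(), splitting the iris columns into two illustrative sets:
# Two sets of variables: sepal measurements (p = 2) and petal measurements (q = 2)
X <- as.matrix(iris[, 1:2])
Y <- as.matrix(iris[, 3:4])
cc <- cancor(X, Y)
cc$cor   # canonical correlations summarizing the p x q pairwise relationships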
Structural Equation Modeling
Structural equation modeling is a multivariate statistical analysis technique that is used to
analyze structural relationships. It is an extremely broad and flexible framework for data
analysis, perhaps better thought of as a family of related methods rather than as a single
technique.
In a single analysis, SEM can assess the assumed causation among a set of dependent and independent constructs (validation of the structural model) and the loadings of observed items (measurements) on their expected latent variables, or constructs (validation of the measurement model). The combined analysis of the measurement and structural models enables the measurement errors of the observed variables to be analyzed as an integral part of the model, combining factor analysis and hypothesis testing in one operation.
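A hedged sketch in R using the lavaan package and its documentation dataset PoliticalDemocracy (both assumed choices; the model below is illustrative, not a recommended specification):
library(lavaan)
model <- '
  ind60 =~ x1 + x2 + x3        # measurement model: observed items load on a latent construct
  dem60 =~ y1 + y2 + y3 + y4   # a second latent construct
  dem60 ~ ind60                # structural model: assumed causation between constructs
'
fit <- sem(model, data = PoliticalDemocracy)
summary(fit)   # measurement and structural estimates in one analysis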
Interdependence Technique
Interdependence techniques are used when the variables cannot be classified as either dependent or independent.
They aim to unravel relationships between variables and/or subjects without explicitly assuming specific distributions for the variables. The idea is to describe the patterns in the data without making (very) strong assumptions about the variables.
Factor Analysis
Factor analysis is a way to condense the data in many variables into just a few variables. For this reason, it is also sometimes called "dimension reduction". It groups variables with high correlation. Factor analysis includes techniques such as principal component analysis and common factor analysis.
This type of technique is used as a pre-processing step to transform the data before using other
models. When the data has too many variables, the performance of multivariate techniques is
not at the optimum level, as patterns are more difficult to find. By using factor analysis, the
patterns become less diluted and easier to analyze.
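A brief sketch in R of both techniques on the numeric iris columns (an illustrative dataset):
# Principal component analysis as a dimension-reduction preprocessing step
pc <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pc)                                # variance captured by each component
# Common factor analysis: summarize the four variables with one latent factor
fa <- factanal(iris[, 1:4], factors = 1)
fa$loadings                                # highly correlated variables group together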
Cluster analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. In cluster analysis, there is no prior information about the group or cluster membership of any of the objects.
While doing cluster analysis, we first partition the set of data into groups based on
data similarity and then assign the labels to the groups.
The main advantage of clustering over classification is that it is adaptable to changes
and helps single out useful features that distinguish different groups.
Cluster analysis is used in outlier-detection applications such as the detection of credit card fraud. As a data mining function, cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.
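A minimal sketch in R using k-means on the iris measurements (illustrative; the true species labels are used only afterwards, to inspect the recovered groups):
set.seed(4)
km <- kmeans(scale(iris[, 1:4]), centers = 3)   # partition into 3 clusters, no labels used
table(km$cluster, iris$Species)                 # compare clusters with the held-out labels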
Multidimensional Scaling
Multidimensional scaling (MDS) is a technique that creates a map displaying the relative
positions of several objects, given only a table of the distances between them. The map may
consist of one, two, three, or even more dimensions. The program calculates either the metric
or the non-metric solution. The table of distances is known as the proximity matrix. It arises
either directly from experiments or indirectly as a correlation matrix.
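As a short sketch in R, the metric solution via cmdscale() on a proximity matrix built from the iris data (an illustrative choice):
d <- dist(scale(iris[, 1:4]))      # the proximity (distance) matrix
coords <- cmdscale(d, k = 2)       # map the objects into two dimensions
plot(coords, col = iris$Species)   # relative positions of the objects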
Correspondence analysis
Correspondence analysis is a method for visualizing the rows and columns of a table of non-
negative data as points in a map, with a specific spatial interpretation. Data are usually counted
in a cross-tabulation, although the method has been extended to many other types of data using
appropriate data transformations. For cross-tabulations, the method can be considered to
explain the association between the rows and columns of the table as measured by the Pearson
chi-square statistic. The method has several similarities to principal component analysis, in that
it situates the rows or the columns in a high-dimensional space and then finds a best-fitting
subspace, usually a plane, in which to approximate the points.
A correspondence table is any rectangular two-way array of non-negative quantities that
indicates the strength of association between the row entry and the column entry of the table.
The most common example of a correspondence table is a contingency table, in which row and
column entries refer to the categories of two categorical variables, and the quantities in the cells
of the table are frequencies.
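A brief sketch in R using MASS::corresp and the caith contingency table of eye colour by hair colour that ships with MASS (an illustrative choice):
library(MASS)
ca <- corresp(caith, nf = 2)   # two-dimensional correspondence solution
biplot(ca)                     # rows and columns plotted as points in one map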
The Objective of multivariate analysis
(1) Data reduction or structural simplification: the data are simplified as much as possible without sacrificing valuable information, which makes interpretation easier.
(2) Sorting and grouping: when we have multiple variables, groups of "similar" objects or variables are created based upon measured characteristics.
(3) Investigation of dependence among variables: The nature of the relationships among
variables is of interest. Are all the variables mutually independent or are one or more variables
dependent on the others?
(4) Prediction: relationships between variables must be determined for the purpose of predicting the values of one or more variables based on observations of the other variables.
(5) Hypothesis construction and testing. Specific statistical hypotheses, formulated in terms
of the parameters of multivariate populations, are tested. This may be done to validate
assumptions or to reinforce prior convictions.
Model Building Process
Model building, that is, choosing predictors, is one of those skills in statistics that is difficult to teach. It is hard to lay out the steps, because at each step you must evaluate the situation and decide on the next step. But here are some steps to keep in mind.
The first part (stages one to three) deals with the analysis objectives, design concerns, and testing for assumptions. The second half deals with the problems of model estimation, interpretation, and model validation. The general flow is the same for building an appropriate model with any of the multivariate techniques.
Model Assumptions
Prediction of relations between variables is not an easy task. Each model has its assumptions.
The most important assumptions underlying multivariate analysis are normality,
homoscedasticity, linearity, and the absence of correlated errors. If the dataset does not
follow the assumptions, the researcher needs to do some preprocessing. Missing this step can
cause incorrect models that produce false and unreliable results.
Multivariate Statistics Summary
The key to multivariate statistics is understanding, conceptually, the relationships among techniques with regard to:
The kinds of problems each technique is suited for.
The objective(s) of each technique.
The data structure required for each technique.
Sampling considerations for each technique.
The underlying mathematical model, or lack thereof, of each technique.
The potential for complementary use of techniques.
Finally, I would like to conclude that each technique also has certain strengths and weaknesses
that should be clearly understood by the analyst before attempting to interpret the results of the
technique. Current statistical packages (SAS, SPSS, S-Plus, and others) make it increasingly
easy to run a procedure, but the results can be disastrously misinterpreted without adequate
care.
One of the best quotes by Albert Einstein which explains the need for Multivariate analysis is,
“If you can’t explain it simply, you don’t understand it well enough.”
I tried to provide every aspect of Multivariate analysis. In short, Multivariate data analysis can
help to explore data structures of the investigated samples.
Introduction
In statistics, probabilistic models are used to define relationships between variables and can be used to calculate the probabilities of each variable. Many problems involve a large number of variables, and a fully conditional model then requires a huge amount of data to cover every case of the probability functions, which may be intractable to calculate in real time. There have been several attempts to simplify conditional probability calculations, such as Naïve Bayes, but these are still not fully satisfactory because they drastically cut down the dependencies between variables.
A better way is to develop a model that preserves the conditional dependencies between random variables where they exist and the conditional independences in other cases. This leads us to the concept of Bayesian Networks, which help us effectively visualize the probabilistic model for a domain and study the relationships between random variables in the form of a user-friendly graph.
What are Bayesian Networks?
By definition, Bayesian Networks are a type of probabilistic graphical model that uses Bayesian inference for probability computations. A Bayesian Network represents a set of variables and their conditional dependencies with a Directed Acyclic Graph (DAG). They are primarily suited for taking an event that has occurred and predicting the likelihood that any one of several possible known causes was the contributing factor.
As mentioned above, by making use of the relationships specified by the Bayesian Network, we can obtain the Joint Probability Distribution (JPD) from the conditional probabilities. Each node in the graph represents a random variable, and each arc (directed arrow) represents a dependency between nodes. The variables can be either continuous or discrete in nature.
In the example network, A, B, C, and D are four random variables represented by nodes. A is the parent of node B, and C is the child of B; node C is independent of node A given its parent B.
Before we get into the implementation of a Bayesian Network, there are a few probability
basics that have to be understood.
Local Markov Property
Bayesian Networks satisfy the property known as the Local Markov Property: a node is conditionally independent of its non-descendants, given its parents. In the above example, P(D | A, B) is equal to P(D | A) because D is independent of its non-descendant B given its parent A. This property helps us simplify the Joint Probability Distribution. The Local Markov Property also leads to the concept of a Markov Random Field, a random field around a variable that is said to follow Markov properties.
Conditional Probability
In mathematics, the conditional probability of event A is the probability that A will occur given that another event B has already occurred. In simple terms, P(A | B) is the probability of event A occurring given that event B occurs. The events A and B may be either dependent or independent, and the conditional probability is computed accordingly.
If A and B are dependent events, the conditional probability is calculated as
P(A | B) = P(A and B) / P(B)
If A and B are independent events, then the expression for the conditional probability is
P(A | B) = P(A)
Joint Probability Distribution
Before we get into an example of Bayesian Networks, let us understand the concept of Joint
Probability Distribution. Consider three variables a1, a2, and a3. By definition, the probabilities of all the different possible combinations of a1, a2, and a3 constitute their Joint Probability Distribution.
If P[a1, a2, a3, …, an] is the JPD of the variables a1 through an, the chain rule of probability lets us expand it as a product of conditional terms:
P[a1, a2, a3, …, an] = P[a1 | a2, a3, …, an] * P[a2 | a3, …, an] * … * P[an-1 | an] * P[an]
Generalizing, in a Bayesian Network each variable depends only on its parents, so the Joint Probability Distribution factorizes as
P(X1, X2, …, Xn) = P(X1 | Parents(X1)) * P(X2 | Parents(X2)) * … * P(Xn | Parents(Xn))
Example of Bayesian Networks
Let us now understand the mechanism of Bayesian Networks and their advantages with the help of a simple example. Imagine we are given the task of modeling a student's marks (m) for an exam he has just taken. From the given Bayesian Network graph below, we see that the marks depend upon two other variables:
Exam Level (e)– This discrete variable denotes the difficulty of the exam and has two values (0
for easy and 1 for difficult)
IQ Level (i) – This represents the Intelligence Quotient level of the student and is also discrete
in nature having two values (0 for low and 1 for high)
Additionally, the IQ level of the student leads us to another variable, the Aptitude Score of the student (s). With the marks he has scored, the student can secure admission to a particular university; the probability distribution for getting admitted (a) to a university is also given.
Accompanying the graph are several tables giving the probability distribution values of the five variables. These tables are called Conditional Probability Tables (CPTs). A few properties of CPTs are given below:
The sum of the CPT values in each row must equal 1, because the listed cases for a particular variable are exhaustive (representing all possibilities).
If a variable that is Boolean in nature has k Boolean parents, then its CPT contains 2^k rows of probability values, one for each combination of parent values.
Coming back to our problem, let us first list all the possible events that are occurring in the
above-given table.
1. Exam Level (e)
2. IQ Level (i)
3. Aptitude Score (s)
4. Marks (m)
5. Admission (a)
These five variables are represented in the form of a Directed Acyclic Graph (DAG) in a
Bayesian Network format with their Conditional Probability tables. Now, to calculate the Joint
Probability Distribution of the 5 variables the formula is given by,
P[a, m, i, e, s]= P(a | m) . P(m | i, e) . P(i) . P(e) . P(s | i)
From the above formula,
P(a | m) denotes the conditional probability of the student getting admission based on the marks
he has scored in the examination.
P(m | i, e) represents the marks that the student will score given his IQ level and difficulty of
the Exam Level.
P(i) and P(e) represent the probability of the IQ Level and the Exam Level.
P(s | i) is the conditional probability of the student’s Aptitude Score, given his IQ Level.
With these probabilities calculated, we can find the Joint Probability Distribution of the entire Bayesian Network.
Calculation of Joint Probability Distribution
Let us now calculate the JPD for two cases.
Case 1: Calculate the probability that in spite of the exam level being difficult, the student
having a low IQ level and a low Aptitude Score, manages to pass the exam and secure
admission to the university.
From the above word problem statement, the Joint Probability Distribution can be written as
below,
P[a=1, m=1, i=0, e=1, s=0]
From the above Conditional Probability Tables, the values for the given conditions are fed into the formula and calculated as below.
P[a=1, m=1, i=0, e=1, s=0] = P(a=1 | m=1) . P(m=1 | i=0, e=1) . P(i=0) . P(e=1) . P(s=0 | i=0)
= 0.1 * 0.1 * 0.8 * 0.3 * 0.75
= 0.0018
Case 2: In another case, calculate the probability that the student has a High IQ level and
Aptitude Score, the exam being easy yet fails to pass and does not secure admission to the
university.
The formula for the JPD is given by
P[a=0, m=0, i=1, e=0, s=1]
Thus,
P[a=0, m=0, i=1, e=0, s=1]= P(a=0 | m=0) . P(m=0 | i=1, e=0) . P(i=1) . P(e=0) . P(s=1 | i=1)
= 0.6 * 0.5 * 0.2 * 0.7 * 0.6
= 0.0252
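As a quick sketch in R, the two cases are plain products of the CPT entries quoted above (the individual probabilities come from the worked example, not from any new derivation):
case1 <- 0.1 * 0.1 * 0.8 * 0.3 * 0.75   # P[a=1, m=1, i=0, e=1, s=0]
case2 <- 0.6 * 0.5 * 0.2 * 0.7 * 0.6    # P[a=0, m=0, i=1, e=0, s=1]
c(case1, case2)                         # 0.0018 and 0.0252, as computed above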
Hence, in this way, we can make use of Bayesian Networks and Probability tables to calculate
the probability for various possible events that occur.
Conclusion
There are innumerable applications of Bayesian Networks in spam filtering, semantic search, information retrieval, and many more areas. For example, given a symptom, we can predict the probability of a disease occurring, with several other factors contributing to the disease. Thus, this article has introduced the concept of the Bayesian Network along with its implementation through a real-life example.