CS263 - Bayesian Decision Theory
Introduction
One of the goals of pattern recognition is to obtain an optimal decision rule for classifying data into their respective categories. Of the existing decision rules in pattern recognition, such as Chow's rule and the nearest-neighbor rule, Bayesian decision theory is often regarded as the optimal choice (Bow, 2002).
The Bayesian approach describes categories by probability distributions over the attributes of the objects, specified by a model function and its parameters. It also has several advantages over other methods, including that the number of categories is determined automatically, objects are not assigned to categories absolutely, all attributes are potentially significant, and data can be real-valued or discrete.
This report presents a review of Bayesian decision theory in pattern recognition. Decision theories deal with the development of methods and techniques appropriate for making decisions in an optimal fashion. The optimality of the Bayesian approach is exemplified in this paper by surveying real-world applications in artificial intelligence and pattern recognition research.
The survey revealed that, more often than not, the Bayesian approach outperforms the other machine learning models applied to the task at hand. This paper discusses five real-world examples of Bayesian decision-driven machine learning: English letter recognition, a computer-vision application, spam filtering, database clustering, and association football prediction.
$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$$

Formula 1 – Bayes rule using probability density functions
The likelihood $p(x \mid \omega_j)$ is the probability of observing the value $x$ given that the class is $\omega_j$, while the prior probability $P(\omega_j)$ reflects how likely each class is before the actual observation is made.
The evidence, denoted $p(x)$, is usually considered a scaling term. For discrete features, Bayes' theorem takes the equivalent form:

$$P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\,P(\omega_j)}{P(x)}$$

Formula 4 – Bayes rule using probability distributions
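As a concrete illustration, the short sketch below applies Bayes' rule to a two-class problem in Python; the likelihood and prior values are illustrative assumptions, not taken from any of the surveyed papers.

```python
# A minimal sketch of Bayes' rule for a discrete two-class problem.
# Likelihoods and priors below are illustrative, not from the paper.

def posterior(likelihoods, priors):
    """Return P(w_j | x) for every class j, given p(x | w_j) and P(w_j)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # p(x), the scaling term
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# p(x | w_1) = 0.7, p(x | w_2) = 0.1, with priors 0.4 and 0.6
print(posterior([0.7, 0.1], [0.4, 0.6]))  # -> [0.8235..., 0.1764...]
```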
Nonetheless, the Bayes decision rule remains unchanged in both cases, as its purpose is to minimize the risk or cost of the decision.
Each action $a_i$ has an associated conditional risk $R(a_i \mid x)$, where $\lambda(a_i \mid \omega_j)$ is the loss incurred for taking action $a_i$ when the true state of nature is $\omega_j$. The conditional risk is computed so that minimizing it for every observation minimizes the overall risk; the expression is the same for both the continuous and the discrete case:

$$R(a_i \mid x) = \sum_{j=1}^{n} \lambda(a_i \mid \omega_j)\,P(\omega_j \mid x)$$

The risk of each action given the observation is therefore the sum of the losses for that action over all states, weighted by the probability of occurrence of each state. The action with the minimum risk is then selected.
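The following minimal sketch shows this minimum-risk selection for a two-class, two-action problem; the loss matrix and posterior values are illustrative assumptions.

```python
# A minimal sketch of minimum-risk action selection, assuming a loss matrix
# loss[i][j] = lambda(a_i | w_j) and posteriors P(w_j | x). Numbers are illustrative.

def conditional_risk(loss_row, posteriors):
    """R(a_i | x) = sum_j lambda(a_i | w_j) * P(w_j | x)."""
    return sum(l * p for l, p in zip(loss_row, posteriors))

loss = [[0.0, 1.0],   # action a_1: no loss if w_1 is true, unit loss if w_2
        [5.0, 0.0]]   # action a_2: heavy loss if w_1 is true
posteriors = [0.8, 0.2]

risks = [conditional_risk(row, posteriors) for row in loss]
best = min(range(len(risks)), key=risks.__getitem__)
print(risks, "-> choose action", best + 1)  # [0.2, 4.0] -> choose action 1
```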
Given the discriminant function $g(x)$, we decide $\omega_1$ if $g(x) > 0$, which gives us two equivalent forms of the discriminant function:

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

$$g(x) = \ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Formula 7 – Two forms of the two-category case discriminant function for dependent feature vectors
If the feature vector is binary and its components are assumed (correctly or incorrectly) to be independent, a simplified Bayes rule can be employed:

$$g(x) = \sum_{i=1}^{d} \omega_i x_i + \omega_0$$

where

$$\omega_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \quad i = 1, \dots, d$$

and

$$\omega_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Formula 8 – Two-category case discriminant function for independent feature vectors
It is important to note that $\omega_i$ and $\omega_0$ are the weights of the resulting linear discriminant. The discriminant function $g(x)$ above indicates whether the current feature vector belongs to class 1 or class 2; the decision boundary lies wherever $g(x) = 0$ and can be a line or a hyperplane, depending on the dimension of the feature space.
Example of a Two-Category Case Problem
Consider the three-dimensional binary feature vector $x = (x_1, x_2, x_3) = (0, 1, 1)$, which we will attempt to classify as belonging to class 1 or class 2, given the prior probabilities $P(\omega_1) = 0.6$ and $P(\omega_2) = 0.4$. There is already an evident bias towards class 1.
The likelihoods of the independent features are $p = \{0.8, 0.2, 0.5\}$ and $q = \{0.2, 0.5, 0.9\}$. Since the problem definition assumes that the features are independent, the discriminant function can be calculated directly.
Plugging the $x_i$ values into the discriminant function gives $g(x) = -2.4849$. Since $g(x) = -2.4849 < 0$, the feature vector $x = (0, 1, 1)$ belongs to class 2.
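As a check, the short sketch below evaluates Formula 8 in Python with the values stated above and reproduces $g(x) = -2.4849$.

```python
import math

# Reproduces the worked two-category example above (Formula 8):
# binary features x = (0, 1, 1), p_i = P(x_i = 1 | w_1), q_i = P(x_i = 1 | w_2).
x = [0, 1, 1]
p = [0.8, 0.2, 0.5]
q = [0.2, 0.5, 0.9]
prior1, prior2 = 0.6, 0.4

w = [math.log(pi * (1 - qi) / (qi * (1 - pi))) for pi, qi in zip(p, q)]
w0 = sum(math.log((1 - pi) / (1 - qi)) for pi, qi in zip(p, q)) \
     + math.log(prior1 / prior2)

g = sum(wi * xi for wi, xi in zip(w, x)) + w0
print(round(g, 4), "-> class 1" if g > 0 else "-> class 2")  # -2.4849 -> class 2
```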
The two-category discriminant can also be applied to an $n$-class problem. This is accomplished by setting $g_1(x) = g_i(x)$ and $g_2(x) = g_{\text{not } i}(x)$. The probability for $g_2(x)$ is obtained by summing the probabilities of classes $\{1, \dots, i-1, i+1, \dots, n\}$. If $x$ belongs to class $i$, then $g_i(x) > g_{\text{not } i}(x)$; otherwise $x$ belongs to some other class.
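A minimal sketch of this one-versus-rest reduction, with illustrative posterior values:

```python
# One-versus-rest reduction: g_1(x) = g_i(x), g_2(x) = g_not_i(x), where the
# "not i" posterior is the sum of the posteriors of all remaining classes.
# The posterior values below are illustrative.

def one_vs_rest(posteriors, i):
    """Return (g_i, g_not_i) for class index i."""
    g_i = posteriors[i]
    g_not_i = sum(p for j, p in enumerate(posteriors) if j != i)
    return g_i, g_not_i

posteriors = [0.6, 0.3, 0.1]          # P(w_1|x), P(w_2|x), P(w_3|x)
g_i, g_rest = one_vs_rest(posteriors, 0)
print("class 1" if g_i > g_rest else "some other class")  # class 1 (0.6 > 0.4)
```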
Literature Survey
English Letter Classification Using Bayesian Decision Theory and Feature Extraction Using
Principal Component Analysis
(Husnain & Naweed, 2009) utilized Bayesian decision theory to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters of the English alphabet. The character images were based on 20 different fonts, each randomly distorted to produce a file of 20,000 unique instances.
The image dataset used in the research was donated to the UCI data repository by David J. Slate and P. W. Frey in 1991. Different distortion techniques, such as compression and changes in aspect ratio along the x and y axes, were applied to add bearable noise to the dataset. For each black-and-white image of an English letter, the authors extracted a 16-dimensional feature vector summarizing the letter image.
The feature vector contains characteristic features of the image, such as the vertical and horizontal position of the rectangular box containing the letter, the total number of ON pixels, and the edge count. Each instance was converted into 16 primitive numerical attributes, such as mean, variance, moments, and covariance, scaled to fit an integer range from 0 to 15.
In the first set of experiments, 14,000 items were used as training data and the remaining 6,000 as the test set, from which instances were selected at random and fed to the classifier to check the predicted letter class. This achieved 92% accuracy over 100 random input instances, of which only 8 were misclassified.
In the second set of experiments the training data was increased from 14,000 to 16,000, which reduced the error rate to 2%: only 2 of the 100 random character inputs were misclassified. These results were far better than those (Frey & Slate, 1991) achieved using a Holland-style adaptive classifier for letter recognition, which had only 80% accuracy.
The research also revealed that the English letters 'N' and 'H' have almost the same shape and the same number of ON pixels, resulting in similar posterior probabilities.
The features were reduced from 16 to 8 using principal component analysis (PCA), an eigenvector/eigenvalue-based approach for reducing the dimensionality of multivariate data. PCA identifies patterns in data and expresses the data in a way that highlights their similarities and differences. The accuracy of the Bayesian decision theory classifier was checked again for 100 random inputs and reached 98%, with 16,000 instances kept as training data. As shown in the scree graph below, the first 8 principal components account for nearly 90% of the variance, while the remaining components have diminishing variance and little significance for classification.
Figure 4 – Scree graph of the principal components and their variance
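As an illustration of this reduction step, the sketch below applies PCA to a stand-in 16-attribute matrix using scikit-learn; the random data is only a placeholder for the actual UCI letter-recognition features.

```python
# A minimal PCA sketch: reduce 16 attributes to 8 principal components.
# X is a stand-in for the real (n_samples, 16) letter-recognition matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.integers(0, 16, size=(1000, 16)).astype(float)  # placeholder data

pca = PCA(n_components=8)
X_reduced = pca.fit_transform(X)                # 16 features -> 8 components
print(X_reduced.shape)                          # (1000, 8)
print(pca.explained_variance_ratio_.cumsum())   # cumulative variance captured
```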
It can be concluded that principal component analysis combined with Bayesian decision theory gives efficient results in document analysis and proved more effective than the Holland-style and statistical adaptive similarity classifiers on letter recognition tasks.
A Vision-based Method for Weeds Identification through the Bayesian Decision Theory
(Tellaeche, Burgos-Artizzu, Pajares, & Ribeiro, 2008) developed an automatic computer-vision-based approach for the detection and differential spraying of weeds in corn crops. Their strategy involves an image segmentation process that divides the incoming image into cells and extracts the features and attributes used in the decision-making procedure, which is based on the computation of a posterior probability under a Bayesian framework; the prior probability is derived from the dynamics of the tractor in which the method is embedded. The decision to be made is whether a cell is to be sprayed or not, and it requires a database containing a set of samples classified as items to be sprayed or not, built either offline or online.
The knowledge base upon which the online decisions are based is built during the offline stage, and the image segmentation process is identical in both stages. The training process is carried out offline; during the online stage, new images are processed and a decision is made about each of them, and the results are then stored in the knowledge base, augmenting the estimates obtained offline.
Figure 5 – Vision-based segmentation scheme and decision process
A set of 340 digital images acquired with an HPR817 digital camera over four different days in May 2006 and April/May 2007 was used to assess the validity and performance of the proposed approach. Eighteen video sequences, acquired at 15 frames per second according to the tractor motion, were selected, and 10 frames were extracted from each, so the $k$-th frame of sequence $i$ is denoted $f_i^k$, where $k = 1, \dots, 10$ and $i = 1, \dots, 18$. Two consecutive frames, $f_i^k$ and $f_i^{k+1}$, differ by $3u$ image rows: assuming the origin of coordinates is the bottom-left corner, row 1 of $f_i^{k+1}$ matches row $3u$ of $f_i^k$, where $u$ is a constant parameter set to 50.
The fourth and fifth rows of cells are expanded in frame $f_i^{k+1}$ into the first, second, and third rows of cells, which implies that the final spraying decision should be made for the first, second, and third rows of cells, while the fourth and fifth rows are used to compute the prior probability for the next frame. It should also be noted that the tractor speed is fixed at 4 km/h, so 12 m are covered in about 11 seconds; hence the time elapsed between frames $f_i^k$ and $f_i^{k+1}$ is about 11 seconds.
The authors designed a test strategy with an initialization step labeled STEP 0. This step simulates the offline phase with 160 images and was estimated by cross-validation, with 256 cells in the training set and 48 cells in the validation set, both randomly selected. Five training processes were performed, each using a different set for validation and the remaining cells as training data, which guarantees that the number of training samples is always greater than or equal to 256. For each validation set, $k$ was varied and the error computed; the errors were averaged for each set and each $k$, and the best $k$ is the one with the minimum mean error, obtained at $k = 0.3$.
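The sketch below mirrors the shape of this validation scheme in Python: five folds, each used once for validation, with the parameter $k$ chosen by minimum mean error. The data and the error function are illustrative placeholders, not the paper's classifier.

```python
# A sketch of 5-fold selection of a parameter k by minimum mean validation
# error. The cells, labels, and error function are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
cells = rng.normal(size=(320, 4))          # stand-in for cell feature vectors
labels = (cells[:, 0] > 0).astype(int)     # stand-in spray / no-spray labels

def validation_error(k, train_idx, val_idx):
    # Placeholder: in the paper this would train the Bayesian classifier with
    # parameter k and return its error rate on the validation fold.
    return abs(k - 0.3) + rng.normal(scale=0.01)

folds = np.array_split(rng.permutation(len(cells)), 5)
ks = np.arange(0.1, 1.0, 0.1)
mean_err = []
for k in ks:
    errs = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errs.append(validation_error(k, train_idx, val_idx))
    mean_err.append(np.mean(errs))
print("best k:", round(float(ks[int(np.argmin(mean_err))]), 1))  # ~0.3 here
```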
In STEPs 1 to 3, a decision is made in each frame $k$ for the six cells in its bottom part, each cell described by its area-vector of attributes $x$. After the decision, a set $S_Y$ of cells belonging to $w_y$ (to be sprayed) and a set $S_N$ of cells belonging to $w_n$ (not requiring spraying) are obtained. For the first frame the prior probabilities are set to 0.5; otherwise the prior probabilities are the posterior probabilities computed for the four preceding cells in the previous frame.
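The sketch below illustrates the flavor of this prior-propagation scheme for a single cell: the posterior computed in one frame becomes the prior for the next, starting from an uninformative 0.5. The likelihood values are illustrative assumptions, not the paper's class-conditional densities.

```python
# Sequential Bayesian updating: the posterior for a cell in frame f_i^k
# becomes the prior for the matching cell in frame f_i^{k+1}.
# Likelihoods here are illustrative.

def update(prior_spray, lik_spray, lik_no_spray):
    """One Bayesian update for the two classes w_y (spray) and w_n (no spray)."""
    num = lik_spray * prior_spray
    den = num + lik_no_spray * (1.0 - prior_spray)
    return num / den

prior = 0.5                      # first frame: uninformative prior
for lik_y, lik_n in [(0.7, 0.3), (0.8, 0.4), (0.6, 0.5)]:  # successive frames
    prior = update(prior, lik_y, lik_n)   # posterior feeds the next frame
    print(round(prior, 3))                # 0.7, 0.824, 0.848
```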
The knowledge base is updated by adding both sets of cells ($S_Y$ and $S_N$) to the previous entries, classifying all cells as belonging to $w_y$ or $w_n$; the stored cells are then used to obtain a new estimate of the class-conditional probability density functions.
Performance is established by comparing the judgment of farmers and technical consultants against the results obtained in each test. The number of cells correctly identified as requiring spraying is denoted True Spraying (TS); the number of cells correctly detected as not requiring spraying, True No Spraying (TN); the number of cells that do not require spraying but are identified as cells to be sprayed, False Spraying (FS); and the number of cells requiring spraying that the method identifies as not requiring spraying, False No Spraying (FN).
Figure 7 – Number of images and number of cells to be sprayed or not according to the Bayesian classifier
Figure 8 – Correct classification percentage and Yule score values for the tests and steps
The correct classification percentage is computed as

$$CCP = \frac{TS + TN}{TS + FS + TN + FN}$$

while the Yule coefficient is

$$Yule = \left| \frac{TS}{TS + FS} + \frac{TN}{TN + FN} - 1 \right|$$

Figures 7 and 8 show that the best performance was achieved by Test 3 in STEP 3 and that the worst performer was Test 1. The best performance being achieved in STEP 3 reflects the degree of learning performed; as the tables also show, performance improves as the learning progresses.
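A minimal sketch of both measures, computed from illustrative counts rather than the paper's actual figures:

```python
# CCP and Yule coefficient from TS/TN/FS/FN counts (illustrative values).

def ccp(ts, tn, fs, fn):
    """Correct classification percentage."""
    return (ts + tn) / (ts + fs + tn + fn)

def yule(ts, tn, fs, fn):
    """Yule coefficient: |TS/(TS+FS) + TN/(TN+FN) - 1|."""
    return abs(ts / (ts + fs) + tn / (tn + fn) - 1.0)

ts, tn, fs, fn = 80, 90, 10, 20
print(f"CCP  = {ccp(ts, tn, fs, fn):.3f}")   # 0.850
print(f"Yule = {yule(ts, tn, fs, fn):.3f}")  # 0.707
```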
Overall, the research succeeded in developing an automated decision-making process for detecting weeds in corn crops using Bayesian decision theory. Although the paper notes that the robustness of the proposed approach against illumination variability remains in question, the approach still achieves important savings in cost and pollution.
A Bayesian Approach to Filtering Junk E-mail
(Sahami, Dumais, Heckerman, & Horvitz, 1998) employed a naïve Bayesian classifier to filter junk e-mail. A decision-theoretic notion of cost-sensitive classification was adopted, as the cost of misclassifying a legitimate e-mail as junk far outweighs the cost of marking a piece of junk as legitimate. Accordingly, a message is classified as junk only if the probability that it would be placed in the junk class is greater than 99.9%.
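The sketch below illustrates this cost-sensitive decision rule; the posterior probabilities are hypothetical stand-ins for the filter's output.

```python
# Cost-sensitive junk-mail rule: mark a message as junk only when its
# posterior junk probability exceeds 0.999. Posterior values are illustrative.

THRESHOLD = 0.999

def classify(p_junk):
    return "junk" if p_junk > THRESHOLD else "legitimate"

for p in (0.97, 0.9995, 0.99985):
    print(p, "->", classify(p))
# Only messages with posterior above 0.999 are filtered: 0.97 stays legitimate.
```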
Figure 9 shows the precision and recall for both junk and legitimate e-mail under each feature regime. While the phrasal information improves performance only slightly, the incorporation of even a small amount of domain knowledge clearly improves the resulting classifications.
Figure 10 – Precision and recall curves for junk e-mail using various feature sets
Figure 10 focuses on the range from 0.85 to 1.0 to show the region of greatest variation in the junk-mail precision/recall curves more clearly. It shows that incorporating additional features, especially non-textual domain-specific information, gives consistently superior results compared with considering only the words in the messages.
This research demonstrated that it is possible to automatically learn effective filters that eliminate a large portion of junk e-mail from a user's mail stream, and that the efficacy of these filters can be enhanced by a set of hand-crafted features specific to the task at hand. While the Bayesian framework used in the research was successful, it exposed the need for methods aimed at controlling the variance of parameter estimates in text categorization problems; hence the use of Support Vector Machines (SVMs) in a decision-theoretic framework incorporating asymmetric misclassification costs is a promising avenue for further research. The use of other Bayesian classifiers that are less restrictive than naïve Bayes is also expected to yield better classification probability estimates and more accurate cost-sensitive classifications.
Database Clustering Using the AutoClass Bayesian Classification System
The AutoClass program breaks the classification problem into two parts: determining the number of classes and determining the parameters that define them. It uses a Bayesian variant of the EM algorithm of Dempster, Laird, and Rubin to find the best class parameters for a given number of classes; the algorithm is derived by differentiating the posterior distribution with respect to the class parameters and setting the result to zero.
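The sketch below does not reproduce AutoClass; as a rough stand-in it uses scikit-learn's EM-based GaussianMixture on synthetic data and chooses the number of classes by the Bayesian Information Criterion, mirroring AutoClass's two sub-problems.

```python
# Stand-in for AutoClass-style clustering: EM-fitted Gaussian mixtures with
# the number of classes selected by BIC. Data here is synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2)) for m in (0, 4, 8)])

best_k, best_bic = None, np.inf
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)  # EM fit
    bic = gm.bic(X)
    if bic < best_bic:
        best_k, best_bic = k, bic
print("chosen number of classes:", best_k)  # 3 for this synthetic data
```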
The program classified data supplied by researchers active in various domains and yielded new and intriguing results, such as the discovery, with high confidence, of three classes in the Iris database, even though not all cases could be assigned to their classes with certainty. It also found the four known classes in Stepp's soybean disease database, exactly matching the result of Michalski's CLUSTER/2 system.
Finally, AutoClass assayed the Infrared Astronomical Satellite database, which contains 5,425 cases with 94 attributes and was considered the least thoroughly understood by domain experts. The program discovered classes that differed significantly from NASA's previous analysis but clearly reflect physical phenomena in the data.
Predicting Football Results using Bayesian Nets and Other Machine Learning Techniques
(Joseph, Fenton, & Neil, 2006) compared the performance of an expert-constructed Bayesian net (BN) with other machine learning techniques, namely a naïve BN, KNN, and decision trees, for predicting the outcome of matches played by the English football club Tottenham Hotspur FC from 1995 to 1997. Their objective was to see how the expert-constructed BN performs in terms of predictive accuracy and explanatory clarity regarding the factors affecting the results of the matches under investigation.
The expert-constructed BN uses features such as the presence or absence of three key players (Sheringham, Anderton, and Armstrong), whether Wilson is playing in midfield, the quality of the opposing team measured on a simple 3-point scale (high, medium, low), and whether the game is played at Spurs' home ground or away.
Aside from these, additional factors (the quality of the Spurs attacking force, the overall quality of the Spurs team, and how well the team will perform given its own quality and that of its opponents) were related to the outcome of the game (win, lose, or draw) to simplify the structure. All were measured as low, medium, or high.
Figure 11 – Expert-constructed Bayesian net for Tottenham Hotspur performance
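To give a feel for how such a network is queried, the pure-Python sketch below enumerates a drastically simplified version of the model, with only opponent quality and venue as parents of the outcome; every probability in it is hypothetical, not taken from the paper.

```python
# A tiny Bayesian net queried by enumeration: quality and venue -> outcome.
# All probabilities are hypothetical illustrations, not the paper's values.
from itertools import product

P_quality = {"high": 0.3, "medium": 0.4, "low": 0.3}     # opponent quality
P_venue = {"home": 0.5, "away": 0.5}
P_outcome = {   # P(outcome | quality, venue); each row sums to 1
    ("high", "home"):   {"win": 0.30, "draw": 0.30, "lose": 0.40},
    ("high", "away"):   {"win": 0.15, "draw": 0.25, "lose": 0.60},
    ("medium", "home"): {"win": 0.50, "draw": 0.30, "lose": 0.20},
    ("medium", "away"): {"win": 0.35, "draw": 0.30, "lose": 0.35},
    ("low", "home"):    {"win": 0.70, "draw": 0.20, "lose": 0.10},
    ("low", "away"):    {"win": 0.55, "draw": 0.25, "lose": 0.20},
}

def p_outcome(evidence=None):
    """P(outcome) given optional evidence, marginalizing unobserved parents."""
    evidence = evidence or {}
    totals = {"win": 0.0, "draw": 0.0, "lose": 0.0}
    for q, v in product(P_quality, P_venue):
        if evidence.get("quality", q) != q or evidence.get("venue", v) != v:
            continue                              # inconsistent with evidence
        weight = P_quality[q] * P_venue[v]
        for o, p in P_outcome[(q, v)].items():
            totals[o] += weight * p
    z = sum(totals.values())                      # renormalize under evidence
    return {o: t / z for o, t in totals.items()}

print(p_outcome({"quality": "high", "venue": "home"}))  # the CPT row itself
print(p_outcome())                                      # full marginal
```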
All machine learning models were implemented using the MLC++ package, apart from the expert-constructed BN, which was built with the Hugin tool. The match data was divided into disjoint subsets used as training and validation sets: the data for each season was divided into three groups of ten matches and one group of eight matches, organized chronologically.
Figure 12 – Comparison of learner accuracy with expert model data
The expert-constructed BN was the most accurate predictor of the outcome of the Spurs games, with a classification error of 40.79% over the disjoint training and test data sets. Its poorest performance was on the 1995/1996 data; however, with classification errors of 50% and 40.74% for the 1995/1996 and 1996/1997 seasons respectively, it was still the best classifier on the intra-season data. It also produced the best results among all the classifiers for every cross-season test period, with an average classification error of 33.62%.
Figure 12 shows the relative accuracy of the machine learning models implemented in the study. KNN was the best performer when the same training and test data for the complete seasons were used, but its accuracy dropped significantly when disjoint training and test data sets were used, in which case the expert-constructed BN outperformed all other learners.
The study reveals which of the selected attributes are the crucial factors affecting the outcome of a football game, and the relationships between these factors. One limitation of all the non-expert methods used is that they rely only on the supplied attributes, which constrains the learnt Bayesian nets. The performance of the expert-constructed Bayesian network was impressive given the inherent analysis bias against it. Although the study is now dated, since it involves variables relating to key players who have since retired or left the club, its results confirm the excellent potential of Bayesian networks when they are built by a reliable domain expert.
A direction for future work extending this study is the construction of a more symmetrical model using similar data for all the teams in the league, although this may multiply the amount of computational work by the number of additional teams. Another potential improvement is to quantify the inherent quality of each player who plays and to use abstract nodes, such as the quality of the attack and defence, to improve the model and ensure its longevity.
Conclusion
This paper has described the Bayesian approach to pattern recognition and exemplified it with five real-world classification tasks. Bayesian decision theory provides a simple and extensible approach that is not limited to classification but extends to prediction and general mixture separation. Its theoretical basis is free from ad hoc quantities and from measures that alter the data to suit the needs of the program. As a result, most of the Bayesian classification models described in this paper lend themselves readily to extension and further research.
References
Bow, S. (2002). Pattern recognition and image preprocessing. New York: Marcel Dekker.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. Toronto: John Wiley &
Sons.
Husnain, M., & Naweed, S. (2009). English Letter Classification Using Bayesian Decision
Theory and Feature Extraction Using Principal Component Analysis. European Journal of
Scientific Research, 34, 2nd ser., 196-203.
Joseph, A., Fenton, N., & Neil, M. (2006). Predicting football results using Bayesian nets and
other machine learning techniques. Knowledge-Based Systems, 19(7), 544-553.
doi:10.1016/j.knosys.2006.04.011
Nadler, M. (1993). Pattern recognition engineering. New York: John Wiley & Sons.
Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998). A Bayesian approach to filtering junk e-mail. AAAI-98 Workshop on Learning for Text Categorization.
Tellaeche, A., Burgos-Artizzu, X. P., Pajares, G., & Ribeiro, A. (2008). A vision-based method for weeds identification through the Bayesian decision theory. Pattern Recognition, 41(2), 521-530. doi:10.1016/j.patcog.2007.07.007