Data Science - UNIT-2 - Notes
Probability theory
There are many sources of uncertainty in AI, including variance in the specific data values, the sample of data
collected from the domain, and the imperfect nature of any models developed from such data.
• Uncertainty is the biggest source of difficulty for beginners in machine learning, especially developers.
• Noise in data, incomplete coverage of the domain, and imperfect models provide the three main sources
of uncertainty in machine learning.
• Probability provides the foundation and tools for quantifying, handling, and harnessing uncertainty in
applied machine learning.
Uncertainty means working with imperfect or incomplete information. Probability is a numerical description of
how likely an event is to occur or how likely it is that a proposition is true. Probability is a number between 0 and
1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty.
How to compute the probability?
Given: A statistical experiment has n equally likely outcomes, of which r are "successes".
Find: The probability of a successful outcome (S).
P(S) = Number of Successes / Total Number of Outcomes = r/n
Example 1:
Given: 10 marbles: 2 red, 3 green, 5 blue.
Find: The probability of selecting a green marble.
Solution: P(G) = 3/10 = 0.30
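The r/n rule can be sketched in a few lines of Python. This is only an illustration: the function name classical_probability is a hypothetical helper, not something from these notes, and exact fractions are used so that 3/10 is not rounded.

```python
from fractions import Fraction


def classical_probability(successes: int, total: int) -> Fraction:
    """Return r/n for an experiment with n equally likely outcomes."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    return Fraction(successes, total)


# Marble counts taken from Example 1.
marbles = {"red": 2, "green": 3, "blue": 5}
p_green = classical_probability(marbles["green"], sum(marbles.values()))
print(p_green, float(p_green))  # 3/10 0.3
```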
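A Random Variable is a variable whose value is determined by the outcome of a random experiment. It is described
by the set of values it can take and the probability of each value.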
Example: Throw a die once
Random Variable X = "The score shown on the top face". X could be 1, 2, 3, 4, 5 or 6, so the Sample Space is
{1, 2, 3, 4, 5, 6}.
We can show the probability of any one value using this style: P(X = value) = probability of that value
X = {1, 2, 3, 4, 5, 6}
In this case they are all equally likely, so the probability of any one is 1/6
• P(X = 1) = 1/6
• P(X = 2) = 1/6
• P(X = 3) = 1/6
• P(X = 4) = 1/6
• P(X = 5) = 1/6
• P(X = 6) = 1/6
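As a small sketch, the distribution of X for one throw of a fair die can be stored as a dictionary that maps each value to its probability. The names sample_space and pmf are illustrative choices, not taken from these notes.

```python
from fractions import Fraction

# Sample space for one throw of a fair die.
sample_space = [1, 2, 3, 4, 5, 6]

# Every value is equally likely, so P(X = value) = 1/6.
pmf = {value: Fraction(1, len(sample_space)) for value in sample_space}

for value, probability in pmf.items():
    print(f"P(X = {value}) = {probability}")

# The probabilities over the whole sample space must add up to 1.
total_probability = sum(pmf.values())
print(total_probability)  # 1
```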
Atomic event: A complete specification of the state of the world about which the agent is uncertain. E.g., if the
world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events:
Cavity = false Toothache = false
Cavity = false Toothache = true
Cavity = true Toothache = false
Cavity = true Toothache = true
Atomic events are mutually exclusive and exhaustive.
Mutually exclusive means that if some atomic event is true, then all other atomic events are false. Exhaustive
means that at least one atomic event is always true.
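The atomic events for a set of Boolean variables can be enumerated mechanically: n variables give 2^n atomic events. The sketch below does this for Cavity and Toothache; the variable names come from the example above, everything else is an illustrative choice.

```python
from itertools import product

variables = ["Cavity", "Toothache"]

# Each atomic event assigns a truth value to every variable.
atomic_events = [
    dict(zip(variables, values))
    for values in product([False, True], repeat=len(variables))
]

for event in atomic_events:
    print(event)

# Two Boolean variables give 2**2 = 4 distinct atomic events.
print(len(atomic_events))  # 4
```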
Joint Probability
It is the likelihood of more than one event occurring at the same time.
Two cases can be distinguished:
1. Mutually exclusive events (without common outcomes)
2. Non-mutually exclusive events (with common outcomes)
Mutually exclusive means that events A and B cannot occur together, i.e. P(A and B) = 0, so the probability of
A or B is the sum of their probabilities, i.e. P(A or B) = P(A) + P(B)
For non-mutually exclusive events, the probability of A or B is the sum of their probabilities minus the
probability that both occur, i.e.
P(A or B) = P(A) + P(B) - P(A and B)
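The two rules can be checked on a fair die by counting outcomes. The events used here (an even score, a score greater than 3, a score of 1, a score of 6) are illustrative and do not come from these notes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


# Non-mutually exclusive events: they share the outcomes 4 and 6.
even = {2, 4, 6}
greater_than_three = {4, 5, 6}
p_union_counted = prob(even | greater_than_three)
p_union_by_rule = prob(even) + prob(greater_than_three) - prob(even & greater_than_three)
print(p_union_counted, p_union_by_rule)  # 2/3 2/3

# Mutually exclusive events: no common outcome, so P(A and B) = 0.
one, six = {1}, {6}
p_exclusive_union = prob(one) + prob(six)
print(prob(one & six), p_exclusive_union)  # 0 1/3
```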
The conditional probability of an event B in relation to an event A is the probability that event B occurs given
that event A has already occurred. The notation for conditional probability is P(B|A), read as "the probability of B
given A".
When two events, A and B, are dependent, the probability of both occurring is:
P(A and B) = P(A) × P(B|A), which rearranges to P(B|A) = P(A and B) / P(A)
Problem 1: A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class passed
the first test. What percent of those who passed the first test also passed the second test?
Answer: P(Second | First) = P(First and Second)/P(First) = 0.25/0.42 ≈ 0.60 = 60%
Problem 2: A jar contains black and white marbles. Two marbles are chosen without replacement. The probability
of selecting a black marble and then a white marble is 0.34, and the probability of selecting a black marble on the
first draw is 0.47. What is the probability of selecting a white marble on the second draw, given that the first
marble drawn was black?
Answer: P(White | Black) = P(Black and White)/P(Black) = 0.34/0.47 ≈ 0.72 = 72%
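Both answers come from the same division, which can be sketched as a small function. The name conditional_probability is a hypothetical helper; the input values are the ones given in Problems 1 and 2.

```python
def conditional_probability(p_joint: float, p_condition: float) -> float:
    """Return P(B|A) = P(A and B) / P(A); requires P(A) > 0."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition


# Problem 1: P(Second | First)
p_second_given_first = conditional_probability(0.25, 0.42)
# Problem 2: P(White | Black)
p_white_given_black = conditional_probability(0.34, 0.47)

print(round(p_second_given_first, 2))  # 0.6
print(round(p_white_given_black, 2))   # 0.72
```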
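Note:
Bayes' theorem: if the events E1, E2, …, En are mutually exclusive and exhaustive, and A is an event with P(A) > 0, then
P(Ei|A) = P(Ei)P(A|Ei) / [P(E1)P(A|E1) + P(E2)P(A|E2) + … + P(En)P(A|En)]
The following terminology is also used when Bayes' theorem is applied:
Hypotheses: The events E1, E2, …, En are called the hypotheses.
Prior probability: The probability P(Ei) is the prior (a priori) probability of hypothesis Ei.
Posterior probability: The probability P(Ei|A) is the posterior (a posteriori) probability of hypothesis Ei.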
Conditional probability is defined as
P(A|B) = P(A∩B)/P(B)
where P(A|B) is the probability that event A occurs given that event B has already occurred,
P(A∩B) is the probability that both event A and event B occur, and
P(B) is the probability of event B, with P(B) > 0.
Some illustrations will improve the understanding of the concept.
Example 1: Bag I contains 4 white and 6 black balls, while Bag II contains 4 white and 3 black balls. One
ball is drawn at random from one of the bags, and it is found to be black. Find the probability that it was drawn
from Bag I.
Solution:
Let E1 be the event of choosing the bag I, E2 the event of choosing the bag II, and A be the event of drawing a
black ball.
Then, P(E1) = P(E2) = 1/2
Also, P(A|E1) = P(drawing a black ball from Bag I) = 6/10 = 3/5
P(A|E2) = P(drawing a black ball from Bag II) = 3/7
By Bayes' theorem, the probability that the ball was drawn from Bag I, given that it is black, is
P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)]
= (1/2 × 3/5)/(1/2 × 3/5 + 1/2 × 3/7) = 7/12
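The Bag I / Bag II calculation can be checked with a short sketch that applies Bayes' theorem to a list of hypotheses. The function name bayes_posteriors is a hypothetical helper, and exact fractions are used so the result can be compared with 7/12 directly.

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """Return P(Ei|A) for each hypothesis Ei, given P(Ei) and P(A|Ei)."""
    joint = [prior * likelihood for prior, likelihood in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(A), by the law of total probability
    return [j / evidence for j in joint]


priors = [Fraction(1, 2), Fraction(1, 2)]        # P(E1), P(E2)
likelihoods = [Fraction(6, 10), Fraction(3, 7)]  # P(A|E1), P(A|E2)

posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors[0])  # 7/12, the probability that the black ball came from Bag I
print(posteriors[1])  # 5/12
```

The posteriors always add up to 1, because the black ball must have come from one of the two bags.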
Example 2: A man is known to speak the truth 2 out of 3 times. He throws a die and reports that the
number obtained is a four. Find the probability that the number obtained is actually a four.
Solution:
Let A be the event that the man reports that number four is obtained.
Let E1 be the event that four is obtained and E2 be its complementary event.
Then, P(E1) = Probability that four occurs = 1/6
P(E2) = Probability that four does not occur = 1 - P(E1) = 1 - 1/6 = 5/6
Also, P(A|E1) = Probability that the man reports four given that it is actually a four = 2/3
P(A|E2) = Probability that the man reports four given that it is not a four = 1/3
By Bayes' theorem, the probability that the number obtained is actually a four is
P(E1|A) = P(E1)P(A|E1) / [P(E1)P(A|E1) + P(E2)P(A|E2)]
= (1/6 × 2/3)/(1/6 × 2/3 + 5/6 × 1/3) = 2/7
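The value 2/7 (about 0.286) can also be checked by simulation. The sketch below assumes the same model as the solution: the man reports four with probability 2/3 when a four is thrown and with probability 1/3 otherwise. The function name, the number of trials and the seed are arbitrary illustrative choices.

```python
import random


def simulate_report_four(trials: int, seed: int = 0) -> float:
    """Estimate P(actually four | man reports four) by repeated sampling."""
    rng = random.Random(seed)
    reported_four = 0
    reported_and_actual_four = 0
    for _ in range(trials):
        is_four = rng.randint(1, 6) == 4
        # Model from the solution: P(A|E1) = 2/3 and P(A|E2) = 1/3.
        p_report_four = 2 / 3 if is_four else 1 / 3
        if rng.random() < p_report_four:
            reported_four += 1
            reported_and_actual_four += is_four
    return reported_and_actual_four / reported_four


estimate = simulate_report_four(200_000)
print(estimate, 2 / 7)  # the estimate should be close to 2/7
```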
Practice Problems: Solve the following problems using Bayes' theorem.
1. A bag contains 5 red and 5 black balls. A ball is drawn at random, its color is noted, and the ball is
returned to the bag. Then 2 additional balls of the color drawn are put in the bag. After that, a ball is
drawn at random from the bag. What is the probability that the second ball drawn from the bag is red?
2. Of the students in a college, 60% reside in the hostel and 40% are day
scholars. The previous year's results show that 30% of the students who stay in the hostel scored an A grade and
20% of the day scholars scored an A grade. At the end of the year, one student is chosen at random and found
to have an A grade. What is the probability that the student is a hosteller?
3. From a pack of 52 cards, one card is lost. From the remaining cards of the pack, two cards are drawn and
both are found to be diamonds. What is the probability that the lost card is a diamond?
Problem: A factory production line is manufacturing bolts using three machines, A, B and C. Of the
total output, machine A is responsible for 25%, machine B for 35% and machine C for the rest. It is
known from previous experience with the machines that 5% of the output from machine A is
defective, 4% from machine B and 2% from machine C. A bolt is chosen at random from the
production line and found to be defective. What is the probability that it came from (a) machine A (b)
machine B (c) machine C?
Problem: An engineering company advertises a job in three newspapers, A, B and C. It is known that
these papers attract undergraduate engineering readerships in the proportions 2:3:1. The probabilities
that an engineering undergraduate sees and replies to the job advertisement in these papers are 0.002,
0.001 and 0.005 respectively. Assume that the undergraduate sees only one job advertisement. (a) If the
engineering company receives only one reply to its advertisements, calculate the probability that the
applicant has seen the job advertised in paper (i) A, (ii) B, (iii) C. (b) If the company receives two
replies, what is the probability that both applicants saw the job advertised in paper A?