U3 Prob & Stat & Hypothesis
Understanding Data
• Big data is data whose volume is much larger than 'small data'
and is characterized as follows:
• Volume
• Velocity
• Variety
• Veracity
• Validity
• Value
Types of Data
• Formatting:
• Converting categorical data
• Date and time handling
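• A minimal formatting sketch with pandas (the column names and values
below are illustrative, not from the slides):

    import pandas as pd

    df = pd.DataFrame({
        "city": ["Pune", "Mumbai", "Pune"],
        "joined": ["2021-01-15", "2020-07-01", "2022-03-10"],
    })

    # Converting categorical data: integer codes and one-hot columns
    df["city_code"] = df["city"].astype("category").cat.codes
    df = pd.get_dummies(df, columns=["city"])

    # Date and time handling: parse strings and derive date parts
    df["joined"] = pd.to_datetime(df["joined"])
    df["join_year"] = df["joined"].dt.year
    print(df)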
Data preparation
• Sampling:
• Random Sampling:
• Over-sampling and Under-sampling
• Bootstrapping:
• Decomposition:
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
• Scaling:
• Normalization (Min-Max Scaling)
• Standardization (Z-score Scaling)
• Robust Scaling:
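• A minimal scaling sketch with scikit-learn (the sample values below
are illustrative, not from the slides):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

    X = np.array([[12.], [14.], [19.], [22.], [24.], [26.], [28.], [31.], [32.]])

    print(MinMaxScaler().fit_transform(X).ravel())    # normalization: values in [0, 1]
    print(StandardScaler().fit_transform(X).ravel())  # standardization: zero mean, unit variance
    print(RobustScaler().fit_transform(X).ravel())    # robust scaling: based on median and IQR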
Data preprocessing
• In the real world, the available data is dirty. 'Dirty' means:
• Incomplete data
• Outlier data
• Data with inconsistent values
• Inaccurate data
• Data with missing values
• Duplicate data
Example
• The 'bad' or 'dirty' data can be observed in the following patient table.
Example
• Consider the set: S = (12, 14, 19, 22, 24, 26, 28, 31, 32). Apply various
binning techniques and show the result.
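• One possible working (equal-frequency binning into 3 bins):
• Partition (equal depth, 3 values per bin): Bin 1 = 12, 14, 19;
Bin 2 = 22, 24, 26; Bin 3 = 28, 31, 32.
• Smoothing by bin means: Bin 1 → 15, 15, 15; Bin 2 → 24, 24, 24;
Bin 3 → 30.3, 30.3, 30.3.
• Smoothing by bin boundaries: each value is replaced by the closer of
its bin's minimum and maximum, e.g. Bin 1 → 12, 12, 19.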
Splitting data into training and testing
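• A hedged sketch of an 80/20 train/test split with scikit-learn (the
data and split ratio below are illustrative, not from the slides):

    from sklearn.model_selection import train_test_split

    X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

    # Hold out 20% of the data for testing; fix random_state for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    print(len(X_train), len(X_test))   # 8 2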
• Correlation
Example
• Find the covariance and correlation of data
X= (1, 2, 3, 4, 5) and Y= (1, 4, 9, 16, 25).
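• A worked check: mean(X) = 3, mean(Y) = 11, so the sample covariance is
Σ(x − x̄)(y − ȳ) / (n − 1) = 60 / 4 = 15 (or 60 / 5 = 12 with the
population formula), and the correlation is approximately 0.98. The same
values can be verified with NumPy:

    import numpy as np

    X = np.array([1, 2, 3, 4, 5])
    Y = np.array([1, 4, 9, 16, 25])

    print(np.cov(X, Y)[0, 1])       # sample covariance: 15.0
    print(np.corrcoef(X, Y)[0, 1])  # correlation: ~0.9811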
Multivariate data
• When the data involves three or more variables, it is categorized
under multivariate.
• Analysis techniques are regression analysis, path analysis, factor
analysis, cluster analysis and multivariate analysis of variance
(MANOVA).
Central tendency
• When n = 2
• Properties of CDF
Probability Density Function
• The plot of the Gaussian PDF has even symmetry around the mean value.
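• For reference, the Gaussian (normal) PDF with mean μ and standard
deviation σ is f(x) = (1 / (σ√(2π))) ∗ exp(−(x − μ)² / (2σ²)), which is
symmetric about x = μ.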
Probability
• P (F) =?
Joint Probability of Two Variables
• Find the probability that an employee is a rank 1 officer and male.
• Joint probability refers to the occurrence of 2 or more events together.
Conditional probability
P(A | B) = P(B | A) ∗ P(A) / P(B)
• The above equation is called Bayes' Rule or Bayes' Theorem.
• P(A|B) is called the posterior probability, which is what we need to
calculate. It is defined as the updated probability after considering the
evidence.
• P(B|A) is called the likelihood. It is the probability of the evidence
when the hypothesis is true.
• P(A) is called the prior probability, the probability of the hypothesis
before considering the evidence.
• P(B) is called the marginal probability. It is defined as the total
probability of the evidence under all hypotheses.
• Hence, Bayes Theorem can be written as:
posterior = likelihood * prior / evidence
Example
• Diagnose whether someone has a certain disease, if the following
information is given:
• Prior Probability: Based on the general population, the prior
probability of someone having this disease is 1% (i.e., 1 out of 100
people has it).
• Evidence: Conduct a medical test, which is not perfect. The test has:
• A True Positive Rate of 90%, meaning if the person has the disease, the test
will correctly identify it 90% of the time.
• A False Positive Rate of 10%, meaning if the person does not have the disease,
the test will incorrectly say they have it 10% of the time.
Find out the updated probability (posterior probability) that the
person has the disease given that the test result is positive.
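• A worked check using Bayes' rule: P(disease | positive) =
(0.90 ∗ 0.01) / (0.90 ∗ 0.01 + 0.10 ∗ 0.99) = 0.009 / 0.108 ≈ 0.083,
i.e. about 8.3%. The same computation as a small Python sketch:

    # Bayes' rule for the disease example above
    prior = 0.01   # P(disease)
    tpr = 0.90     # P(positive | disease), true positive rate
    fpr = 0.10     # P(positive | no disease), false positive rate

    evidence = tpr * prior + fpr * (1 - prior)   # P(positive)
    posterior = tpr * prior / evidence           # P(disease | positive)
    print(posterior)                             # ~0.0833, about 8.3%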
Bayes’ Theorem Examples
• A man is known to speak lies 1 out of 4 times. He throws a die and
reports that it is a six. Find the probability that it is actually a six.
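• A worked solution (assuming that whenever the man lies about a
non-six he reports a six):
• Let S = 'the die shows a six' and R = 'he reports a six'; then
P(S) = 1/6, P(R | S) = 3/4 and P(R | not S) = 1/4.
• P(R) = (1/6)(3/4) + (5/6)(1/4) = 3/24 + 5/24 = 1/3.
• By Bayes' rule, P(S | R) = (3/4 ∗ 1/6) / (1/3) = (1/8) / (1/3) = 3/8.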
Bayesian Networks
• Components:
• Nodes: Represent random variables (e.g., weather, disease, sensor readings).
• Edges: Represent dependencies or conditional relationships between
variables.
• Conditional Probability Tables (CPTs): Define the probability of a node
given its parent nodes.
Applications of Bayesian Networks
• Medical Diagnosis
• Decision Support Systems
• Risk Assessment
• Machine Learning
• Natural Language Processing
Advantages of Bayesian Networks
• Modeling Uncertainty
• Visual Representation
• Learning from Data
Bayesian Networks Example
• Example: Harry installed a new burglary alarm at his home to detect
burglary. The alarm responds reliably to a burglary but also responds
to minor earthquakes. Harry has two neighbors, David and Sophia, who
have taken the responsibility to inform Harry at work when they hear
the alarm. David always calls Harry when he hears the alarm, but
sometimes he gets confused with the phone ringing and calls at that
time too. On the other hand, Sophia likes to listen to loud music, so
sometimes she misses hearing the alarm. Here we would like to compute
the probability of the Burglary Alarm.
• Calculate the probability that the alarm has sounded, but neither a
burglary nor an earthquake has occurred, and both David and Sophia
called Harry.
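• Using the chain rule of the Bayesian network, the required joint
probability factorizes as
P(D, S, A, ¬B, ¬E) = P(D | A) ∗ P(S | A) ∗ P(A | ¬B, ¬E) ∗ P(¬B) ∗ P(¬E)
where D = David calls, S = Sophia calls, A = alarm sounds, B = burglary
and E = earthquake; each factor is read from the corresponding
conditional probability table of the network.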
Bayesian Networks
Hypothesis
• A hypothesis is an assumption or prediction based on some evidence
that can be tested.
• It is specifically used in Supervised Machine learning, where an ML
model learns a function that best maps the input to corresponding
outputs with the help of an available dataset.
• Types of Hypothesis
• Null Hypothesis (H0)
• Alternative Hypothesis (H1)
Hypothesis testing
• Here, mean(A) and mean(B) are the means of the two different samples.
• N1 and N2 are the sample sizes of the two groups A and B.
• s² is the variance of the two samples, and the degrees of freedom is
N1 + N2 − 2.
• The t-statistic is then compared with the t-critical value.
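• The statistic referred to above is presumably the two-sample
t-statistic, t = (mean(A) − mean(B)) / sqrt(s² ∗ (1/N1 + 1/N2)), with s²
the pooled variance. A quick illustrative check with SciPy (the sample
values below are made up):

    from scipy import stats

    A = [23, 25, 28, 30, 32]
    B = [20, 21, 24, 26, 27]

    # Pooled-variance two-sample t-test (equal_var=True is the default)
    t_stat, p_value = stats.ttest_ind(A, B)
    print(t_stat, p_value)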
Chi-Square test
• The Chi-Square test is a non-parametric test.
• It measures the statistical significance between the observed frequency and
the expected frequency; each observation is independent of the others and
follows a normal distribution.
• This comparison is used to calculate the value of the Chi-Square statistic as:
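Chi-Square = Σ (Observed − Expected)² / Expected
• An illustrative computation with SciPy (the observed and expected
counts below are made-up values):

    from scipy.stats import chisquare

    observed = [18, 22, 20, 40]
    expected = [25, 25, 25, 25]   # totals of observed and expected must match

    stat, p = chisquare(f_obs=observed, f_exp=expected)
    print(stat, p)   # statistic = 12.32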
General to Specific
• This approach starts with a general idea and then narrows down to
specific instances or observations.
• It is often associated with deductive reasoning, where you begin with
a general principle and apply it to a specific case.
• Example:
• General: "All birds have wings."
• Specific: "This animal is a bird, so it must have wings."
Specific to General
• This approach starts with specific observations or data and then makes
broader generalizations.
• It's associated with inductive reasoning, where conclusions are drawn
based on the analysis of specific instances or patterns.
• Example:
• Specific: "I have seen five different swans, and all of them are white."
• General: "All swans must be white."
Hypothesis Space Search by Find-S Algorithm
• The Find-S algorithm is a simple machine learning algorithm used for
concept learning.
• The Find-S algorithm initially starts with the most specific hypothesis.
• This algorithm considers only the positive instances and eliminates
negative instances while generating the hypothesis.
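• A minimal Python sketch of Find-S on a toy dataset (the attribute
values and labels below are illustrative, not from the slides):

    # Each training example: (attribute values, label)
    data = [
        (["sunny", "warm", "normal", "strong"], "yes"),
        (["sunny", "warm", "high",   "strong"], "yes"),
        (["rainy", "cold", "high",   "strong"], "no"),
        (["sunny", "warm", "high",   "strong"], "yes"),
    ]

    n_attrs = len(data[0][0])
    h = ["phi"] * n_attrs            # most specific hypothesis: all 'phi'

    for x, label in data:
        if label != "yes":           # Find-S ignores negative instances
            continue
        for i, value in enumerate(x):
            if h[i] == "phi":        # first positive example: copy its values
                h[i] = value
            elif h[i] != value:      # conflicting value: generalize to '?'
                h[i] = "?"

    print(h)   # ['sunny', 'warm', '?', 'strong']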
Bias and Variance