Lecture - 4 - Start
Outline
1. Science, Method & Measurement
2. On Building An Index
3. Correlation & Causality
4. Probability & Statistics
5. Samples & Surveys
6. Experimental & Quasi-experimental Designs
7. Conceptual Models
8. Quantitative Models
9. Complexity & Chaos
10. Recapitulation - Envoi
Quantitative Techniques for Social Science Research
Lecture # 4:
Probability and Statistics
Ismail Serageldin
Alexandria
2012
On Probabilities
Recall
Random events
Random events/outcomes require a
probabilistic treatment
Social Science studies of events/outcomes
usually require a statistical probabilistic
treatment
Here multiple measurements and
probabilistic techniques are used
Probability became a science in the
17th century
A Genius: Blaise Pascal
(1623-1662)
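The probability density function of the standard Gaussian distribution, and of the Gaussian distribution with mean µ and standard deviation σ:
$$\varphi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^{2}}{2}\right)$$
$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$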
The Gaussian (normal)
distribution
• The cumulative distribution function for the
standard Gaussian distribution and the
Gaussian distribution with mean µ and
standard deviation σ is given by the
following formulas:
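$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{t^{2}}{2}\right) dt$$
$$F(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(t-\mu)^{2}}{2\sigma^{2}}\right) dt$$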
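As a rough illustration of the Gaussian density and cumulative distribution described above, the following sketch evaluates both with Python's standard library. The function names `normal_pdf` and `normal_cdf` are chosen here for illustration only, and the cumulative function is written with the error function `math.erf`, which is an equivalent form of the integral.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative probability P(X <= x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Height of the standard bell curve at its centre.
print(normal_pdf(0.0))
# Share of the probability lying within one standard deviation of the mean.
print(normal_cdf(1.0) - normal_cdf(-1.0))
```

With the defaults (mean 0, standard deviation 1) the functions describe the standard Gaussian; passing other values of `mu` and `sigma` gives the general case.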
Carl Friedrich Gauss
(1777-1855)
A parenthesis:
An example of the genius of
Gauss
1 + 2 + 3 + 4 + 5 + … + 100 = ?
5050
1 + 2 + 3 + 4 +… + 100
100 + 99 + 98 + 97 + … + 1
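Each of the 100 columns adds up to 101, and the two rows together count the sum twice:
101 × 100 × ½ = 5050
In general: 1 + 2 + … + n = (1 + n) × (n/2)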
He was six years old!
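The pairing argument above can be checked numerically. This is a minimal sketch; the function name `gauss_sum` is an illustrative choice, not something from the lecture.

```python
def gauss_sum(n):
    """Sum 1 + 2 + ... + n using the closed form (1 + n) * n / 2."""
    # (1 + n) * n is always even, so integer division is exact.
    return (1 + n) * n // 2

# Compare the closed form with a term-by-term sum.
print(gauss_sum(100))         # 5050
print(sum(range(1, 101)))     # 5050
```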
Let’s look at an example…
Example:
Being struck by Lightning
US Data:
• USA Population:
– 1961 183.7 Million
– 1999 272.7 Million
– Average over the 38-year period: 228 Million
It differs depending on where you are:
in the USA or in Egypt
So let’s see…
Consider each “Non Match” an
independent Event
• For Event 1, the first person, there are
no previously analyzed people.
Therefore, the probability, P(NM1), that
person number 1 does not share
his/her birthday with previously
analyzed people is 1, or 100%.
• Ignoring leap years for this analysis,
the probability of 1 can also be written
as 365/365, for reasons that will
become clear below.
Continuing
• The probability, P(NM2), that Person 2
has a different birthday from Person 1
is 364/365.
• This is because, if Person 2 was born
on any of the other 364 days of the
year, Persons 1 and 2 will not share
the same birthday.
• P(NM3) = 363/365
• P(NM4) = 362/365 …. And so on…
Bringing this all together…
• P(NM23) = 343/365
• Multiplying these probabilities together
gives the probability of having No Match
among the 23 persons:
• P(NM) = 365/365 × 364/365 × 363/365 ×
362/365 × ... × 343/365
• P(NM) for 23 persons = 0.492703
• P(M) = 1 − P(NM) = 1 − 0.492703
• P(M) = 0.507297
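The product above is tedious by hand but short in code. The sketch below multiplies the "no match" factors one person at a time; the function name `prob_no_match` and the `days` parameter are illustrative assumptions, and leap years are ignored as in the slides.

```python
def prob_no_match(n, days=365):
    """Probability that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p *= (days - k) / days
    return p

p_nm = prob_no_match(23)
print(round(p_nm, 6))       # 0.492703
print(round(1 - p_nm, 6))   # 0.507297
```

Calling `prob_no_match(22)` shows that 22 people are not quite enough: the probability of no match is still above one half.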
So…
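• The probability that at least one of n
other people in a room shares a given
person's birthday (for example, yours),
with d days in the year, is:
$$q(n; d) = 1 - \left(\frac{d-1}{d}\right)^{n}$$
$$q(n) = 1 - \left(\frac{364}{365}\right)^{n}$$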
Same birthday as you
• So: for the same birthday as you:
– n = 23 gives about 6.1%, which is less than
1 chance in 16.
– You need at least 253 people in the room to
have a greater than 50% chance that one
person has the same birthday as you.
• Note that this 253 number is significantly
higher than 365/2 = 182.5. Why?
• The reason is that some of the other
people in the room are likely to share
birthdays with one another, so n people
cover fewer than n distinct days.
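The 6.1% and 253 figures for "same birthday as you" can be reproduced with a short calculation. This is a sketch under the same assumptions as the slides (365 equally likely days); the function names are hypothetical.

```python
def prob_shares_your_birthday(n, days=365):
    """Probability that at least one of n other people shares your birthday."""
    return 1.0 - ((days - 1) / days) ** n

def smallest_group(target=0.5, days=365):
    """Smallest n for which the probability exceeds the target."""
    n = 1
    while prob_shares_your_birthday(n, days) <= target:
        n += 1
    return n

print(round(prob_shares_your_birthday(23), 3))  # 0.061
print(smallest_group())                          # 253
```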
Probability is a science.
Its results can often be counter-intuitive.
FYI
• The sum (or average) of a large number of
independent random observations will
generally be distributed approximately as a
normal distribution
(the bell curve, the Gaussian distribution).
• The standard score (z) is a dimensionless quantity.
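The tendency of many independent observations to map out a bell curve can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not part of the lecture: it adds up 30 rolls of a fair die, repeats this 20,000 times, and checks how much of the data falls within one standard deviation of the mean (for a normal distribution this share is about 68%).

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

def sum_of_dice(k):
    """Sum of k independent rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(k))

# 20,000 independent sums of 30 dice each.
samples = [sum_of_dice(30) for _ in range(20000)]

mean = statistics.mean(samples)
sd = statistics.pstdev(samples)

# Fraction of sums lying within one standard deviation of the mean.
within_one_sd = sum(abs(s - mean) <= sd for s in samples) / len(samples)
print(mean, sd, within_one_sd)
```

Theory gives a mean of 30 × 3.5 = 105 and a standard deviation of √(30 × 35/12) ≈ 9.35 for these sums; the simulated values should land close to both.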
Standardizing, Normalizing
• The Standard Score is derived by
subtracting the population mean from an
individual raw score and then dividing the
difference by the population standard
deviation:
$$z = \frac{x - \mu}{\sigma}$$
• where:
– x is the raw score;
– µ is the mean of the population;
– σ is the standard deviation of the
population.
The quantity is in terms of the
standard deviation of the population
• The quantity z represents the distance
between the raw score and the population
mean in units of the standard deviation.
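To make the standard score concrete, the sketch below standardizes one value from a small made-up population. The data and the function name `z_score` are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import statistics

def z_score(x, mu, sigma):
    """Distance of x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma

# Hypothetical population of eight raw scores.
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(population)       # population mean: 5
sigma = statistics.pstdev(population)  # population standard deviation: 2.0

print(z_score(9, mu, sigma))  # 2.0
print(z_score(5, mu, sigma))  # 0.0
```

A score of 9 lies two standard deviations above the mean, while a score equal to the mean has a standard score of zero.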