Stats 1 - IITM BS Notes - Part 5

Continuation (Part 5) of notes on Statistics 1 for Data Science by IIT Madras

Uploaded by

ryandonovan.des
Binomial Distribution

> Let there be n independent Bernoulli trials, each with probability of success p.
> Let X = number of successes in the n trials, so X takes values 0, 1, 2, ..., n.
> The probabilities of the individual outcomes of the n trials are:

| Number of successes | Probability of one such outcome |
| n | p × p × ... × p |
| n − 1 | p × p × ... × p × (1 − p) |
| ... | ... |
| 1 | (1 − p) × (1 − p) × ... × (1 − p) × p |
| 0 | (1 − p) × (1 − p) × ... × (1 − p) |

Visualising Binomial Distribution

For small n, the distribution is skewed except when the probability of success is 0.5, in which case it is symmetric. For larger n, the distribution becomes increasingly symmetric.

[Figure: bar charts of the Binomial pmf for several values of n and p.]

Expectation and Variance of a Binomial Random Variable

> Write X = X1 + X2 + ... + Xn, where each Xi is an independent Bernoulli(p) random variable.
> E(X) = E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn) = p + p + ... + p = np
> V(X) = V(X1 + X2 + ... + Xn) = V(X1) + V(X2) + ... + V(Xn) = p(1 − p) + p(1 − p) + ... + p(1 − p) = np(1 − p)

Result: Using the fact that the expectation of a sum of random variables is equal to the sum of their expectations, the expectation of a Binomial random variable X is E(X) = np. Also, since the variance of a sum of independent random variables is equal to the sum of their variances, the variance of a Binomial random variable is Var(X) = np(1 − p).

Hypergeometric Distribution

Imagine a bag of black and white balls. Picking two balls gives dependent draws. For the hypergeometric distribution to apply:

> The population must be divisible into two and only two subsets (black balls and white balls in our example).
> The probability of success must change with each draw (in our example, because balls are not replaced after a draw).
> Another way to say this is that you sample without replacement, and therefore the draws are not independent.
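The dependence between draws without replacement can be illustrated exactly. A minimal sketch, assuming a hypothetical bag of 5 black and 5 white balls (the sizes are chosen only for illustration; the notes do not fix them):

```python
from fractions import Fraction

# Hypothetical bag: 5 black and 5 white balls (sizes chosen only for illustration).
black, white = 5, 5
total = black + white

# Unconditionally, the second draw is black with probability black/total,
# because by symmetry every position in the draw order is equally likely
# to hold any particular ball.
p_second_black = Fraction(black, total)

# Given the first draw was black, one black ball is gone, so the
# probability of black on the second draw changes:
p_second_black_given_first_black = Fraction(black - 1, total - 1)

print(p_second_black)                    # 1/2
print(p_second_black_given_first_black)  # 4/9
```

Since 4/9 ≠ 1/2, the second draw is not independent of the first; with replacement both probabilities would stay at 1/2.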
Understanding the Hypergeometric distribution

If we randomly select n items without replacement from a set of N items of which m of the items are of one type and N − m of the items are of a second type:

[Figure: a sequence of draws shown as black (B) and white (W) balls.]

Let X be the number of sampled items of type 1. Then the probability mass function of the discrete random variable X is called the hypergeometric distribution and is of the form:

P(X = x) = C(m, x) × C(N − m, n − x) / C(N, n), for max(0, n − (N − m)) ≤ x ≤ min(n, m)

[Figure: enumeration of the outcomes with X = 2.]

Example: Sampling from a deck of cards

> Take a deck of 52 cards and draw five cards from the deck. Let the random variable X denote the number of aces in the random sample of five cards. What is the probability distribution of X?
> This is a Hypergeometric distribution with N = 52, n = 5, m = 4.
> The pmf is given by P(X = x) = C(4, x) × C(48, 5 − x) / C(52, 5), for x = 0, 1, 2, 3, 4.

Summary: for X ~ Hypergeometric(N, m, n):

> E(X) = n × m/N
> Var(X) = n × (m/N) × (1 − m/N) × (N − n)/(N − 1)

Compare this with Y ~ Bin(n, m/N):

> E(Y) = n × m/N
> Var(Y) = n × (m/N) × (1 − m/N)

Consider the correction term (N − n)/(N − 1):

> For n = 1, replacement has no effect; both distributions reduce to a single Bernoulli trial.
> For n = N, the whole population is sampled, hence the variance is zero.
> If the population N is very large compared to the sample size (i.e. N >> n), then Hypergeometric(N, m, n) is approximately Bin(n, m/N).

Comparison of Binomial and Hypergeometric distributions

To see how this comparison behaves, see this video from 24:00 onwards.

Poisson distribution - Distribution of Poisson random variable

This gives the probability of a number of events occurring in a given period of time.

> The Poisson probability distribution gives the probability of a number of events occurring in a fixed interval of time or space.
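The card-deck example above can be evaluated directly with Python's standard library; `hypergeom_pmf` is a helper name introduced here, not from the notes:

```python
from math import comb

# Hypergeometric pmf: P(X = x) = C(m, x) * C(N - m, n - x) / C(N, n)
def hypergeom_pmf(N, m, n, x):
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

# Deck example from the notes: N = 52 cards, m = 4 aces, n = 5 cards drawn.
probs = [hypergeom_pmf(52, 4, 5, x) for x in range(5)]

print(round(probs[0], 4))        # 0.6588  (probability of drawing no aces)
print(round(sum(probs), 6))      # 1.0     (the pmf sums to 1 over x = 0..4)
```

Note that the support stops at x = 4 because there are only m = 4 aces in the deck.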
> We assume that these events happen with a known average rate, λ, and independently of the time since the last event.
> Let X denote the number of times an event occurs in an interval of time (or space).
> We say X ~ Poisson(λ); in other words, X is a random variable that follows a Poisson distribution with parameter λ.
> The Poisson distribution may be used to approximate the Binomial distribution if the probability of success is "small" and the number of trials is "large".

Poisson as Binomial approximation

> Define "success" as exactly one event happening in a short interval of length dt.
> The events happening in an interval of length T can be viewed as successes happening in n = T/dt intervals of length dt, each with success probability p = λ/n, so that np = λ.
> Hence the problem can be viewed as a Bin(n, p = λ/n) experiment; letting n → ∞ gives the Poisson distribution.

[Figure: an interval divided into n short subintervals; try to understand why the red area is defined as it is.]

Probability mass function of Poisson

The distribution with an average number of λ events per interval defines the Poisson discrete random variable, X ~ Poisson(λ), with the pmf given by

P(X = x) = e^(−λ) × λ^x / x!, for x = 0, 1, 2, ...

> The random variable X represents the number of events per time interval (in the example: the number of vehicles passing per minute).
> e is the mathematical constant ≈ 2.718.

To see how the probability mass function is derived, see this video between 14:00 and 18:00.

As λ increases, the distribution looks more symmetric:

[Figure: graphs of the pmf for a small λ and for λ = 8.]

Expectation - Poisson Distribution

E(X) = λ

Variance - Poisson Distribution

> Now, E(X²) = E(X(X − 1)) + E(X) = λ² + λ
> Hence Var(X) = E(X²) − (E(X))² = λ² + λ − λ² = λ

Note that both the expectation and the variance of a Poisson distribution are λ (which is the average rate).

Examples of Poisson distribution

> Events occurring in a fixed interval of time:
1. Number of vehicles passing through a traffic intersection in a fixed time interval of one minute.
2. Number of people withdrawing money from a bank in a fixed time interval of fifteen minutes.
3. Number of telephone calls received per minute at a call center.

> Events occurring in a fixed interval of space:
1. Number of typos (incorrect spellings) in a book.
2. Number of defects in a wire cable of finite length.
3. Number of defects per meter in a roll of cloth.

Modeling number of killings

The number of dogs that are killed on a particular stretch of road in Chennai in any one day can be modeled by a Poisson(0.42) random variable.

1. Calculate the probability that exactly two dogs are killed on a given day on this stretch of road.

Let X = number of dogs killed in one day, X ~ Poisson(λ = 0.42).
P(X = 2) = e^(−0.42) × 0.42² / 2! ≈ 0.058

2. Find the probability that exactly two dogs are killed over a 5-day period on this stretch of road.

Let X = number of dogs killed in five days, X ~ Poisson(λ = 5 × 0.42 = 2.1).
P(X = 2) = e^(−2.1) × 2.1² / 2! ≈ 0.27

Week 12

Continuous Random Variable

> A continuous random variable is one whose possible values form an interval along the real number line. In other words, a continuous random variable can assume any value over an interval or intervals.

Probability density function (pdf)

While discrete random variables have probability mass functions, continuous random variables have probability density functions.

> Every continuous random variable X has a curve associated with it.
> The probability distribution curve of a continuous random variable is also called its probability density function. It is denoted by f(x).

Area under a pdf

> Consider any two points a and b, where a is less than b.
> The probability that X assumes a value that lies between a and b is equal to the area under the curve between a and b. That is,

P(X ∈ [a, b]) = P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx

Properties of pdf

1. The area under the probability distribution curve of a continuous random variable between any two points is between 0 and 1.

[Figure: a density curve with the shaded area between a and b lying between 0 and 1.]
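The two Poisson calculations in the dog example above can be reproduced with the standard library (a quick sketch; `poisson_pmf` is a helper name introduced here, not from the notes):

```python
from math import exp, factorial

# Poisson pmf: P(X = x) = e^(-lam) * lam**x / x!
def poisson_pmf(lam, x):
    return exp(-lam) * lam ** x / factorial(x)

# Dogs killed per day ~ Poisson(0.42); over 5 days the rate scales to 5 * 0.42 = 2.1.
print(round(poisson_pmf(0.42, 2), 3))  # 0.058
print(round(poisson_pmf(2.1, 2), 2))   # 0.27
```

The rescaling λ → 5λ for a 5-day window works because the event rate is assumed constant and the daily counts independent.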
2. The total area under the probability distribution curve of a continuous random variable is always 1.

[Figure: a density curve with total area under the curve equal to 1: ∫ over R of f(x) dx = 1.]

Cumulative distribution function

For a continuous random variable X,

F(a) = P(X ≤ a) = ∫ from −∞ to a of f(x) dx

> Expected value: E(X) = ∫ x f(x) dx
> Variance: Var(X) = ∫ (x − E(X))² f(x) dx

Uniform distribution U(a, b)

> A continuous random variable has a uniform distribution, denoted X ~ U(a, b), if its probability density function is:

f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

Standard uniform distribution

When the uniform distribution is across the interval from 0 to 1, it is the standard uniform distribution.

> A random variable has the standard uniform distribution, with minimum 0 and maximum 1, if its probability density function is given by

f(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise

Verify that f(x) is a pdf:

> f(x) ≥ 0 for all x
> ∫ from 0 to 1 of f(x) dx = ∫ from 0 to 1 of 1 dx = 1

[Figure: graph of the pdf of the standard uniform distribution U(0, 1).]

Cumulative distribution of Uniform distribution

For X ~ U(a, b):

F(x) = 0 for x < a
F(x) = (x − a)/(b − a) for a ≤ x ≤ b
F(x) = 1 for x > b

To see how you get the value of F(x) when x lies between a and b: F(x) = P(X ≤ x) = ∫ from a to x of 1/(b − a) dt = (x − a)/(b − a). It is important to note here that the slope of the line between a and b is 1/(b − a).

[Figure: graph of F(x) for a = 2, b = 4, rising linearly from 0 at x = a to 1 at x = b.]

> When x < a, F(x) = 0.

Expectation - Continuous Random Variable

Expectation of X ~ U(a, b):

E(X) = ∫ from a to b of x f(x) dx = (a + b)/2

Variance - Continuous Random Variable

Variance of X ~ U(a, b):

> E(X²) = ∫ from a to b of x² f(x) dx = ∫ from a to b of x² × 1/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3
> Var(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − ((a + b)/2)² = (b − a)²/12

Non Uniform Distribution

Suppose that the number of minutes of playing time of a certain college basketball player in a randomly chosen game has the following density curve.
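The U(a, b) results above can be sanity-checked numerically. A minimal sketch using a midpoint-rule Riemann sum, with a = 2 and b = 4 (the values from the cdf sketch, chosen here only for illustration):

```python
# Midpoint-rule check of E(X) = (a + b)/2 and Var(X) = (b - a)**2 / 12
# for X ~ U(a, b); a = 2, b = 4 are illustrative values.
a, b = 2.0, 4.0
n = 10_000                 # number of integration slices
dx = (b - a) / n
f = 1.0 / (b - a)          # uniform density on [a, b]

mean = sum((a + (i + 0.5) * dx) * f * dx for i in range(n))
ex2 = sum((a + (i + 0.5) * dx) ** 2 * f * dx for i in range(n))
var = ex2 - mean ** 2

print(round(mean, 6))  # 3.0       (= (a + b)/2)
print(round(var, 6))   # 0.333333  (= (b - a)**2 / 12 = 4/12)
```

The same loop works for any density, which is the point of the integral definitions of E(X) and Var(X) above.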
[Figure: the density curve f(x) of playing time in minutes.]

Triangular Distribution

It is now 2 p.m., and Joan is planning on studying for her statistics test until 6 p.m., when she will have to go out to dinner. However, she knows that she will probably have interruptions and thinks that the amount of time she will actually spend studying in the next 4 hours is a random variable whose probability density curve is as follows:

[Figure: a triangular density curve f(x) over the 4-hour interval.]

Exponential Distribution
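The notes do not give Joan's exact density, but a hypothetical symmetric triangular density on [0, 4] (peak at x = 2, chosen purely for illustration) shows how such a curve is used to compute probabilities as areas:

```python
# Hypothetical symmetric triangular density on [0, 4] with peak at x = 2.
# Joan's actual curve is not given in the notes; this shape is illustrative only.
def f(x):
    if 0.0 <= x <= 2.0:
        return x / 4.0
    if 2.0 < x <= 4.0:
        return (4.0 - x) / 4.0
    return 0.0

# Midpoint-rule integration: check the total area and compute
# P(X > 3), the chance Joan studies more than 3 of the 4 hours.
n = 10_000
dx = 4.0 / n
total = sum(f((i + 0.5) * dx) * dx for i in range(n))
p_gt_3 = sum(f((i + 0.5) * dx) * dx for i in range(n) if (i + 0.5) * dx > 3.0)

print(round(total, 6))   # 1.0
print(round(p_gt_3, 6))  # 0.125
```

P(X > 3) can also be read off geometrically as the small triangle from 3 to 4: (1/2) × 1 × f(3) = (1/2) × 1 × 0.25 = 0.125.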
