Chapter1 Statistics

This document provides an introduction to statistics, including key concepts and terminology. It discusses: 1. Descriptive statistics, which involves collecting and presenting data through tables and graphs like histograms, bar graphs, and box plots. 2. Inferential statistics, which allows drawing conclusions about a population based on a sample. It involves estimation and testing hypotheses. 3. Key terms like population, sample, variable, parameter, and qualitative vs. quantitative data. It also discusses different scales of measurement for qualitative data. 4. Measures of central tendency like the mean and median, and how skewness can impact these values. 5. Measures of spread, specifically the standard deviation as a measure

Uploaded by

arokia samy
Copyright
© © All Rights Reserved

CHAPTER 1

STATISTICS

1.1 INTRODUCTION TO STATISTICS


Statistics is a mathematical science that deals with the collection, organization,
presentation, analysis and interpretation of data in such a way that meaningful conclusions
can be drawn from them. In general, its investigations and analyses fall into two broad
categories called descriptive and inferential statistics.

1.1.1 Descriptive Statistics


Once you have collected data, what will you do with it? Data can be described and
presented in many different formats, such as tables or graphs. This area of statistics is
called “Descriptive Statistics.”
A statistical graph is a tool that helps you learn about the shape or distribution of a
sample or a population. A graph can be a more effective way of presenting data than a
mass of numbers because we can see where data clusters and where there are only a
few data values. Newspapers and the Internet use graphs to show trends and to enable
readers to compare facts and figures quickly. Statisticians often graph data first to get
a picture of the data. Some of the types of graphs that are used to summarize and
organize data are the dot plot, the bar graph, the histogram, the stem-and-leaf plot, the
frequency polygon (a type of broken line graph), the pie chart, and the box plot.

Figure 1: The types of graphs

Statistics and Probability

1.1.2 Inferential Statistics


Inferential statistics is the process of drawing conclusions about a population based
on certain statistics calculated from a sample of data drawn from that population.
Statistical inference is important in order to analyze data properly. Indeed, proper
data analysis is necessary to interpret research results and to draw appropriate
conclusions. In this course, two basic statistical concepts are presented, estimation and
hypothesis testing, and they are applied to the comparison of means, variances and
proportions.

1.1.3 Terms and Definitions


Population is all individuals, objects, or measurements whose properties are being
studied.
Sample is a subset of the population studied.
Statistic is a numerical characteristic of the sample; a statistic estimates the
corresponding population parameter.
Variable is a characteristic of interest for each person or object in a population.
Numerical Variable is a variable that takes on values that are indicated by numbers.
Parameter is a number that is used to represent a population characteristic and that
generally cannot be determined easily.
Proportion is the number of successes divided by the total number in the sample.
Random Variable Notation: upper case letters such as X denote a random
variable; lower case letters such as x denote a value of the random variable.

1.1.4 Type of Data


Most data can be put into two groups: qualitative or quantitative. Quantitative data
can be separated into two subgroups: discrete and continuous. Quantitative data are
discrete if the data values are the result of counting (such as the
number of students, cars, books, and children). Data are continuous if they are the result of
measuring (such as distance, weight, speed, time, and pressure).

Qualitative data is an attribute whose value is indicated by a label. Data can be
classified into four levels of measurement: nominal, ordinal, interval and ratio. The
nominal and ordinal scales apply to qualitative data, while the interval and ratio
scales apply to quantitative data.
Nominal scale data are not ordered and cannot be used in calculations. Examples of
nominal data are colors, names, labels and favorite foods, along with yes or no responses.

CHAPTER 1: INTRODUCTION TO STATISTICS

Example: Smartphone companies include Sony, Oppo, Asus, Samsung and Apple.
This is just a list and there is no agreed upon order. Some people may favor Apple but
that is a matter of opinion.
The ordinal scale data can be ordered but cannot be used in calculations. Example:
A cruise survey where the responses to questions about the cruise are “excellent,”
“good,” “satisfactory,” and “unsatisfactory.” These responses are ordered from the
most desired response to the least desired.
The interval data is the data that is measured using the interval scale. Interval level
data can be ordered and their differences are meaningful, but ratios are not, because
the scale has no true zero point. For example, the highest daily temperature in
Malaysia is between 30 and 40° C. A difference of 10° C means the same anywhere on
the scale, but 80° C is not four times as hot as 20° C.
The ratio data is the data that is measured using the ratio scale. Ratio data have a
true zero point, so both differences and ratios can be calculated. For example, four multiple choice
statistics final exam scores are 80, 68, 20 and 92 (out of a possible 100 points). A
score of 80 is four times a score of 20.
| | Quantitative Data | Qualitative Data |
|---|---|---|
| Definition | Quantitative data are the result of counting or measuring attributes of a population. | Qualitative data are the result of categorizing or describing attributes of a population. |
| Data that you will see | Quantitative data are always numbers. | Qualitative data are generally described by words or letters. |
| Examples | Amount of money you have; Height, weight; Number of people living in your town; Number of students who take statistics | Hair color; Blood type; Ethnic group; The car a person drives; The street a person lives on |

1.1.5 Measures of the Center of the Data


The “center” of a data set is also a way of describing location. The two most widely
used measures of the “center” of the data are the mean (average) and the median.
The mean is simply a numerical average. Suppose that the observations in a
sample are $x_1, x_2, \dots, x_n$. The sample mean, denoted by $\bar{x}$, is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$$

Another measure of central tendency is the sample median. Given that the
observations in a sample are $x_1, x_2, \dots, x_n$, arranged in increasing order of magnitude,
the sample median is

$$\tilde{x} = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even} \end{cases}$$

Example
Suppose the data set is the following: 1.7, 2.2, 3.9, 3.11 and 14.7. Find the sample
mean and median.

Solution
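$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1.7 + 2.2 + 3.9 + 3.11 + 14.7}{5} = \frac{25.61}{5} \approx 5.12$$

Arranged in increasing order, the data are 1.7, 2.2, 3.11, 3.9 and 14.7. Since $n = 5$ is odd,
the median is the third ordered value. Therefore $\bar{x} = 5.12$ and $\tilde{x} = 3.11$.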

The mean is influenced considerably by the presence of the extreme observation,
whereas the median places emphasis on the true ‘centre’ of the data set.
The median is generally a better measure of the center when there are extreme
values or outliers because it is not affected by the precise numerical values of the
outliers. The mean is the most common measure of the center.
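
The sample mean and median for the data set in the example above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the second list, in which the extreme value 14.7 is replaced by 4.7, is a hypothetical variation added here to show how the mean reacts to an outlier while the median does not.

```python
import statistics

# Data set from the example above (listed unsorted, as in the text).
data = [1.7, 2.2, 3.9, 3.11, 14.7]

sample_mean = statistics.mean(data)      # sum of the values divided by n
sample_median = statistics.median(data)  # sorts the values, then takes the middle one

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median}")

# Hypothetical variation: replace the extreme value 14.7 with 4.7.
# The mean changes noticeably, the median stays where it was.
no_outlier = [1.7, 2.2, 3.9, 3.11, 4.7]
mean_no_outlier = statistics.mean(no_outlier)
median_no_outlier = statistics.median(no_outlier)

print(f"mean without the extreme value   = {mean_no_outlier:.2f}")
print(f"median without the extreme value = {median_no_outlier}")
```

Note that `statistics.median` sorts the values before picking the middle one, which matters here because the data are not listed in increasing order.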

Skewness and the Mean, Median, and Mode


A distribution is symmetrical if the shapes to the left and to the right of a vertical line
through its center are mirror images of each other. In a perfectly symmetrical distribution, the mean
and the median are the same (Figure 2a). In a distribution that is skewed to the
left (negative skew), the mean is typically less than the median, which is often less than the
mode (mean < median < mode) (Figure 2b). In a distribution that is skewed to the
right (positive skew), the mean is typically the largest, while the mode is the smallest
(mean > median > mode) (Figure 2c).

Figure 2: Skewed Distribution of the Data: (a) symmetrical, (b) skewed to the left, (c) skewed to the right
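
The ordering of the mean, median and mode under skew can be illustrated numerically. The sketch below uses a small hypothetical sample (not taken from the text) with a long right tail, then mirrors it to obtain a left-skewed sample.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
right_mode = statistics.mode(right_skewed)
print("right skew:", right_mean, right_median, right_mode)

# Mirroring every value about a fixed point reverses the direction of the skew.
left_skewed = [16 - x for x in right_skewed]

left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)
left_mode = statistics.mode(left_skewed)
print("left skew: ", left_mean, left_median, left_mode)
```

For this particular sample the ordering is mean > median > mode for the right-skewed data and mean < median < mode for the mirrored data. Real data sets do not always follow this pattern exactly.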


1.1.6 Measures of the Spread of Data


An important characteristic of any set of data is the variation in the data. In some data
sets, the data values are concentrated closely near the mean; in other data sets, the
data values are more widely spread out from the mean. The most common measure of
variation, or spread, is the standard deviation. The standard deviation is a number
that measures how far data values are from their mean.
The symbol $\sigma^2$ represents the population variance; the population standard
deviation $\sigma$ is the square root of the population variance. Suppose that the
observations in a population are $x_1, x_2, \dots, x_N$. The population variance and standard
deviation are

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \quad\text{and}\quad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

where the population mean is $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$.
The symbol $s^2$ represents the sample variance; the sample standard deviation $s$ is the
square root of the sample variance. The sample variance and sample standard
deviation are

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad\text{and}\quad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Example
Suppose the data set is the following: 10, 15, 20, 25 and 30. Find the sample mean
and standard deviation.

Solution
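Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10+15+20+25+30}{5} = 20$

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{5-1}\left((10-20)^2 + (15-20)^2 + \dots + (30-20)^2\right) = 62.5$

Sample standard deviation: $s = \sqrt{s^2} = \sqrt{62.5} = 7.9057$

If the same five values are instead treated as a whole population:

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = 20$

Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = 50$

Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{50} = 7.07107$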
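
As a cross-check on the worked example, the following sketch computes the same quantities with Python's standard `statistics` module, where `variance` and `stdev` divide by n − 1 (sample formulas) and `pvariance` and `pstdev` divide by N (population formulas). The variable names are chosen here for illustration.

```python
import statistics

data = [10, 15, 20, 25, 30]

mean = statistics.mean(data)

# Sample formulas: the sum of squared deviations is divided by n - 1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Population formulas: the sum of squared deviations is divided by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# The same sample variance written out directly from the definition.
manual_sample_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)

print(f"mean = {mean}")
print(f"s^2 = {sample_variance}, s = {sample_sd:.4f}")
print(f"sigma^2 = {population_variance}, sigma = {population_sd:.5f}")
```

The only difference between the two pairs of results is the divisor, which is why the sample values are slightly larger than the population values for the same data.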


Calculation of Mean and Standard Deviation Using Calculator

FX570MS

FX570ES PLUS


The standard deviation provides a measure of the overall variation in a data set.
The standard deviation is always positive or zero.
• The standard deviation is small when the data are all concentrated close to the
mean, exhibiting little variation or spread.
• The standard deviation is larger when the data values are more spread out
from the mean, exhibiting more variation.

1.2 INTRODUCTION TO PROBABILITY


Probability is used to quantify the chance that an outcome of a random experiment
will occur, or the degree of belief that the outcome will occur. For example, if you toss a coin,
will you obtain a head or a tail? If you roll a die, will you obtain 1, 2, 3, 4, 5 or 6?
The value of a probability is a number between 0 and 1 inclusive. An event that
cannot occur has a probability (of happening) equal to 0, and an
event that is certain to occur has a probability equal to 1.
In order to quantify probabilities, we need to define the sample space of an
experiment and the events that may be associated with that experiment. The sample
space (S) is the set of all possible outcomes in an experiment. An event (E) is a
subset of the sample space, that is, a collection of one or more outcomes of the experiment.
When all outcomes in the sample space are equally likely, the probability of E is defined as

$$P(E) = \frac{\text{Total number of outcomes in } E}{\text{Total number of outcomes in sample space}} = \frac{n(E)}{n(S)}$$

Example 1
A die is rolled, find the probability of getting (a) a 3, (b) an even number and (c) at
least five.

Solution
(a) The event of interest is “getting a 3”, so E = {3}.
The sample space S is given by S = {1, 2, 3, 4, 5, 6}.
Hence

$$P(E) = \frac{\text{Total number of outcomes in } E}{\text{Total number of outcomes in sample space}} = \frac{n(E)}{n(S)} = \frac{1}{6}$$

(b) The event of interest is “an even number”, so E = {2, 4, 6}.
Hence

$$P(E) = \frac{\text{Total number of outcomes in } E}{\text{Total number of outcomes in sample space}} = \frac{n(E)}{n(S)} = \frac{3}{6} = \frac{1}{2}$$

(c) The event of interest is “at least five”, so E = {5, 6}.

$$P(E) = \frac{n(E)}{n(S)} = \frac{2}{6} = \frac{1}{3}$$
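
The counting definition P(E) = n(E)/n(S) translates directly into code when events are represented as sets. The sketch below assumes equally likely outcomes; the helper name `probability` is chosen here for illustration, and `Fraction` is used so that the results stay exact.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Classical probability n(E) / n(S), assuming equally likely outcomes."""
    favourable = set(event) & set(sample_space)  # ignore outcomes outside S
    return Fraction(len(favourable), len(sample_space))


S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a die

p_three = probability({3}, S)
p_even = probability({x for x in S if x % 2 == 0}, S)
p_at_least_five = probability({x for x in S if x >= 5}, S)

print(p_three, p_even, p_at_least_five)
```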

1.2.1 Terms and Definitions


i. “OR” Event, A ∪ B - An outcome is in the event A OR B = A ∪ B if the outcome
is in A or is in B or is in both A and B.
ii. “AND” Event, A ∩ B - An outcome is in the event A AND B = A ∩ B if the
outcome is in both A and B at the same time.
iii. Complementary Event, A′ - The complement of event A is denoted A′.
A′ consists of all outcomes that are NOT in A.
iv. Mutually Exclusive Events - A and B are mutually exclusive events if they
cannot occur at the same time. This means that A and B do not share any outcomes
and P(A ∩ B) = 0.
v. The Addition Rule - If A and B are defined on a sample space, then:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B).
If A and B are mutually exclusive, then P(A ∩ B) = 0. Then
P(A ∪ B) = P(A) + P(B) – P(A ∩ B) becomes
P(A ∪ B) = P(A) + P(B).
vi. Conditional Probability of an Event
The conditional probability of A given B is written P(A|B).
P(A|B) is the probability that event A will occur given that the event B has already
occurred. Provided that P(B) > 0, the formula to calculate P(A|B) is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

vii. Independent Events
Two events A and B are independent if the occurrence of one does not affect the chance
that the other occurs. Two events are independent if any one of the following equivalent
conditions is true:
• P(A|B) = P(A)
• P(B|A) = P(B)
• P(A ∩ B) = P(A)P(B)

Example 2
Suppose the sample space S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Let A = {1, 2, 3, 4, 5}, B =
{4, 5, 6, 7, 8}, and C = {7, 9}. (a) Determine whether A and B are mutually
exclusive. (b) Determine whether A and C are mutually exclusive. (c) Find P(A ∪ B) and P(A ∪ C).

Solution
(a) A AND B = {4, 5}, so P(A ∩ B) = 2/10, which is not equal to zero.
Therefore, A and B are not mutually exclusive.

(b) A AND C = { } (the two sets have no numbers in common), so P(A ∩ C) = 0.
Therefore, A and C are mutually exclusive.

(c) P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 5/10 + 5/10 − 2/10 = 8/10 = 4/5
P(A ∪ C) = P(A) + P(C) – P(A ∩ C) = 5/10 + 2/10 − 0/10 = 7/10
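
Mutual exclusivity and the addition rule can be checked with Python sets, where `&` is the intersection (AND) and `|` is the union (OR). This is a sketch assuming equally likely outcomes; the helper names `prob` and `mutually_exclusive` are illustrative.

```python
from fractions import Fraction

S = set(range(1, 11))  # sample space {1, ..., 10}
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
C = {7, 9}


def prob(event):
    """P(E) = n(E) / n(S) for equally likely outcomes in S."""
    return Fraction(len(event & S), len(S))


def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive when they share no outcomes."""
    return len(e1 & e2) == 0


a_b_exclusive = mutually_exclusive(A, B)
a_c_exclusive = mutually_exclusive(A, C)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)
p_a_or_c = prob(A) + prob(C) - prob(A & C)

print(a_b_exclusive, a_c_exclusive, p_a_or_b, p_a_or_c)
```

Counting the union directly with `prob(A | B)` gives the same value as the addition rule, which is a quick way to verify the formula on small sample spaces.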

Example 3
Suppose we toss one fair, six-sided die. The sample space S = {1, 2, 3, 4, 5, 6}. Let
event A = face is 2 or 3 and B = event that face is even. Find (a) P(A|B) and (b)
P(A′|B).

Solution
(a) Event A = {2, 3}, Event B = {2, 4, 6} and A ∩ B = {2}.
Hence $P(A \cap B) = \frac{1}{6}$, $P(B) = \frac{3}{6}$ and

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$$

(b) Event A′ = {1, 4, 5, 6}, Event B = {2, 4, 6} and A′ ∩ B = {4, 6}.
Hence $P(A' \cap B) = \frac{2}{6}$, $P(B) = \frac{3}{6}$ and

$$P(A' \mid B) = \frac{P(A' \cap B)}{P(B)} = \frac{2/6}{1/2} = \frac{2}{3}$$
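
The conditional probability formula P(A|B) = P(A ∩ B)/P(B) can be sketched in the same set-based style. The function name `conditional` is an illustrative choice, and the code assumes equally likely outcomes and P(B) > 0.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 3}              # face is 2 or 3
B = {2, 4, 6}           # face is even
A_complement = S - A    # all outcomes that are not in A


def prob(event):
    """P(E) = n(E) / n(S) for equally likely outcomes in S."""
    return Fraction(len(event & S), len(S))


def conditional(event, given):
    """P(event | given) = P(event and given) / P(given); requires P(given) > 0."""
    return prob(event & given) / prob(given)


p_a_given_b = conditional(A, B)
p_not_a_given_b = conditional(A_complement, B)

print(p_a_given_b, p_not_a_given_b)
```

The two results add up to 1, because once B has occurred either A or its complement must occur.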

Example 4
(i) If P(A|B) = 0.4, P(B) = 0.8 and P(A) = 0.5, are the events A and B
(a) independent, and (b) mutually exclusive?
(ii) If P(A) = 0.5, P(B) = 0.3, and A and B are mutually exclusive, are they
independent?

Solution
(i) (a) P(A|B) = 0.4, P(B) = 0.8 and P(A) = 0.5.
Two events are independent if P(A|B) = P(A), but
P(A|B) = 0.4 ≠ P(A) = 0.5.
Hence A and B are not independent.

(b) Since $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, we have P(A ∩ B) = P(A|B)P(B) = 0.4(0.8) = 0.32 ≠ 0.
Hence A and B are not mutually exclusive events.

(ii) A and B are mutually exclusive events, so P(A ∩ B) = 0.
Two events are independent if P(A ∩ B) = P(A)P(B), but
P(A)P(B) = 0.5(0.3) = 0.15 ≠ P(A ∩ B) = 0.
Hence, the two events are not independent.
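
The two tests used in this example, P(A ∩ B) = P(A)P(B) for independence and P(A ∩ B) = 0 for mutual exclusivity, can be written as small helper functions. The sketch below uses exact fractions rather than floating-point numbers so that the equality tests are reliable; the function names are illustrative.

```python
from fractions import Fraction


def is_independent(p_a, p_b, p_a_and_b):
    """Independent events satisfy P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


def is_mutually_exclusive(p_a_and_b):
    """Mutually exclusive events satisfy P(A and B) = 0."""
    return p_a_and_b == 0


# Part (i): P(A|B) = 0.4, P(B) = 0.8, P(A) = 0.5
p_a, p_b, p_a_given_b = Fraction("0.5"), Fraction("0.8"), Fraction("0.4")
p_a_and_b = p_a_given_b * p_b  # multiplication rule: P(A and B) = P(A|B) P(B)
part_i_independent = is_independent(p_a, p_b, p_a_and_b)
part_i_exclusive = is_mutually_exclusive(p_a_and_b)

# Part (ii): P(A) = 0.5, P(B) = 0.3, A and B mutually exclusive
p_a2, p_b2, p_a2_and_b2 = Fraction("0.5"), Fraction("0.3"), Fraction(0)
part_ii_independent = is_independent(p_a2, p_b2, p_a2_and_b2)

print(float(p_a_and_b), part_i_independent, part_i_exclusive, part_ii_independent)
```

Part (ii) illustrates a general point: two mutually exclusive events that both have non-zero probability cannot be independent.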

1.3 DISCRETE RANDOM VARIABLES


The probability distribution of a discrete random variable X is a list of each possible
value of X together with the probability that X takes that value in one trial of the
experiment.
The probability distribution of a discrete random variable X must satisfy the
following two conditions:
i. Each probability $P(X = x)$ must be between 0 and 1: $0 \le P(X = x) \le 1$.
ii. The sum of all the possible probabilities is 1: $\sum P(X = x) = 1$.

1.3.1 Cumulative Distribution Function


The cumulative distribution function of a discrete random variable X, denoted $F(x)$, is

$$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i)$$

1.3.2 The Mean and Variance of a Discrete Random Variable


The mean (also called the "expectation value" or "expected value") of a discrete
random variable X is

$$\mu = E(X) = \sum x\,P(X = x)$$

The mean of a random variable may be interpreted as the average of the values
assumed by the random variable in repeated trials of the experiment.

The variance $\sigma^2$ of a discrete random variable X is

$$\sigma^2 = V(X) = \sum (x - \mu)^2\,P(X = x)$$

The standard deviation of X is $\sigma = \sqrt{\sigma^2}$.


The variance and standard deviation of a discrete random variable X may be
interpreted as measures of the variability of the values assumed by the random
variable in repeated trials of the experiment.

Rules for Means and Variances


For any constants a and b:
i. $E(\pm a) = \pm a$
ii. $V(\pm a) = 0$
iii. $E(a \pm bX) = a \pm bE(X)$
iv. $V(a \pm bX) = b^2 V(X)$
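
Rules iii and iv can be verified numerically for any particular distribution. The sketch below uses a hypothetical probability distribution and hypothetical constants a = 5 and b = −3 (none of these values come from the text) and compares both sides of each rule using exact fractions.

```python
from fractions import Fraction


def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def linear_transform(pmf, a, b):
    """Distribution of Y = a + bX; assumes b != 0 so values stay distinct."""
    return {a + b * x: p for x, p in pmf.items()}


# Hypothetical distribution of X, chosen only for illustration.
pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 5, -3

pmf_y = linear_transform(pmf_x, a, b)

print("E(a + bX) =", mean(pmf_y), " a + b E(X) =", a + b * mean(pmf_x))
print("V(a + bX) =", variance(pmf_y), " b^2 V(X)  =", b ** 2 * variance(pmf_x))
```

A negative b is used on purpose: the mean changes sign with b, but the variance is multiplied by b², so it stays non-negative.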

Example 1
A fair coin is tossed three times. Let X be the number of heads that are observed.
(a) Construct the probability distribution of X and show that X is a discrete random
variable.
(b) Find the probability that at least one head is observed.
(c) Find 𝐹(𝑥 ).
(d) Find mean and variance of X.

Solution
(a) Let X = the number of heads you get when you toss a fair coin three times.
$S = \{TTT, THH, HTH, HHT, HTT, THT, TTH, HHH\}$
So the possible values are $x = 0, 1, 2, 3$; X is described in words and x is a number.
Then the probability distribution of X is as follows:

| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $P(X = x)$ | 1/8 | 3/8 | 3/8 | 1/8 |

Since each probability lies between 0 and 1 and $\sum P(X = x) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1$, X is a discrete random variable.

(b) $P(X \ge 1) = \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = \frac{7}{8}$

(c) $F(x) = P(X \le x)$
$F(0) = P(X \le 0) = 1/8$; $F(1) = P(X \le 1) = 4/8$; $F(2) = P(X \le 2) = 7/8$
and $F(3) = P(X \le 3) = 1$

| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $P(X = x)$ | 1/8 | 3/8 | 3/8 | 1/8 |
| $F(x)$ | 1/8 | 4/8 | 7/8 | 1 |

Figure 3: Probability Distribution and Cumulative Distribution for Tossing a Fair Coin Thrice

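(d) $\mu = E(X) = \sum x\,P(X = x) = 0\left(\frac{1}{8}\right) + 1\left(\frac{3}{8}\right) + 2\left(\frac{3}{8}\right) + 3\left(\frac{1}{8}\right) = \frac{12}{8} = \frac{3}{2}$

$\sigma^2 = \sum (x - \mu)^2\,P(X = x)$

$= \left(0 - \frac{3}{2}\right)^2\left(\frac{1}{8}\right) + \left(1 - \frac{3}{2}\right)^2\left(\frac{3}{8}\right) + \left(2 - \frac{3}{2}\right)^2\left(\frac{3}{8}\right) + \left(3 - \frac{3}{2}\right)^2\left(\frac{1}{8}\right) = \frac{24}{32} = 0.75$

$\sigma = \sqrt{\sigma^2} = \sqrt{0.75} = 0.866$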
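
The whole example can be reproduced by enumerating the eight equally likely outcomes of three tosses and counting heads. The sketch below builds the probability distribution, the cumulative distribution, and then the mean and variance from their definitions; all variable names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# All 2^3 equally likely outcomes, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Number of outcomes that give each value of X = number of heads.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution P(X = x), ordered by x.
pmf = {x: Fraction(head_counts[x], len(outcomes)) for x in sorted(head_counts)}

# Cumulative distribution F(x) = P(X <= x) as running totals of the pmf.
cdf = dict(zip(pmf, accumulate(pmf.values())))

p_at_least_one_head = 1 - pmf[0]

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = math.sqrt(variance)

for x in pmf:
    print(x, pmf[x], cdf[x])
print("mean =", mean, "variance =", variance, "sd =", round(std_dev, 3))
```

Under this enumeration the mean is 3/2 and the variance is 3/4, so the standard deviation is about 0.866.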

Example 2
A pair of fair dice is rolled. Let X represent the sum of the dots on the top
faces of the two dice.
(a) Construct the probability distribution of X for a pair of fair dice and show that X
is a discrete random variable.
(b) Find 𝑃(𝑋 ≥ 9).
(c) Find the probability that X takes an even value.
(d) Find 𝐹(𝑥 ).


Solution
Let X represent the sum of two dice.
(a) The sample space is
$S = \{(1 + 1), (1 + 2), \dots, (5 + 6), (6 + 6)\}$, $n(S) = 36$
Then the probability distribution of X is as follows:

| $x$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $P(X = x)$ | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

Since $\sum P(X = x) = \frac{1}{36} + \frac{2}{36} + \dots + \frac{2}{36} + \frac{1}{36} = 1$, X is a discrete random
variable.

Figure 4: Probability Distribution for Tossing Two Fair Dice

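(b) $P(X \ge 9) = P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12) = \frac{10}{36}$

(c) The probability that X takes an even value is

$P(X = 2) + P(X = 4) + \dots + P(X = 12) = \frac{18}{36}$

(d)

| $x$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $P(X = x)$ | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
| $F(x)$ | 1/36 | 3/36 | 6/36 | 10/36 | 15/36 | 21/36 | 26/36 | 30/36 | 33/36 | 35/36 | 1 |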
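
The tables in this example can be generated by listing all 36 equally likely rolls and counting how often each sum occurs. The following is a sketch with illustrative variable names; note that `Fraction` prints values in lowest terms, so 10/36 appears as 5/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# All 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

# How many pairs produce each sum.
sum_counts = Counter(first + second for first, second in rolls)

# Probability distribution P(X = x), ordered by x.
pmf = {x: Fraction(sum_counts[x], len(rolls)) for x in sorted(sum_counts)}

# Cumulative distribution F(x) = P(X <= x).
cdf = dict(zip(pmf, accumulate(pmf.values())))

p_at_least_9 = sum(p for x, p in pmf.items() if x >= 9)
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)

for x in pmf:
    print(x, pmf[x], cdf[x])
print("P(X >= 9) =", p_at_least_9, " P(X even) =", p_even)
```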

1.4 CONTINUOUS RANDOM VARIABLES


The probability distribution of a continuous random variable X is an assignment of
probabilities to intervals of real numbers using a function $f(x)$, called a density
function. For a continuous random variable X, a probability density function is a
function such that
i. $f(x) \ge 0$
ii. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
iii. $P(a \le X \le b) = P(a < X < b) = \int_{a}^{b} f(x)\,dx$ = area under $f(x)$ from a to b,
as illustrated in Figure 1.

Figure 1: Probability Given as Area of a Region under a Curve

A continuous variable is a variable whose value is obtained by measuring.


Examples: height of students in class, weight of students in class, time it takes to
get to school, distance traveled between classes.

1.4.1 The cumulative distribution function


The cumulative distribution function of a continuous random variable X is

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du \quad \text{for } -\infty < x < \infty$$

1.4.2 Mean and Variance of a continuous random variable


Suppose X is a continuous random variable with probability density function $f(x)$. The
mean and variance of X are

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

$$\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$

The standard deviation of X is $\sigma = \sqrt{\sigma^2}$.

Example 1
Let the continuous random variable X denote the current measured in a thin copper wire in
milliamperes. Assume that the range of X is [0, 20 mA], and assume that the
probability density function of X is

$$f(x) = \begin{cases} 0.05, & 0 \le x \le 20 \\ 0, & \text{otherwise} \end{cases}$$

(a) Show that X is a continuous random variable.
(b) Find (i) $P(5 < X \le 10)$, (ii) $P(X > 10)$ and (iii) $P(X < 5)$.
(c) Find $F(x)$.
(d) Find the mean and variance of X.

Solution

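(a) X is a continuous random variable if $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Here $f(x) \ge 0$ for all x, and

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{20} 0.05\,dx + \int_{20}^{\infty} 0\,dx = 0 + \Big[0.05x\Big]_{0}^{20} + 0 = 1$$

∴ X is a continuous random variable.

(b) i. $P(5 < X \le 10) = \int_{5}^{10} 0.05\,dx = \Big[0.05x\Big]_{5}^{10} = 0.05(5) = 0.25$

ii. $P(X > 10) = \int_{10}^{20} 0.05\,dx = \Big[0.05x\Big]_{10}^{20} = 0.05(10) = 0.5$

iii. $P(X < 5) = \int_{-\infty}^{5} f(x)\,dx = \int_{0}^{5} 0.05\,dx = \Big[0.05x\Big]_{0}^{5} = 0.05(5) = 0.25$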

(c) $F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$
For $x < 0$:
$F(x) = P(X \le x) = \int_{-\infty}^{x} 0\,du = 0$
For $0 \le x \le 20$:
$F(x) = P(X \le x) = \int_{0}^{x} 0.05\,du = 0.05x$
For $x > 20$:
$F(x) = P(X \le x) = 1$
Then

$$F(x) = \begin{cases} 0, & x < 0 \\ 0.05x, & 0 \le x \le 20 \\ 1, & x > 20 \end{cases}$$
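(d) $$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{20} x(0.05)\,dx = \left[\frac{0.05x^2}{2}\right]_{0}^{20} = 10 \text{ mA}$$

$$\sigma^2 = V(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \int_{0}^{20} x^2(0.05)\,dx - 10^2 = \left[\frac{0.05x^3}{3}\right]_{0}^{20} - 10^2 = \frac{400}{3} - 10^2 = 33.33 \text{ mA}^2$$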
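
The integrals in this example can also be approximated numerically, which is a useful check when a density has no simple antiderivative. The sketch below uses a basic midpoint rule written with the standard library only; the function `integrate` and the step count are illustrative choices, not part of the original solution.

```python
def f(x):
    """Density of the current X in mA: 0.05 on [0, 20], 0 otherwise."""
    return 0.05 if 0 <= x <= 20 else 0.0


def integrate(g, a, b, steps=20_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# f is zero outside [0, 20], so integrating over that interval is enough.
total_area = integrate(f, 0, 20)

p_5_to_10 = integrate(f, 5, 10)   # P(5 < X <= 10)
p_above_10 = integrate(f, 10, 20)  # P(X > 10)
p_below_5 = integrate(f, 0, 5)     # P(X < 5)

mean = integrate(lambda x: x * f(x), 0, 20)
variance = integrate(lambda x: x ** 2 * f(x), 0, 20) - mean ** 2

print(f"total area = {total_area:.4f}")
print(f"P(5 < X <= 10) = {p_5_to_10:.4f}")
print(f"P(X > 10) = {p_above_10:.4f}")
print(f"P(X < 5) = {p_below_5:.4f}")
print(f"mean = {mean:.4f} mA, variance = {variance:.4f} mA^2")
```

For a continuous random variable the probability of any single point is zero, so strict and non-strict inequalities give the same probabilities and the same integration limits can be used for both.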

Exercise
1. Determine the correct data type (quantitative or qualitative). Indicate whether
quantitative data are continuous or discrete.
i. The number of pairs of shoes you own
ii. The type of car you drive
iii. The place where you go on vacation
iv. The distance it is from your home to the nearest grocery store
v. The number of classes you take per school year.
vi. The tuition for your classes
vii. The type of calculator you use
viii. Movie ratings
ix. Political party preferences
x. Weights of sumo wrestlers
xi. Amount of money (in dollars) won playing poker
xii. Number of correct answers on a quiz
xiii. People’s attitudes toward the government
xiv. IQ scores

2. A fair coin is tossed twice. Let X be the number of heads that are observed.
a) Construct the probability distribution of X.
b) Find the probability that at least one head is observed.
c) Find 𝐹 (𝑥 ).
d) Find mean and variance of X.

3. A random variable X has the uniform distribution on the interval [0,1]: the density
function is 𝑓 (𝑥 ) = 1 if x is between 0 and 1 and 𝑓 (𝑥 ) = 0 for all other values of x.
a) Show that X is a continuous random variable.


b) Find (i) 𝑃(0.5 < 𝑋 ≤ 1), (ii) 𝑃(𝑋 > 0.10) and (iii) 𝑃(𝑋 < 0.5) .
c) Find 𝐹 (𝑥 ).
d) Find mean and variance of X.
