Unit 6
Random Variables
• Discrete Random Variable
• Continuous Random Variable
Discrete Random Variable
• A discrete random variable is a variable that can assume only a countable
number of values
Many possible outcomes:
• number of complaints per day
• number of TVs in a household
• number of rings before the phone is answered
Only two possible outcomes:
• gender: male or female
• defective: yes or no
• spreads peanut butter first vs. spreads jelly first
Continuous Random Variable
• A continuous random variable can potentially take on any value, depending
only on the ability to measure accurately.
Discrete Random Variables
Probability Distribution
x Value    Probability
0          1/4 = .25
1          2/4 = .50
2          1/4 = .25
[Figure: bar chart of probability against x, with bars of height .25, .50 and .25 at x = 0, 1 and 2]
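The table above can be checked with a short Python sketch: a valid discrete probability distribution has probabilities between 0 and 1 that sum to 1. The names `distribution` and `is_valid_distribution` are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction

# Probability distribution from the table: x value -> probability.
distribution = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}


def is_valid_distribution(dist):
    """Return True if every probability is in [0, 1] and they sum to exactly 1."""
    probabilities = list(dist.values())
    return all(0 <= p <= 1 for p in probabilities) and sum(probabilities) == 1


for x, p in distribution.items():
    print(f"P(x = {x}) = {float(p):.2f}")
print("valid distribution:", is_valid_distribution(distribution))
```

Exact fractions are used so that the sum-to-one check does not depend on floating-point rounding.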
Probability Distributions
• Discrete Probability Distributions: Binomial, Poisson
• Continuous Probability Distributions: Normal
Continuous Probability Distributions
• Continuous random variables can potentially take on any value, depending only
on the ability to measure accurately.
The Normal Distribution
• Bell Shaped
• Symmetrical
• Mean, Median and Mode are Equal
• Location is determined by the mean, μ
• Spread is determined by the standard deviation, σ
• The random variable has an infinite theoretical range: −∞ to +∞
[Figure: bell curve f(x) against x, centred at μ (Mean = Median = Mode), with spread σ]
Many Normal Distributions
[Figure: normal curves plotted against x, with the mean μ marked]
Finding Normal Probabilities
[Figure: normal curve f(x) against x, with the area between a and b representing P(a ≤ x ≤ b)]
Probability as
Area Under the Curve
The total area under the curve is 1.0, and the curve is symmetric, so half is
above the mean, half is below
P(−∞ < x < μ) = 0.5 and P(μ < x < ∞) = 0.5
• μ ± 1σ covers about 68.3% of x’s
• μ ± 2σ covers about 95.4% of x’s
• μ ± 3σ covers about 99.7% of x’s
[Figure: normal curves against x with the regions μ ± 1σ, μ ± 2σ and μ ± 3σ marked]
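The coverage percentages above can be reproduced with a minimal Python sketch. It relies on the standard identity P(|z| < k) = erf(k / √2) for a normal variable; the function name `coverage_within` is an illustrative assumption.

```python
import math


def coverage_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mu ± {k} sigma covers {coverage_within(k) * 100:.2f}% of values")
```

The computed values are 68.27%, 95.45% and 99.73%; figures read from a printed z table can differ in the last digit because the table entries are rounded.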
Importance of the Rule
• A value that far or farther from the mean is highly unlikely, given that
particular mean and standard deviation
The Standard Normal Distribution
• Also known as the “z” distribution
• Mean is defined to be 0
• Standard Deviation is 1
[Figure: standard normal curve f(z) against z, centred at 0]
z = (x − μ) / σ
Example
z = (x − μ) / σ = (250 − 100) / 50 = 3.0
• This says that x = 250 is three standard deviations
(3 increments of 50 units) above the mean of 100.
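The conversion from x units to z units in the example above can be written as a small helper. This is a sketch; the function name `z_score` is an assumption made for illustration.

```python
def z_score(x, mean, std_dev):
    """Convert a value x to standardized (z) units: z = (x - mean) / std_dev."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Example from the text: x = 250, mean = 100, standard deviation = 50.
print(z_score(250, 100, 50))  # 3.0
```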
Comparing x and z units
μ = 100, σ = 50
[Figure: normal curve shown on two scales: x = 100 and x = 250 correspond to z = 0 and z = 3.0]
Note that the distribution is the same, only the scale has changed. We can
express the problem in original units (x) or in standardized units (z)
General Procedure for Finding Probabilities
Calculate z-values (μ = 8, σ = 5):
For x = 8:   z = (x − μ) / σ = (8 − 8) / 5 = 0
For x = 8.6: z = (x − μ) / σ = (8.6 − 8) / 5 = 0.12
P(8 < x < 8.6) = P(0 < z < 0.12)
Z Table example
(continued)
• Suppose x is normal with mean 8.0 and standard deviation 5.0.
Find P(8 < x < 8.6)
[Figure: the x scale (µ = 8, σ = 5) with 8 and 8.6 marked, next to the z scale (µ = 0, σ = 1) with 0 and 0.12 marked]
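As a cross-check on the z table lookup, the probability in this example can be computed directly. The sketch below builds the normal cumulative distribution from `math.erf`; the helper names `normal_cdf` and `probability_between` are assumptions made for illustration.

```python
import math


def normal_cdf(x, mean=0.0, std_dev=1.0):
    """Cumulative probability P(X <= x) for a normal distribution."""
    z = (x - mean) / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_between(a, b, mean, std_dev):
    """P(a < X < b): area under the normal curve between a and b."""
    return normal_cdf(b, mean, std_dev) - normal_cdf(a, mean, std_dev)


# Example from the text: x is normal with mean 8.0 and standard deviation 5.0.
p = probability_between(8.0, 8.6, mean=8.0, std_dev=5.0)
print(f"P(8 < x < 8.6) = {p:.4f}")
```

The result, about 0.0478, is the area between z = 0 and z = 0.12 that a standard z table reports.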
• Type I Error
• Reject a true null hypothesis
• Considered a serious type of error
• Type II Error
• Fail to reject a false null hypothesis
8. If z calculated is larger than z critical, reject H0. If z calculated is smaller
than z critical, you fail to reject the null hypothesis (in non-technical language,
you accept the null hypothesis).
Level of Significance
Z test        0.10     0.05     0.02     0.01
Two Tailed    1.645    1.96     2.33     2.575
One Tailed    1.28     1.645    2.054    2.33
For a two-tailed test the critical values are ± the tabulated value; for a one-tailed
test the value is negative for a left-tailed test and positive for a right-tailed test.
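The critical values in the table can be recomputed with the Python standard library. This sketch uses `statistics.NormalDist().inv_cdf` (available from Python 3.8); the function name `z_critical` and its `two_tailed` flag are assumptions made for illustration.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed):
    """Positive critical z value for a given level of significance.

    A two-tailed test splits alpha equally between the two tails;
    a one-tailed test puts all of alpha in one tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


print("alpha  two-tailed  one-tailed")
for alpha in (0.10, 0.05, 0.02, 0.01):
    two = z_critical(alpha, two_tailed=True)
    one = z_critical(alpha, two_tailed=False)
    print(f"{alpha:<6} {two:<11.3f} {one:.3f}")
```

Tabulated values are rounded, so the computed figures may differ from a printed table in the third decimal place.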
Hypotheses Testing
• The mean lifetime of a sample of 400 light bulbs produced by a company is
found to be 1600 hours, with a standard deviation of 150 hours. Test the
hypothesis that the mean lifetime of the bulbs produced in general is higher
than 1570 hours at the α = 0.01 level of significance.
H0: µ ≤ 1570
Ha: µ > 1570 (right-tailed test)
x̄ = 1600 hrs, µ = 1570, standard deviation = 150, N = 400
Zcalculated = (1600 − 1570) / (150 / √400) = 4
Zcritical at α = 0.01 (one-tailed) = 2.33
Since 4 > 2.33, Zcalculated falls in the rejection region, so H0 is rejected.
• An ambulance service claims that it takes, on average, 8.9 minutes
to reach its destination on emergency calls. To check this claim, the
agency that licenses ambulance services timed 50 emergency calls,
getting a mean of 9.3 minutes with a standard deviation of 1.8 minutes.
Does this constitute evidence, at the 1 percent significance level, that
the figure claimed is too low?
• H0: µ = 8.9
• Ha: µ ≠ 8.9
• Two-tailed test
• Zcalculated = (9.3 − 8.9) / (1.8 / √50) = 1.57
• Zcritical at α = 0.01 = ±2.58
• Since −2.58 < 1.57 < 2.58, we fail to reject the null hypothesis
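The two worked examples above (light bulbs, right-tailed; ambulance, two-tailed) can be checked with a short sketch of the one-sample z test. The function names `z_statistic` and `reject_null` and the `tail` argument are illustrative assumptions; the critical values 2.33 and 2.58 are taken from the examples rather than computed.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, std_dev, n):
    """z = (sample mean - hypothesized mean) / (standard deviation / sqrt(n))."""
    standard_error = std_dev / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


def reject_null(z, z_critical, tail):
    """Decide whether z falls in the rejection region.

    tail is "two", "right" or "left"; z_critical is the positive table value.
    """
    if tail == "two":
        return abs(z) > z_critical
    if tail == "right":
        return z > z_critical
    if tail == "left":
        return z < -z_critical
    raise ValueError("tail must be 'two', 'right' or 'left'")


# Light bulbs: right-tailed test at alpha = 0.01 (critical value 2.33).
bulbs_z = z_statistic(1600, 1570, 150, 400)
print(f"bulbs: z = {bulbs_z:.2f}, reject H0: {reject_null(bulbs_z, 2.33, 'right')}")

# Ambulance: two-tailed test at alpha = 0.01 (critical value 2.58).
ambulance_z = z_statistic(9.3, 8.9, 1.8, 50)
print(f"ambulance: z = {ambulance_z:.2f}, reject H0: {reject_null(ambulance_z, 2.58, 'two')}")
```

Keeping the decision rule separate from the statistic makes it easy to reuse the same z value with a different tail or significance level.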
• A factory has a machine that dispenses 80 ml of fluid into each bottle. An
employee doubts the measurement. Using a sample of 40 bottles, he finds that
the average volume dispensed by the machine is 78 ml, with a standard
deviation of 2.5 ml.
• Formulate the hypotheses and test the claim at the 95% confidence level
Chi-Square Test
The chi-square statistic is very versatile and is used
extensively in statistical work.
The variable given below closely follows the chi-square distribution:
χ² = Σ (O − E)² / E
where O is an observed frequency and E is the corresponding expected frequency.
Number                1    2    3    4    5    6
Observed Frequency    22   24   38   30   46   44
Expected Frequency    34   34   34   34   34   34
O     E     (O − E)   (O − E)²   (O − E)²/E
22    34    −12       144        4.24
24    34    −10       100        2.94
38    34    4         16         0.47
30    34    −4        16         0.47
46    34    12        144        4.24
44    34    10        100        2.94
Total                            15.3
Level of significance: α = 0.01
Degrees of freedom = n − 1 = 6 − 1 = 5
Critical value = 15.086
Chi-square calculated value = 15.3
The calculated value (15.3) exceeds the critical or table value (15.086), so it
falls in the rejection region.
Therefore, the null hypothesis is rejected.
Thus, the die is unfair (or biased).
Example 2
A coin is flipped 200 times, and the results obtained are as follows:
             Head   Tail
Frequency    92     108
O     E      (O − E)   (O − E)²   (O − E)²/E
92    100    −8        64         0.64
108   100    8         64         0.64
Total                             1.28
Level of significance: α = 0.01
Degrees of freedom = n − 1 = 2 − 1 = 1
Critical value = 6.635
Chi-square calculated value = 1.28
The calculated value (1.28) is less than the critical value (6.635).
Therefore, we fail to reject the null hypothesis.
Thus, it is concluded that the coin is fair (or unbiased).
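Both chi-square examples above can be recomputed with a short sketch. It assumes, as the examples do, that every outcome is equally likely under the null hypothesis, so each expected frequency is the total divided by the number of categories. The function names are illustrative assumptions, and the critical values 15.086 and 6.635 are taken from the examples rather than computed.

```python
def equal_expected(observed):
    """Expected frequencies when every category is equally likely."""
    total = sum(observed)
    return [total / len(observed)] * len(observed)


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: six faces, critical value 15.086 (alpha = 0.01, 5 degrees of freedom).
die_observed = [22, 24, 38, 30, 46, 44]
die_chi2 = chi_square_statistic(die_observed, equal_expected(die_observed))
print(f"die: chi-square = {die_chi2:.2f}, reject H0: {die_chi2 > 15.086}")

# Coin example: two outcomes, critical value 6.635 (alpha = 0.01, 1 degree of freedom).
coin_observed = [92, 108]
coin_chi2 = chi_square_statistic(coin_observed, equal_expected(coin_observed))
print(f"coin: chi-square = {coin_chi2:.2f}, reject H0: {coin_chi2 > 6.635}")
```

The die statistic is 520/34, about 15.29, which rounds to the 15.3 used in the example.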
Report and Presentation
Importance of the Report and Presentation
For the following reasons, the report and its presentation
are important parts of the marketing research project:
Data Analysis
Report Preparation
Oral Presentation
Research Follow-Up
Report Format
I. Title page
II. Letter of transmittal
III. Letter of authorization
IV. Table of contents (see next page)
V. List of tables
VI. List of graphs
VII. List of appendices
VIII. List of exhibits
IX. Executive summary
a. Major findings
b. Conclusions
c. Recommendations
1. Problem definition
a. Background to the problem
b. Statement of the problem
2. Approach to the problem
3. Research design
a. Type of research design
b. Information needs
c. Data collection from secondary sources
d. Data collection from primary sources
e. Scaling techniques
f. Questionnaire development and pretesting
g. Sampling techniques
h. Fieldwork
4. Data analysis
a. Methodology
b. Plan of data analysis
5. Results
6. Limitations and caveats
7. Conclusions and recommendations
8. Exhibits
a. Questionnaires and forms
b. Statistical output
c. Lists
9. Reference/Bibliography
Report Writing
• Readers. A report should be written for a specific reader or readers: the
marketing managers who will use the results.
• Easy to follow. The report should be easy to follow. It should be structured
logically and written clearly.
• Presentable and professional appearance. The look of a report is
important.
• Objective. Objectivity is a virtue that should guide report writing. The rule
is, "Tell it like it is."
• Reinforce text with tables and graphs. It is important to reinforce key
information in the text with tables, graphs, pictures, maps, and other
visual devices.
• Terse. A report should be terse and concise. Yet, brevity should not be
achieved at the expense of completeness.
Guidelines for Tables
• Title and number. Every table should have a number (1a) and title (1b).
• Arrangement of data items. The arrangement of data items in a table should emphasize
the most significant aspect of the data.
• Basis of measurement. The basis or unit of measurement should be clearly stated (3a).
• Leaders, rulings, spaces. Leaders, dots or hyphens used to lead the eye horizontally,
impart uniformity and improve readability (4a). Instead of ruling the table horizontally or
vertically, white spaces (4b) are used to set off data items. Skipping lines after different
sections of the data can also assist the eye. Horizontal rules (4c) are often used after the
headings.
• Explanations and comments: Headings, stubs, and footnotes. Designations placed over
the vertical columns are called headings (5a). Designations placed in the left-hand column
are called stubs (5b). Information that cannot be incorporated in the table should be
explained by footnotes (5c).
• Sources of the data. If the data contained in the table are secondary, the source of data
should be cited (6a).
U.S. Auto Sales 2003 - 2007
Guidelines for Graphs: Round or Pie Charts
[Figure: pie chart with legend GM, Ford, Chrysler, Toyota, Honda, Nissan and Other; slices labelled 24%, 18%, 18%, 13%, 11%, 9% and 7%]
Guidelines for Graphs: Line Charts
[Figure: line chart of Unit Sales (0 to 5,000,000) against Year (2003 to 2007), with series GM, Ford, Chrysler, Toyota, Honda, Nissan and Other]
Guidelines for Graphs: Line Charts
[Figure: chart of Unit Sales (0 to 20,000,000) against Year (2003 to 2007), with series Other, Nissan, Honda, Toyota, Chrysler, Ford and GM]
Guidelines for Graphs: Pictographs
• The histogram is a vertical bar chart in which the height of the bars
represents the relative or cumulative frequency of occurrence of a specific
variable.
Histogram of 2007 U.S. Auto Sales
[Figure: vertical bar chart with a sales axis from 0 to 4,500,000 and bars for GM, Ford, Chrysler, Toyota, Honda, Nissan and Other]
Oral Presentation