Subject: Ce 221 Engineering Data Analysis: Simple Random Sample
(Week 2-3)
Introduction
Engineering Data Analysis (EDA) is an indispensable tool that engineering teams in
industry use to analyze processes, integration, and yield (conversion rate) effectively,
in order to enhance the competitiveness of the company.
Learning Outcome
1. Know the methods of data collection
2. Plan and conduct experiments
3. Plan and conduct surveys, and interpret their results
Learning Content
Stratified sampling - This involves dividing the population into non-overlapping groups
(strata) and taking a sample from each group. For instance, the manufacturer of a light
bulb wishes to investigate the lifetime of its bulbs. If 25-watt, 60-watt, and 100-watt
bulbs are produced, a separate sample can be selected from each of the three bulb sizes.
This yields information on all three bulb sizes.
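As a minimal sketch of stratified sampling, the Python snippet below draws a separate simple random sample from each wattage group. The bulb serial numbers, the stratum sizes, the sample size of 5 per stratum, and the function name `stratified_sample` are hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(strata, k, seed=None):
    """Draw a simple random sample of size k from each stratum.

    strata: dict mapping stratum name -> list of population units.
    Returns a dict mapping stratum name -> list of sampled units.
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, k) for name, units in strata.items()}

# Hypothetical population: bulb serial numbers grouped by wattage.
bulbs = {
    "25-watt": [f"A{i:03d}" for i in range(200)],
    "60-watt": [f"B{i:03d}" for i in range(500)],
    "100-watt": [f"C{i:03d}" for i in range(300)],
}

sample = stratified_sample(bulbs, k=5, seed=1)
for wattage, units in sample.items():
    print(wattage, units)
```

Because every stratum is sampled separately, each bulb size is guaranteed to appear in the result, which a single simple random sample of the whole production would not guarantee.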
FREQUENCY DISTRIBUTIONS
Classification of Data:
UNGROUPED DATA- When the data is small (n ≤ 30) or when there are few distinct
values, the data may be organized without grouping.
EXAMPLE 1.1
GROUPED DATA- Statistical data gathered in large masses (n ≥ 30) can be assessed by
grouping the data into different classes.
The following are suggested steps in forming a frequency distribution from raw data:
1. Find the range (R). The range is the difference between the largest and smallest
value.
2. Decide on a suitable number of classes. This will depend upon what information the
table is supposed to present. Sturges suggested the number of classes (m) as
m = 1 + 3.3 log n, where n = number of cases
3. Determine the class size (c) by dividing the range by the number of classes, c = R/m.
The class size (c) may be rounded off to the same place value as the data.
4. Find the number of observations in each class. This is the class frequency (f).
EXAMPLE:
The following are data on the observed compressive strength in psi of 50 samples of
concrete interlocking blocks.
136  92 115 118 121 137 132 120 104 125
127 103 110 126 118  82 104 137 120  95
R = H - L
R = 148 - 82 = 66
m = 1 + 3.3 log 50 = 6.6, or 7 classes
c = 66/7 = 9.4
Use c = 10, since the data values are recorded to the nearest unit.
The lowest value is 82. It is convenient to start with 80 as the lower limit of the
first class. 80 + 10 = 90 is the lower limit of the second class.
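The arithmetic of this example can be checked with a short Python sketch. The function name `class_parameters` is hypothetical. Rounding the number of classes to the nearest whole number and rounding the class size up to the unit of measurement are assumptions that match the choices made above (6.6 becomes 7, and 9.4 becomes 10).

```python
import math

def class_parameters(highest, lowest, n, unit=1):
    """Return (range, number of classes, class size) for a frequency distribution."""
    r = highest - lowest
    m = round(1 + 3.3 * math.log10(n))    # rule m = 1 + 3.3 log n, nearest whole number
    c = math.ceil(r / m / unit) * unit    # class size rounded up to the data's unit
    return r, m, c

r, m, c = class_parameters(highest=148, lowest=82, n=50)
print(r, m, c)   # 66 7 10
```

Rounding the class size up rather than down ensures that 7 classes of size 10 starting at 80 still reach the highest value, 148.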
The number of observed values tallied in each is the class frequency. The relative
frequency of each class is also obtained and presented in Table 1.2.
Compressive Strength (psi)   Tally   No. of blocks (frequency, f)   Relative Frequency
80-89                        II      2                              0.04
∑                                    50                             1.00
24.5 23.6 24.1 25.0 22.9 24.7 23.8 25.2 23.7 24.4
24.7 23.9 25.1 24.6 23.3 24.3 24.6 23.9 24.1 24.4
24.5 25.7 23.6 24.0 23.9 24.2 24.7 24.9 25.0 24.8
24.5 23.4 24.9 24.8 24.7 24.1 22.8 23.1 25.3 24.6
The lowest value is 22.8; therefore, 22.5 may be taken as the lower limit of the 1st class.
22.5 + 0.5 = 23.0 is the lower limit of the 2nd class.
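The tallying for this data set can be sketched in Python as follows. The class width of 0.5 and the starting limit of 22.5 come from the text above. Converting the readings to whole tenths before classifying them is an implementation choice made here to avoid floating-point rounding at the class limits.

```python
from collections import Counter

# The 40 gasoline consumption readings (miles/gallon) listed above.
mpg = [
    24.5, 23.6, 24.1, 25.0, 22.9, 24.7, 23.8, 25.2, 23.7, 24.4,
    24.7, 23.9, 25.1, 24.6, 23.3, 24.3, 24.6, 23.9, 24.1, 24.4,
    24.5, 25.7, 23.6, 24.0, 23.9, 24.2, 24.7, 24.9, 25.0, 24.8,
    24.5, 23.4, 24.9, 24.8, 24.7, 24.1, 22.8, 23.1, 25.3, 24.6,
]

# Work in tenths of a mile/gallon (integers) so class membership is exact.
START, WIDTH = 225, 5                      # 22.5 and 0.5 expressed in tenths
tenths = [round(x * 10) for x in mpg]
counts = Counter((t - START) // WIDTH for t in tenths)

n = len(mpg)
table = []
for k in range(max(counts) + 1):
    lower = (START + k * WIDTH) / 10             # lower class limit
    upper = (START + (k + 1) * WIDTH - 1) / 10   # upper class limit
    f = counts.get(k, 0)                         # class frequency
    table.append((lower, upper, f, f / n))       # f / n is the relative frequency

for lower, upper, f, rel in table:
    print(f"{lower:.1f}-{upper:.1f}  {f:2d}  {rel:.3f}")
```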
Gasoline Consumption (miles/gallon)   Tally   No. of cars (frequency, f)   Relative Frequency
∑                                             40                           1.000
TABLE 1.4 CLASS LIMITS, CLASS BOUNDARIES AND CLASS MARKS FOR
FREQUENCY DISTRIBUTION PRESENTED IN TABLE 1.2
Table 1.5 Class limits, class boundaries and class marks of frequency distribution presented
in Table 1.3
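As a hedged sketch of how class limits, class boundaries, and class marks are usually related, the snippet below assumes the common convention that a class boundary lies half a unit of measurement beyond the class limit and that the class mark is the midpoint of the class. The function name `class_summary` is hypothetical.

```python
from fractions import Fraction

def class_summary(lower_limit, upper_limit, unit):
    """Return (lower boundary, upper boundary, class mark) for one class.

    Arguments may be integers or decimal strings so the arithmetic is exact.
    """
    lo, hi, u = Fraction(lower_limit), Fraction(upper_limit), Fraction(unit)
    lower_boundary = lo - u / 2      # half a unit below the lower limit
    upper_boundary = hi + u / 2      # half a unit above the upper limit
    class_mark = (lo + hi) / 2       # midpoint of the class
    return float(lower_boundary), float(upper_boundary), float(class_mark)

print(class_summary(80, 89, 1))               # first class of Table 1.2
print(class_summary("22.5", "22.9", "0.1"))   # first class of Table 1.3
```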
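Mean of ungrouped data:
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
Mean of grouped data:
\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{n}, where k is the number of classes and n = \sum f_i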
Example 1.5
The following data represent the time in seconds for 9 glued samples to dry and
attain their bond strength: 3.6, 2.5, 3.1, 4.3, 2.4, 2.9, 2., 4.1 and 3.4. Calculate the mean.
SOLUTION:
n
x_i     f_i    f_i x_i
1.46    6      8.76
1.48    4      5.92
1.49    5      7.45
1.50    6      9.00
1.52    9      13.68
∑       30     44.81
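The totals in the table above can be reproduced with a few lines of Python. The sketch reads the three columns as the value x_i, its frequency f_i, and the product f_i x_i, an interpretation that is consistent with the column totals of 30 and 44.81.

```python
# Values x_i and their frequencies f_i, taken from the table above.
values = [1.46, 1.48, 1.49, 1.50, 1.52]
freqs = [6, 4, 5, 6, 9]

n = sum(freqs)                                       # total frequency
total = sum(f * x for f, x in zip(freqs, values))    # sum of f_i * x_i
mean = total / n                                     # frequency-weighted mean

print(n, round(total, 2), round(mean, 4))
```

The mean is the ratio of the two column totals, 44.81/30, or about 1.494.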
Classes    f_i    X_i    f_i X_i
∑          40            972
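With the column totals above, the grouped-data mean follows directly. This worked step assumes that the totals refer to the 40 gasoline consumption readings, which is consistent with the total frequency of 40 in Table 1.3:

```latex
\bar{X} = \frac{\sum f_i X_i}{n} = \frac{972}{40} = 24.3 \ \text{miles/gallon}
```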
Median
The median of a set of numbers in an array is either the middle value or the
arithmetic mean of two middle values.
The sample median \tilde{x} is used to estimate the population median \tilde{\mu}.
Example 1.8
For the set of numbers 1, 3, 3, 5, 6, 8, 9, 9, 10
Solution:
\tilde{x} = x_5 = 6
Example 1.9
For the set of numbers 4, 4, 7, 9, 11, 12, 15, 18
Solution:
\tilde{x} = \frac{x_4 + x_5}{2} = \frac{9 + 11}{2} = 10
Example 1.10
Find the median of the data in Example1.5.
Solution:
Arrange the data in ascending magnitude. 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3
\tilde{x} = x_5 = 3.1 seconds
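The rule used in these examples (the middle value for an odd number of observations, the mean of the two middle values for an even number) can be sketched in Python as follows. The function is written out for illustration; Python's standard library also provides `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # even n: mean of the two middle values

print(median([1, 3, 3, 5, 6, 8, 9, 9, 10]))    # Example 1.8 -> 6
print(median([4, 4, 7, 9, 11, 12, 15, 18]))    # Example 1.9 -> 10.0
```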
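For grouped data:
\tilde{x} = L_m + \left( \frac{\frac{n}{2} - (\sum f)_L}{f_m} \right) C
Where:
L_m = lower class boundary of the median class
n = total number of observations
(\sum f)_L = sum of the frequencies of all classes below the median class
f_m = frequency of the median class
C = size of the median class interval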
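The grouped-data median, L_m + ((n/2 - F) / f_m) C with F the total frequency below the median class, can be sketched in Python as shown below. The frequencies are those obtained by tallying the 40 gasoline consumption readings given earlier into classes of size 0.5 starting at 22.5. Taking the median class as the first class whose cumulative frequency reaches n/2 is an assumption of this sketch, and the function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_boundaries, freqs, class_size):
    """Median of grouped data: Lm + ((n/2 - F) / fm) * C."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0                       # F: total frequency below the current class
    for lm, fm in zip(lower_boundaries, freqs):
        if cumulative + fm >= n / 2:     # first class whose cumulative frequency reaches n/2
            return lm + (n / 2 - cumulative) / fm * class_size
        cumulative += fm

# Lower class boundaries and frequencies for the gasoline consumption data.
boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45]
freqs = [2, 3, 7, 8, 14, 5, 1]

print(round(grouped_median(boundaries, freqs, 0.5), 2))
```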
MODE
Mode is the value which occurs with greatest frequency. The sample mode is
designated as \hat{x} and the population mode as \hat{\mu}.
Example 1.12
For the set of numbers 3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9
Solution:
\hat{x} = 9 (unimodal)
Example 1.13
The set of numbers 6, 7, 9, 10, 12 has no mode.
Example 1.14
For the set of values 2.2, 3.1, 4.1, 4.1, 5.4, 5.4, 5.4, 6.2, 7.7, 7.7, 8.5, 8.5, 8.5, 9.3
Solution:
\hat{x} = 5.4 and 8.5 (bimodal)
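Finding the mode amounts to counting how often each value occurs and keeping the value or values with the highest count. A minimal Python sketch, with the hypothetical helper `modes` returning an empty list when no value repeats:

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                        # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9]))   # Example 1.12 -> [9]
print(modes([6, 7, 9, 10, 12]))                               # Example 1.13 -> []
```

A result with two entries corresponds to a bimodal data set.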
For grouped data:
\hat{x} = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) c
where:
L_{mo} = lower class boundary of the modal class
d_1 = excess of modal frequency over frequency of the next lower class
d_2 = excess of modal frequency over frequency of the next higher class
c = size of the modal class interval
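The grouped-data mode can be sketched in Python as shown below, taking d_1 as the excess of the modal frequency over the next lower class and d_2 as the excess over the next higher class, which is the usual form of this formula. The frequencies are those obtained by tallying the 40 gasoline consumption readings given earlier into classes of size 0.5 starting at 22.5. The function name `grouped_mode` is hypothetical, and the sketch assumes a single modal class with a positive frequency.

```python
def grouped_mode(lower_boundaries, freqs, class_size):
    """Mode of grouped data: Lmo + d1 / (d1 + d2) * c."""
    i = max(range(len(freqs)), key=freqs.__getitem__)     # index of the modal class
    below = freqs[i - 1] if i > 0 else 0                  # frequency of the next lower class
    above = freqs[i + 1] if i < len(freqs) - 1 else 0     # frequency of the next higher class
    d1 = freqs[i] - below
    d2 = freqs[i] - above
    return lower_boundaries[i] + d1 / (d1 + d2) * class_size

# Lower class boundaries and frequencies for the gasoline consumption data.
boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45]
freqs = [2, 3, 7, 8, 14, 5, 1]

print(round(grouped_mode(boundaries, freqs, 0.5), 2))
```

Here the modal class is 24.5-24.9 with f = 14, so d_1 = 14 - 8 = 6 and d_2 = 14 - 5 = 9.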
Conducting a Survey
There are various methods for administering a survey. It can be done as a face-to-face
interview or a phone interview where the researcher is questioning the subject. A different
option is to have a self-administered survey where the subject can complete a survey on
paper and mail it back, or complete the survey online. There are advantages and
disadvantages to each of these methods.
The advantages of face-to-face interviews include fewer misunderstood questions, fewer
incomplete responses, higher response rates, and greater control over the environment in
which the survey is administered; also, the researcher can collect additional information if
any of the respondents’ answers need clarifying. The disadvantages of face-to-face
interviews are that they can be expensive and time-consuming and may require a large staff
of trained interviewers. In addition, the response can be biased by the appearance or attitude
of the interviewer.
The advantages of self-administered surveys are that they are less expensive than
interviews, do not require a large staff of experienced interviewers and can be administered
in large numbers. In addition, anonymity and privacy encourage more candid and honest
responses, and there is less pressure on respondents. The disadvantages of
self-administered surveys are that respondents are more likely to stop participating midway
through the survey and the researcher cannot ask them to clarify their answers. In addition,
there are lower response rates than in personal interviews, and often the respondents who
bother to return surveys represent extremes of the population – those people who care about
the issue strongly, whichever way their opinion leans.
Designing a Survey
Surveys can take different forms. They can be used to ask only one question or they can ask
a series of questions. We can use surveys to test out people’s opinions or to test a
hypothesis.
When designing a survey, the following steps are useful:
1. Determine the goal of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-
administered paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase them. (This is
important if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
77 80 82 68 65 59 61
57 50 62 61 70 69 64
67 70 62 65 65 73 76
87 80 82 83 79 79 71
80 77
Once you've sorted the data by value and grouped them by the tens digit, put them into a
graph called "Temperatures." Label the left column (the stem) as "Tens" and the right column
as "Ones," then fill in the corresponding temperatures as they occur above.
Now that you've had a chance to try this problem on your own, read on to see an example of
the correct way to format this data set as a stem-and-leaf plot graph.
Temperatures
Tens  Ones
5     0 7 9
6     1 1 2 2 4 5 5 5 7 8 9
7     0 0 1 3 6 7 7 9 9
8     0 0 0 2 2 3 7
You should always begin with the lowest number, or in this case temperature: 50. Since 50
was the lowest temperature of the month, enter a 5 in the tens column and a 0 in the ones
column, then observe the data set for the next lowest temperature: 57. As before, write a 7 in
the ones column to indicate that one instance of 57 occurred, then proceed to the next-
lowest temperature of 59 and write a 9 in the ones column.
Find all of the temperatures that were in the 60s, 70s, and 80s and write each temperature's
corresponding ones value in the ones column. If you've done it correctly, it should yield a
stem-and-leaf plot graph that looks like the one in this section.
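The procedure described above (sort the values, split each into a tens digit and a ones digit, and collect the ones digits beside their tens digit) can be sketched in Python for the 30 temperatures. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit whole-number data.

```python
from collections import defaultdict

# The 30 temperatures listed earlier in this section.
temperatures = [
    77, 80, 82, 68, 65, 59, 61,
    57, 50, 62, 61, 70, 69, 64,
    67, 70, 62, 65, 65, 73, 76,
    87, 80, 82, 83, 79, 79, 71,
    80, 77,
]

def stem_and_leaf(values):
    """Group values by tens digit (stem), with ones digits (leaves) in ascending order."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(temperatures)
print("Tens | Ones")
for stem in sorted(plot):
    print(f"{stem:>4} | {''.join(str(leaf) for leaf in plot[stem])}")
```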
Constructing a Survey
1. Martha wants to construct a survey that shows which sports students at her school like to
play the most.
a) List the goal of the survey.
The goal of the survey is to find the answer to the question: “Which sports do students at
Martha’s school like to play the most?”
b) What population sample should she interview?
A sample of the population would include a random sample of the student population in
Martha’s school. A good strategy would be to randomly select students (using dice or a
random number generator) as they walk into an all-school assembly.
c) How should she administer the survey?
Face-to-face interviews are a good choice in this case. Interviews will be easy to conduct
since the survey consists of only one question which can be quickly answered and recorded,
and asking the question face to face will help eliminate non-response bias.
d) Create a data collection sheet that she can use to record her results.
In order to collect the data to this simple survey Martha can design a data collection sheet
such as the one below:
Sport        Tally
Baseball
Basketball
Football
Soccer
Volleyball
Swimming
Grade Level   Number of Hours Worked   Total number of students
9th grade
10th grade
11th grade
12th grade
This data collection sheet allows Raoul to write down the actual numbers of hours worked
per week by students as opposed to just collecting tally marks for several categories.
Display, Analyze, and Interpret Statistical Survey Data
In the previous section we considered two examples of surveys you might conduct in your
school. The first one was designed to find the sport that students like to play the most. The
second survey was designed to find out how many hours per week students worked.
For the first survey, students’ choices fit neatly into separate categories. Appropriate ways to
display the data might be a pie chart or a bar graph. Let’s revisit this example.
In Example A Martha interviewed 112 students and obtained the following results.
Sport        Tally                                 Number of students
Baseball     |||| |||| |||| |||| |||| |||| |       31
Basketball   |||| |||| |||| ||                     17
Football     |||| |||| ||||                        14
Soccer       |||| |||| |||| |||| |||| |||          28
Volleyball   |||| ||||                             9
Swimming     |||| |||                              8
Gymnastics   |||                                   3
Fencing      ||                                    2
Total:                                             112
a) Make a bar graph of the results showing the percentage of students in each category.
To make a bar graph, we list the sport categories on the x−axis and let the percentage of
students be represented by the y−axis.
To find the percentage of students in each category, we divide the number of students in
each category by the total number of students surveyed:
Sport        Percentage
Baseball     31/112 = 0.28 = 28%
Basketball   17/112 = 0.15 = 15%
Football     14/112 = 0.125 = 12.5%
Soccer       28/112 = 0.25 = 25%
Volleyball   9/112 = 0.08 = 8%
Swimming     8/112 = 0.07 = 7%
Gymnastics   3/112 = 0.027 = 2.7%
Fencing      2/112 = 0.018 = 1.8%
Now we can make a graph where the height of each bar represents the percentage of
students in each category:
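In place of a drawn graph, the percentages can be computed and displayed as a text-only bar chart with a short Python sketch. The bars here run horizontally, one `#` per whole percentage point, which is a layout chosen for this illustration rather than the vertical bars described above.

```python
# Tally totals from Martha's survey (112 students in all).
counts = {
    "Baseball": 31, "Basketball": 17, "Football": 14, "Soccer": 28,
    "Volleyball": 9, "Swimming": 8, "Gymnastics": 3, "Fencing": 2,
}

total = sum(counts.values())
# Percentage of students in each category = count / total * 100.
percentages = {sport: 100 * c / total for sport, c in counts.items()}

# Text-only bar chart: bar length is the percentage rounded to a whole number.
for sport, pct in percentages.items():
    print(f"{sport:<11}{pct:5.1f}%  {'#' * round(pct)}")
```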
b) Make a pie chart of the collected information, showing the percentage of students in each
category.
To make a pie chart, we find the percentage of the students in each category by dividing the
number of students in each category by the total number of students surveyed, as in part a.
The central angle of each slice of the pie is found by multiplying the percentage of students
in each category (expressed as a decimal) by 360 degrees (the total number of degrees in a
circle). To draw a pie chart by hand, you can use a protractor to measure the central angles
that you find for each category.
Here is the pie-chart that represents the percentage of students in each category:
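The central angles behind such a chart can be computed with a few lines of Python, using the tally totals from Martha's survey. This is a sketch of the angle calculation only; it does not draw the chart.

```python
# Tally totals from Martha's survey (112 students in all).
counts = {
    "Baseball": 31, "Basketball": 17, "Football": 14, "Soccer": 28,
    "Volleyball": 9, "Swimming": 8, "Gymnastics": 3, "Fencing": 2,
}

total = sum(counts.values())
# Central angle of each slice = (share of students) x 360 degrees.
angles = {sport: c / total * 360 for sport, c in counts.items()}

for sport, angle in angles.items():
    print(f"{sport:<11}{angle:6.1f} degrees")
```

The angles add up to 360 degrees, which is a quick check that no category has been left out.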
For the second survey, actual numerical data can be collected from each student. In this
case we can display the data using a stem-and-leaf plot, a frequency table, a histogram, or
a box-and-whisker plot.
Design of experiment (DOE) is a body of knowledge, based upon statistical and other
scientific disciplines, for efficient and effective planning of experiments and for making sound
inferences from experimental data.
In an experiment, we deliberately change one or more process variables (or factors) in order
to observe the effect the changes have on one or more response variables. The (statistical)
design of experiments (DOE) is an efficient procedure for planning experiments so that the
data obtained can be analyzed to yield valid and objective conclusions.
DOE begins with determining the objectives of an experiment and selecting the process
factors for the study. An Experimental Design is the laying out of a detailed experimental
plan in advance of doing the experiment. Well-chosen experimental designs maximize the
amount of “information” that can be obtained for a given amount of experimental effort.
DOE is used to evaluate which process inputs have a significant impact on the process output
and what the target level of those inputs should be to achieve a desired result (output).
Design of experiment defines the:
Population to be studied,
Randomisation Process
Administration of Treatments,
Sample size requirement
Method of statistical analysis
The process of DOE may seem too cumbersome and extensive to comprehend at the first
try. However, the use of design of experiments in product and process research and
development must be understood in order to achieve product excellence.
With its three basic principles:
Randomisation,
Replication, and
Local Control,
one can readily comprehend the idea of DOE and implement it in product and process
research.
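Two of these principles, randomisation and replication, can be illustrated with a minimal Python sketch of a completely randomised layout. The treatment labels, the number of replicates, the seed, and the function name `randomised_layout` are hypothetical. Local control (blocking) is not shown.

```python
import random

def randomised_layout(treatments, replicates, seed=None):
    """Assign each treatment to `replicates` experimental units in random order."""
    rng = random.Random(seed)
    runs = [t for t in treatments for _ in range(replicates)]   # replication
    rng.shuffle(runs)                                           # randomisation
    return list(enumerate(runs, start=1))                       # (unit number, treatment)

# Hypothetical experiment: 3 treatments, each replicated 4 times on 12 units.
layout = randomised_layout(["A", "B", "C"], replicates=4, seed=7)
for unit, treatment in layout:
    print(unit, treatment)
```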
Test I
1. Create an ungrouped frequency distribution table from the data below, taken from a survey
in which university students were asked how many books they read per year. Arrange the
data in a frequency table.
7 3 1 7 8 5 4 4 5 6 6 3 3 4 5 1 8 3
2. The highest flow recorded each year was determined from the flow data of a gaging station
on a certain river. The following observations are the highest annual flows (in m³/s) for 50
years.
55 43 60 94 37 56 91 30 65 68
42 75 33 71 60 65 76 52 69 58
45 48 39 61 35 78 56 39 44 65
71 60 61 77 61 59 47 49 74 69
83 69 40 64 31 27 36 87 62 66
** Start with “26” as the lower limit of the first class.
** Use a class size (c) of “10”.
3. Find the mean number of books read per year by the students in question no. 1.
4. Find the mean of the highest annual flows at the gaging station in question no. 2.
TEST II
1. Samuel conducted a survey to answer the following question: “What are the favorite
subjects of your classmates during high school?” He collected the following information by
asking his high school classmates. (Choose 5 subjects from your high school.)
a) Make a pie chart of the results showing the percentage of people in each category.
b) Make a bar graph of the results.
2. Melissa conducted a survey to answer the question “What sport do high school students
like to watch on TV the most?” She collected the following information on her data collection
sheet.
a) Make a pie-chart of the results showing the percentage of people in each category.
b) Make a bar-graph of the results.
3. Pedro conducted a survey to find how many hours of TV teenagers watch each week in
Isabela. He collaborated with three friends who live in cities/municipalities of Isabela and
found the following information: (Choose 3 municipalities/cities only.)
a) Make a stem-and-leaf plot of the data.
b) Decide on an appropriate bin size and construct a frequency table.
c) Make a histogram of the results.