Engineering Data Analysis Chapter 1 2
ENGINEERING DATA
ANALYSIS
FACULTY, CET
COURSE OUTLINE
1. OBTAINING DATA
1.1 Methods of Data Collection
1.2 Planning and Conducting Surveys
1.3 Planning and Conducting Experiments:
Introduction to Design of Experiments
2. PROBABILITY
2.1 Relationship among Events
2.2 Counting Rules Useful in Probability
2.3 Rules of Probability
3. Discrete Probability Distribution
3.1 Random Variables and their Probability Distribution
3.2 Cumulative Distribution Functions
3.3 Expected Values of Random Variables
3.4 Binomial Distribution
3.5 Poisson Distribution
4. Continuous Probability Distribution
4.1 Continuous Random Variables and their Probability
Distribution
4.2 Expected Values of Continuous Random Variables
4.3 Normal Distribution
4.4 Normal Approximation to the Binomial and Poisson
Distribution
4.5 Exponential Distribution
5. Joint Probability Distribution
5.1 Two or More Random Variables
5.1.1 Joint Probability Distributions
5.1.2 Marginal Probability Distribution
5.1.3 Conditional Probability Distribution
5.1.4 More than Two Random Variables
5.2 Linear Functions of Random Variables
5.3 General Functions of Random Variables
6. Sampling Distributions and Point Estimation of Parameters
6.1 Point Estimation
6.2 Sampling Distribution and the Central Limit Theorem
6.3 General Concept of Point Estimation
6.3.1 Unbiased Estimator
6.3.2 Variance of a Point Estimator
6.3.3 Standard Error
6.3.4 Mean Squared Error of an Estimator
7. Statistical Intervals
7.1 Confidence Intervals: Single Sample
7.2 Confidence Intervals: Multiple Samples
7.3 Prediction Intervals
7.4 Tolerance Intervals
8. Test of Hypothesis for a Single Sample
8.1 Hypothesis Testing
8.1.1 One-sided and Two-sided Hypothesis
8.1.2 P-value in Hypothesis Tests
8.1.3 General Procedure for Test of Hypothesis
8.2 Test on the Mean of a Normal Distribution, Variance
Known
8.3 Test on the Mean of a Normal Distribution, Variance
Unknown
8.4 Test on the Variance and Standard Deviation of a Normal
Distribution
8.5 Test on a Population Proportion
9. Statistical Inference of Two Samples
9.1 Inference on the Difference in Means of Two Normal
Distributions, Variances Known
9.2 Inference on the Difference in Means of Two Normal
Distributions, Variances Unknown
9.3 Inference on the Variances of Two Normal Distributions
9.4 Inference on Two Population Proportions
10. Simple Linear Regression and Correlation
10.1 Empirical Models
10.2 Regression: Modelling Linear Relationships –
The Least-Squares Approach
10.3 Correlation: Estimating the Strength of Linear Relation
10.4 Hypothesis Tests in Simple Linear Regression
10.4.1 Use of t-tests
10.4.2 Analysis of Variance Approach to Test
Significance of Regression
10.5 Prediction of New Observations
10.6 Adequacy of the Regression Model
10.6.1 Residual Analysis
10.6.2 Coefficient of Determination
Course References
(1) Myers, R. et al., 2012. “Probability and Statistics for Engineers and Scientists”, 9th Ed.
(2) Hayter, A., 2012. “Probability and Statistics for Engineers and Scientists”, 4th Ed.
(3) Soong, T., 2004. “Fundamentals of Probability and Statistics for Engineers”, 1st Ed.
Grading System
Attendance : 5%
Quizzes/Participation : 15%
Prelim Exam : 20%
Midterm Exam : 20%
Prefinal Exam : 20%
Final Exam : 20%
100%
Data
  Quantitative
  Qualitative
QUANTITATIVE
➢ Measures of values or counts, expressed as numbers.
QUALITATIVE
➢ Data that approximate and characterize.
➢ Non-numerical in nature.
➢ Collected through observation, one-on-one interviews, and
similar methods.
DATA TYPES
Data
  Quantitative
    Continuous
    Discrete
  Qualitative
CONTINUOUS DATA (VARIABLE)
➢ Data that can take the form of decimals or continuous
values of varying degrees of precision.
-Ex. Height, Weight
DISCRETE DATA (DISCONTINUOUS)
➢ Data whose value cannot take the form of decimals.
-Ex. Family Size, Enrolment Size
DATA TYPES
Data
  Quantitative
    Continuous
    Discrete
  Qualitative
    Attribute
    Open
ATTRIBUTE DATA
➢ Data that can be counted for recording and analysis.
-Ex. Size of T-Shirt: XS, M, L, XL, XXL
OPEN
➢ Data that depend on the sample and are not restricted to a
specific value from a possible set of responses or answers.
DATA TYPES
Data
  Quantitative
    Continuous
    Discrete
  Qualitative
    Attribute
      Nominal
      Ordinal
    Open
NOMINAL DATA
➢ Data defined by an operation which allows making
statements only of equality or difference.
-Ex. Gender, Race, Religion, Political Affiliation
ORDINAL DATA
➢ Data defined by an operation whereby members of
a particular group are ranked.
-Ex. Awareness, IQ
Ungrouped (Raw) DATA
➢ Are data which are not organized in any specific way.
They are simply the collection of data as they are
gathered.
Grouped DATA
➢ Are raw data organized into groups or categories with
corresponding frequencies. Organized in this manner,
the data are referred to as a frequency distribution.
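As a rough illustration of turning ungrouped data into a frequency distribution, the sketch below tallies a small set of family sizes. The values and the name `raw_family_sizes` are made up for illustration and are not taken from the course material.

```python
from collections import Counter

# Hypothetical ungrouped (raw) data: family sizes in the order gathered.
raw_family_sizes = [4, 3, 5, 4, 6, 3, 4, 5, 4, 2]

# Group the raw data: each distinct value paired with its frequency.
frequency_distribution = dict(sorted(Counter(raw_family_sizes).items()))

for value, frequency in frequency_distribution.items():
    print(f"family size {value}: frequency {frequency}")
```

The frequencies always add up to the number of raw observations, which is a quick check that nothing was lost in grouping.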
METHODS OF DATA
COLLECTION
2. SECONDARY DATA
Data which have been collected by someone else and which
have already been passed through the statistical process.
METHODS OF DATA COLLECTION:
PRIMARY DATA
1. Observation
2. Interview
3. Questionnaire
4. Case Study
5. Survey
OBSERVATION
Observation method is a method
under which data from the field is collected
with the help of observation by the observer
or by personally going to the field.
2. Unstructured Observation
when observation is done without any planning beforehand.
1. Participant
when the observer is a member of the group which he is
observing.
2. Non-participant
when the observer is observing people without giving any
information to them.
1. Controlled
when observation takes place according to definite
pre-arranged plans, with experimental procedure. It is
generally done in a laboratory under controlled
conditions.
2. Uncontrolled
when the observation takes place in natural
conditions. It is done to get a spontaneous picture of
life and persons.
INTERVIEW METHOD
This method of collecting data
involves presentation of oral-verbal
stimuli and reply in terms
of oral-verbal responses.
QUESTIONNAIRE METHOD
This method of data collection is
quite popular, particularly in
case of big enquiries.
The questionnaire is mailed to respondents who are
expected to read and understand the questions
and write down the reply in the space meant for
the purpose in the questionnaire itself. The
respondents have to answer the questions on their
own.
QUESTIONNAIRE METHOD
ADVANTAGES
➢ They are less costly and less time-consuming; they are
advantageous when exposure data is expensive or hard to
obtain.
➢ They are advantageous when studying dynamic populations in
which follow-up is difficult.
DISADVANTAGES
➢ They are subject to selection bias.
➢ They generally do not allow calculation of incidence
(absolute risk).
SURVEY METHOD
One of the common methods of diagnosing
and solving social problems is
that of undertaking surveys.
Surveys can take different forms. They can be used to ask only
one question, or they can ask a series of questions. We can use
surveys to test out people’s opinions or to test a hypothesis.
Example:
1. Martha wants to construct a survey that shows which
sports students at her school like to play the most.
Step 1: GOAL
The goal of the survey is to find the answer to the question:
“Which sports do students at Martha’s school like to play the
most?”
Step 2: POPULATION
The population is the student body of Martha’s school, and
the sample would be a random sample of those students. A good
strategy would be to randomly select students (using dice or
a random number generator) as they walk into an all-school
assembly.
DESIGNING A SURVEY
Step 3: METHODS
Face-to-face interviews are a good choice in this case.
Interviews will be easy to conduct since the survey consists of
only one question which can be quickly answered and
recorded, and asking the question face to face will help
eliminate non-response bias.
Step 4: DATA
Example:
2. Juan wants to construct a survey that shows how
many hours per week the average student at his school
works.
Step 1: GOAL
The goal of the survey is to find the answer to the question “How
many hours per week do you work?”
Step 2: POPULATION
Juan suspects that older students might work more hours per
week than younger students. He decides that a stratified sample
of the student population would be appropriate in this case. The
strata are grade levels 9th through 12th. He would need to find
out what proportion of the students in his school are in each
grade level, and then include the same proportions in his
sample.
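The proportional allocation Juan describes can be sketched as follows. The enrollment figures and the total sample size are hypothetical, chosen only to make the arithmetic easy to follow; they are not part of the example.

```python
# Hypothetical enrollment per grade level (the strata); not real data.
enrollment = {"9th": 300, "10th": 250, "11th": 250, "12th": 200}
sample_size = 100  # assumed total number of students to survey

total_students = sum(enrollment.values())

# Proportional allocation: each stratum keeps its share of the population.
allocation = {
    grade: round(sample_size * count / total_students)
    for grade, count in enrollment.items()
}

print(allocation)
```

With other enrollment figures the rounded counts may not add up exactly to the intended sample size, so one stratum may need a small manual adjustment.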
Step 3: METHODS
Face-to-face interviews are a good choice in this case since
the survey consists of two short questions which can be
quickly answered and recorded.
Step 4: DATA
THE BASIS OF CONDUCTING AN
EXPERIMENT
SAMPLE SPACE
The set of all possible outcomes or results of a statistical experiment is called the
sample space and is represented by the letter S.
ELEMENT
Each outcome in a sample space is called an element or a member of the sample
space.
EVENT
A subset of the sample space, represented by the letter E.
Example 1:
Consider the experiment of tossing a die. If we are interested in the number that
shows on the top face, the sample space would be
S = {1, 2, 3, 4, 5, 6}
Example 2:
An experiment consists of flipping a coin and then flipping it a second time if a head
occurs. If a tail occurs on the first flip, then a die is tossed once. Listing the elements of
the sample space with the most information, for instance by following the branches of a
tree diagram, gives
S = {HH, HT, T1, T2, T3, T4, T5, T6}
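The branching in Example 2 can also be enumerated programmatically. The sketch below is one possible way to do it; the variable names are illustrative choices, not part of the course notes.

```python
# First flip: head (H) or tail (T).
# After a head the coin is flipped again; after a tail a die is tossed once.
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

sample_space = []
for first in coin:
    if first == "H":
        # Second stage is another coin flip.
        sample_space += [first + second for second in coin]
    else:
        # Second stage is a single die toss.
        sample_space += [first + str(face) for face in die]

print(sample_space)
```

Each pass through the loop corresponds to one first-stage branch of the tree diagram, and each list element to one complete path.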
PROBABILITY
2.1 SAMPLE SPACE AND RELATIONSHIPS AMONG EVENTS
Difference between Sample Space and Events
The sample space is the set of all possible outcomes of an experiment, and an event is a
subset of the sample space.
For the toss of a die:
S = {1, 2, 3, 4, 5, 6}
E1 = odd number: E1 = {1, 3, 5}
E2 = even number: E2 = {2, 4, 6}
So we have,
{1, 3, 5} ∪ {2, 4, 6} = {1, 2, 3, 4, 5, 6}
or S = E1 ∪ E2
Hence, we can say the union of events E1 and E2 is S.
Null space – a subset of the sample space that contains no elements, denoted by the
symbol Ø. It is also called the empty space.
Operations with Events
Intersection of events
The intersection of two events A and B is denoted by the symbol A ∩ B. It is the event
containing all elements that are common to A and B.
For example,
Let A = {3, 6, 9, 12, 15} and B = {1, 3, 5, 8, 12, 15, 17} ; then
A ∩ B = {3, 12, 15}
Let X = {q, w, e, r, t} and Y = {a, s, d, f} ; then X ∩ Y = Ø,
since X and Y have no elements in common
Union of events
The union of events A and B is the event containing all the elements that belong to A
or to B or to both, and is denoted by the symbol A ∪ B. The elements of A ∪ B may be listed or
defined by the rule A ∪ B = {x | x ∈ A or x ∈ B}
For example,
Let A = {a, e, i, o, u} and B = {b, c, d, e, f} ; then A ∪ B = {a, b, c, d, e, f, i, o, u}
Let X = {1, 2, 3, 4} and Y = {3, 4, 5, 6}; then X ∪ Y = {1, 2, 3, 4, 5, 6}
Complement of an Event
The complement of an event A with respect to S is the set of all elements of S that are not in
A and is denoted by A’.
For example,
Consider the sample space S = {dog, cow, bird, snake, pig}
Let A = {dog, bird, pig}; then A’ = {cow, snake}
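The three operations on events map directly onto Python's built-in set type. The sketch below reuses the sets from the examples above; the name `A_animals` is introduced here only to avoid reusing `A` for two different events.

```python
# Intersection: elements common to both events.
A = {3, 6, 9, 12, 15}
B = {1, 3, 5, 8, 12, 15, 17}
print(A & B)

# Union: elements that belong to X, to Y, or to both.
X = {1, 2, 3, 4}
Y = {3, 4, 5, 6}
print(X | Y)

# Complement with respect to S: elements of S that are not in the event.
S = {"dog", "cow", "bird", "snake", "pig"}
A_animals = {"dog", "bird", "pig"}
print(S - A_animals)
```

Python sets are unordered, so the printed order of elements may differ from the order in which they are listed in the notes.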
Probability of an Event
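P(E) = n(E) / n(S)
where n(E) is the number of elements in the event E and n(S) is the number of elements in
the sample space S. Each probability must lie between 0 and 1 inclusive, and the sum of
all probabilities assigned must be equal to 1. Therefore,
0 ≤ P(E) ≤ 1 and P(S) = 1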
If a die is tossed, the sample space is {1, 2, 3, 4, 5, 6}. This set has 6 elements.
Now, if the event is the set of odd numbers on the die, then we have {1, 3, 5}
as the event. This set has 3 elements. So, the probability of getting an odd number in a
single throw of the die is given by
Probability = 3/6 = 1/2
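The counting definition of probability can be checked with a few lines of code. This is a minimal sketch that assumes all outcomes are equally likely; the helper name `probability` is an illustrative choice.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # sample space of one die toss
E = {x for x in S if x % 2 == 1}   # event: an odd number shows

def probability(event, sample_space):
    """Classical probability n(E)/n(S) for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

print(probability(E, S))  # 1/2
print(probability(S, S))  # 1, since P(S) = 1
```

Using `Fraction` keeps the result exact, so 3/6 is reduced to 1/2 without any rounding.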
PROBABILITY
2.2 COUNTING RULES USEFUL IN PROBABILITY
Example 2:
The number of permutations of letters a, b, c, d.
3rd Rule: Permutation Rule
Given a single set of n distinctly different elements, you wish to select r elements from the n and arrange them
within r positions. The number of permutations of the n elements taken r at a time is nPr = n! / (n − r)!
Example 3:
In one year, three awards (research, teaching, and service) will be given for a class of 25 graduate students in a
statistics department. If each student can receive at most one award, how many possible selections are there?
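Examples 2 and 3 can be worked with the permutation rule. The sketch below is one way to check the counts, assuming Python 3.8 or later for `math.perm`; it reads Example 2 as arranging all four letters and Example 3 as an ordered selection, since the three awards are different.

```python
import math
from itertools import permutations

# Example 2: arrangements of the four letters a, b, c, d.
letters = ["a", "b", "c", "d"]
arrangements = list(permutations(letters))
print(len(arrangements))  # 24, the same as 4!

# Example 3: three distinct awards among 25 students, at most one each.
award_selections = math.perm(25, 3)  # 25! / (25 - 3)! = 25 * 24 * 23
print(award_selections)  # 13800
```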
Example 4:
A president and a treasurer are to be chosen from a student club consisting of 50 people. How many different
choices of officers are possible if
Example 5:
In a college football training session, the defensive coordinator needs to have 10 players
standing in a row. Among these 10 players, there are 1 freshman, 2 sophomores, 4 juniors, and
3 seniors, respectively. How many different ways can they be arranged in a row if only their
class level will be distinguished?
Example 6:
A young boy asks his mother to get five Game-Boy cartridges from his collection of 10 arcade
and 5 sports games. How many ways are there that his mother will get 3 arcade and 2 sports
games, respectively?
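Examples 5 and 6 are stated without solutions. The sketch below works them under two standard counting rules that are assumed here: the number of distinct arrangements of n objects split into groups of identical items is n! divided by the product of the group factorials, and the number of ways to choose r items from n without regard to order is given by `math.comb` (Python 3.8 or later).

```python
import math

# Example 5: 10 players in a row, only class level distinguished
# (1 freshman, 2 sophomores, 4 juniors, 3 seniors).
group_sizes = [1, 2, 4, 3]
n = sum(group_sizes)
row_arrangements = math.factorial(n)
for size in group_sizes:
    row_arrangements //= math.factorial(size)
print(row_arrangements)  # 12600

# Example 6: choose 3 of 10 arcade games and 2 of 5 sports games.
# The two choices are made independently, so the counts multiply.
cartridge_selections = math.comb(10, 3) * math.comb(5, 2)
print(cartridge_selections)  # 1200
```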
Example 1:
A student goes to the library. The probability that she checks out (a) a work of fiction is 0.40, (b) a
work of non-fiction is 0.30, and (c) both fiction and non-fiction is 0.20. What is the probability that the
student checks out a work of fiction, non-fiction, or both?
Solution:
Let F = the event that the student checks out fiction;
Let N = the event that the student checks out non-fiction.
Then, based on the rule of addition:
P(F ∪ N) = P(F) + P(N) – P(F ∩ N)
P(F ∪ N) = 0.4 + 0.3 – 0.2 = 0.5
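The library example can be reproduced in a few lines. The sketch below uses exact fractions so the subtraction does not pick up floating-point noise; the variable names are illustrative.

```python
from fractions import Fraction

p_fiction = Fraction("0.40")      # P(F)
p_nonfiction = Fraction("0.30")   # P(N)
p_both = Fraction("0.20")         # P(F ∩ N)

# Rule of addition: subtract the overlap so it is not counted twice.
p_either = p_fiction + p_nonfiction - p_both

print(float(p_either))  # 0.5
```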
Dependent – Two outcomes are said to be dependent if knowing that one of the outcomes has
occurred affects the probability that the other occurs.
Conditional Probability – the conditional probability of an event B in relationship to an event A
is the probability that event B occurs given that event A has already occurred. It is denoted by P(B|A).
Independent Events: P(A|B) = P(A) ; P(B|A) = P(B)
Rule 2: When two events are dependent, the probability of both occurring is:
P(A ∩ B) = P(A) * P(B|A)
where P(B|A) = P(A ∩ B) / P(A), provided that P(A) ≠ 0
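To see the definition of conditional probability and the independence condition at work, the sketch below uses the die sample space from earlier with two events chosen here purely for illustration (they do not appear in the notes): A, an even number, and B, a number that is at most 4.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one toss of a die
A = {2, 4, 6}            # hypothetical event: an even number
B = {1, 2, 3, 4}         # hypothetical event: a number that is at most 4

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

# Definition: P(B|A) = P(A ∩ B) / P(A), valid because P(A) != 0.
p_b_given_a = p(A & B) / p(A)

print(p_b_given_a, p(B))  # 2/3 2/3
```

Here P(B|A) equals P(B), so knowing that A occurred does not change the probability of B, and these two events are independent. For dependent events the two values would differ.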
Solution:
Let A = event that the first part selected is defective
Let B = event that the second part selected is defective.
P(B|A) = ?
If the first part is defective, prior to selecting the second part, the batch contains 849
parts, of which 49 are defective, therefore
P(B|A) = 49/849
Solution:
Let A = the event that the first marble is black;
Let B = the event that the second marble is black.
• In the beginning, there are 10 marbles in the urn, 4 of which are black. Therefore,
P(A) = 4/10
• After the first selection, there are 9 marbles in the urn, 3 of which are black.
Therefore, P(B|A) = 3/9
P(A ∩ B) = P(A) * P(B|A)
P(A ∩ B) = (4/10)(3/9) = 0.133
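The marble calculation is a direct application of the multiplication rule for dependent events and can be checked as follows. The variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(4, 10)         # P(A): first marble is black
p_b_given_a = Fraction(3, 9)  # P(B|A): second is black, given the first was

# Multiplication rule for dependent events.
p_both_black = p_a * p_b_given_a

print(p_both_black)                   # 2/15
print(round(float(p_both_black), 3))  # 0.133
```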
Solution:
Let A = first card which is a queen
Let B = second card which is also a queen
P(A ∩ B) = P(A) * P(B|A)
P(A) = 4/52 and P(B|A) = 3/51
P(A ∩ B) = (4/52)(3/51) = 1/221 = 0.004525
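The same answer can be cross-checked by brute force: list every ordered pair of distinct cards and count the pairs in which both are queens. The sketch assumes a standard 52-card deck with two cards drawn without replacement, which is what the values 4/52 and 3/51 imply.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Every ordered draw of two different cards: 52 * 51 outcomes.
ordered_pairs = list(permutations(deck, 2))

# Outcomes in which the first and the second card are both queens.
both_queens = [
    pair for pair in ordered_pairs
    if pair[0][0] == "Q" and pair[1][0] == "Q"
]

p_two_queens = Fraction(len(both_queens), len(ordered_pairs))
print(p_two_queens)  # 1/221
```

Enumeration is only practical for small sample spaces, but it gives an independent check on the multiplication rule.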
P(A) = 1 – P(A’)
Example 1:
The probability of Bill not graduating from college is 0.8. What is the probability that Bill
will graduate from college?
Solution:
P(A) = 1 – 0.8 = 0.2