EDA - Midterms - Reviewer

The document provides an overview of engineering data analysis, detailing concepts such as population, sample, variables, and types of data. It explains statistical methods including descriptive and inferential statistics, as well as various sampling methods and levels of measurement. Additionally, it covers data organization techniques, including frequency distribution and graphical representation of data.

Uploaded by

toyt647

Engineering Data Analysis – Engr. Joyce Chimicag Evangelista, MSCE

INTRODUCTION TO DATA

Population
Is a collection of persons, things, or objects under study
Commonly referred to by the symbol N

Sample
Is a portion taken from the population
Commonly referred to by the symbol n

Sampling
Is the procedure to select a portion (or a subset) of the larger population and study that portion (the sample) to gain information about the population

Parameter
Is a number that is a property of the population

Statistic
Is an estimate of a population parameter
It is a number that represents a property of the sample

VARIABLES AND DATA

Data
The actual values of the variable
They may be numbers, or they may be words
Datum is a single value

Variable
Is a characteristic of interest for each person or thing in a population. Represented by capital letters such as X and Y
It is a characteristic that changes or varies over time
There are two types:
1. Numerical Variable
➢ Take on values with equal units such as weight in pounds and time in hours
2. Categorical Variable
➢ Place the person or thing into a category

Examples of Variable and Data
Variable
➢ Name, age, course, height
Data
➢ Marielle, 19 years old, civil engineering, 4’11

MEAN AND PROPORTION

Mean
Average of the given data

Proportion
Ratio of two given quantities relative to one another.

STATISTICS AS DESCRIPTIVE AND INFERENTIAL

Descriptive Statistics
Numerical measures of data
Deals with organizing and summarizing data.
There are two ways to summarize data:
1. By graphing
2. By using numbers in tabulation

Inferential Statistics
Methods of making decisions or predictions about a population based on sampled data.
Deals with drawing conclusions from “good” data.
Uses probability to determine how confident we can be that our conclusions are correct.

Probability
Mathematical tool used to study randomness
Deals with the chance (the likelihood) of an event happening
Quantitative description of the chances associated with various outcomes.
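
The difference between a parameter (a property of the population) and a statistic (an estimate taken from a sample), and between a mean and a proportion, can be illustrated with a minimal Python sketch. The population of student ages, the cutoff of 21 years, and the sizes N = 1000 and n = 50 are all made-up assumptions for illustration only.

```python
import random
from statistics import mean

# ASSUMPTION: a hypothetical population of N = 1000 student ages (made-up values).
rng = random.Random(42)          # fixed seed so the sketch is repeatable
population = [rng.randint(17, 25) for _ in range(1000)]

# Parameters: numbers that describe the whole population.
mu = mean(population)                                      # population mean
p = sum(age >= 21 for age in population) / len(population)  # population proportion

# Sample: a portion of the population, chosen at random without replacement.
sample = rng.sample(population, 50)

# Statistics: numbers computed from the sample that estimate the parameters.
x_bar = mean(sample)
p_hat = sum(age >= 21 for age in sample) / len(sample)

print(f"N = {len(population)}, mean = {mu:.2f}, proportion aged 21+ = {p:.2f}")
print(f"n = {len(sample)}, mean = {x_bar:.2f}, proportion aged 21+ = {p_hat:.2f}")
```

The sample values will generally differ a little from the population values; how far apart they can be expected to be is what inferential statistics quantifies.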
Probability Calculations
Used in statistics to analyze and interpret data
It provides a bridge between descriptive and inferential statistics

Types of Data
1. Qualitative (Categorical)
➢ Results of categorizing or describing attributes of a population.
➢ It measures the quality or characteristics of each experimental unit.
➢ They are generally described by words or letters.
2. Quantitative (Numerical)
➢ Are always numbers
➢ They are the result of counting or measuring attributes of a population.
2.1. Quantitative Discrete
❖ Are the results of counting
❖ Ex. Number of laborers, Number of cars
2.2. Quantitative Continuous
❖ Are the results of measuring
❖ Ex. Time, Amount of Gas

Organizing and Displaying Data
1. Statistical Table
➢ Use a data distribution to describe:
❖ What values have been measured
❖ How often each value has occurred (frequency, relative frequency, percentage)
2. Graphs
➢ Are more helpful in understanding the data.
➢ There are no strict rules concerning which graphs to use.
2.1. Pie Chart
❖ Categories of data are represented in a circle and are proportional in size to the percentage of individuals in each category.
2.2. Bar Graph
❖ The length of the bar for each category is proportional to the number or percent of individuals in each category.
❖ Bars may be vertical or horizontal.

POPULATION VS SAMPLE

Population                 Sample
µ = mean                   x̄ = mean
σ = standard deviation     s = standard deviation

Census
Information/data gathered from every member of the population

Data
Information from the sample of the population

Sampling plan/method
Selecting the group from which the researchers will collect data.

Sampling Methods
1. Non-Probability
➢ Individuals of the population are not given an equal opportunity to become a part of the sample.
1.1. Convenience Sampling
❖ Choosing samples based on easy or convenient access
1.2. Quota Sampling
❖ Choosing samples to fill a specific quota.
❖ They are chosen according to traits or qualities.
1.3. Judgmental Sampling
❖ Also called purposive sampling or authoritative sampling
❖ Sample members are chosen only based on the researcher’s knowledge and judgment
1.4. Snowball Sampling
❖ The sample chosen provides referrals to recruit samples for a research study.
2. Probability
➢ Members of the population are given an equal chance to be a part of the sample.
2.1. Simple Random Sampling (SRS)
❖ Choosing a representative by rolling a die, for instance, or using a number generator.
2.2. Systematic Sampling
❖ Choosing a representative using a regular interval,
❖ say every “r-th” individual to be a part of the study.
2.3. Cluster Sampling
❖ Ideal for extremely large populations and/or populations distributed over a large geographic area.
2.4. Stratified Sampling
❖ Choosing members of the sample when there are clearly defined subgroups in the population

$$\text{number of sample members from each stratum} = \frac{\text{number of members in the stratum}}{\text{total number in the population}} \times n$$

3 Features to keep in MIND while constructing a sample
1. Consistency
2. Diversity
3. Transparency

Conducting a Survey
1. Face-To-Face Interviews
Advantages
❖ Fewer misunderstood questions
❖ High response rates
❖ Additional information can be collected from the respondents
Disadvantages
❖ Time-consuming
❖ Expensive
❖ Can be biased by the attitude or the appearance of the surveyor
2. Self-Administered Surveys
Advantages
❖ Respondent can complete in his or her free time
❖ Less expensive than face-to-face interviews
❖ Anonymity leads to more honest results
Disadvantages
❖ Lower response rates

Designing a Survey
1. Determine the goal of your survey
2. Identify the sample population
3. Choose an interview method
4. Decide what questions you will ask
5. Conduct the interview

LEVELS OF MEASUREMENT
1. Nominal Scale Level
➢ Qualitative/Categorical
➢ Names, colors, LABELS
➢ Order does not matter
➢ Cannot perform mathematical operations; only frequencies and proportions can be applied
➢ Can be represented through a bar or pie chart
2. Ordinal Scale Level
➢ Ranking
➢ The order matters
➢ Differences cannot be measured (not equal intervals)
➢ Cannot perform arithmetic operations; the mode and median can be found
3. Interval Scale Level
➢ The order matters
➢ Differences can be measured
➢ No true “zero” or starting point
➢ Can perform math operations: mean, median, and SD.
➢ Line graph, bar chart, and histogram
4. Ratio Scale Level
➢ Order matters
➢ Differences are measurable (including ratios)
➢ Contains a “0” starting point
➢ Can perform math operations: mean, median, and SD.
➢ Line graphs, box plots, bar charts, and histogram

Data                      Nominal   Ordinal   Interval   Ratio
Labeled                   Yes       Yes       Yes        Yes
Ordered                   No        Yes       Yes        Yes
Measurable Differences    No        No        Yes        Yes
Zero is Arbitrary         N/A       Yes       Yes        No

Measure / Operation       Nominal   Ordinal   Interval   Ratio
Mode                      Yes       Yes       Yes        Yes
Median                    No        Yes       Yes        Yes
Mean                      No        No        Yes        Yes
Frequency Distribution    Yes       Yes       Yes        Yes
Range                     No        Yes       Yes        Yes
Add & Subtract            No        No        Yes        Yes
Multiply and Divide       No        No        No         Yes
Standard Deviation        No        No        Yes        Yes

GROUPED DATA AND UNGROUPED DATA

Collection of Data
First step in the field of research

Presentation of Data
To look for ways to condense and arrange the data and to study their characteristics.

Ungrouped Data
Data in its original form
Raw data
List of numbers that do not convey anything
No summarization or aggregation

Grouped Data
Data that is bundled together in different classes or categories

In Ungrouped Data
Given the data:
3, 2, 4, 5, 6, 8, 2, 5, 8, 7, 9, 8, 8, 8, 11, 10, 12, 11, 9
Determine the following:
a. Frequency table
b. Relative frequency
c. Cumulative frequency
d. Cumulative relative frequency

Table for Frequency and Cumulative Frequency Distribution

X     f    ≤CFD   ≥CFD
2     2    2      19
3     1    3      17
4     1    4      16
5     2    6      15
6     1    7      13
7     1    8      12
8     5    13     11
9     2    15     6
10    1    16     4
11    2    18     3
12    1    19     1
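
The four quantities asked for in the ungrouped example (frequency, relative frequency, cumulative frequency, cumulative relative frequency) can be tabulated with a short Python sketch. This is only one possible way to do it; the variable names are illustrative, and fractions are used so that the relative frequencies stay exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

data = [3, 2, 4, 5, 6, 8, 2, 5, 8, 7, 9, 8, 8, 8, 11, 10, 12, 11, 9]
n = len(data)                       # 19 observations

counts = Counter(data)              # how often each value occurs
values = sorted(counts)             # distinct values in increasing order
freq = [counts[x] for x in values]

# "Less than or equal" cumulative frequency: running total from the smallest value.
cf_le = list(accumulate(freq))
# "Greater than or equal" cumulative frequency: running total from the largest value.
cf_ge = list(accumulate(reversed(freq)))[::-1]

# Relative and cumulative relative frequency as exact fractions of n.
rel_freq = [Fraction(f, n) for f in freq]
cum_rel_freq = [Fraction(c, n) for c in cf_le]

print("X    f   <=CF  >=CF  RF     CumRF")
for x, f, le, ge in zip(values, freq, cf_le, cf_ge):
    print(f"{x:<4} {f:<3} {le:<5} {ge:<5} {f}/{n:<4} {le}/{n}")
```

Every relative frequency has the sample size 19 as its denominator, so the value 8 (which occurs 5 times) has relative frequency 5/19 and the value 11 (which occurs twice) has 2/19; the relative frequencies add up to 1.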
Table for relative frequency and cumulative relative frequency

X       f     RF      Cum. RF
2       2     2/19    2/19
3       1     1/19    3/19
4       1     1/19    4/19
5       2     2/19    6/19
6       1     1/19    7/19
7       1     1/19    8/19
8       5     5/19    13/19
9       2     2/19    15/19
10      1     1/19    16/19
11      2     2/19    18/19
12      1     1/19    19/19 or 1
Total   19    1

In Grouped Data
Steps in constructing a frequency distribution:
1. Determine the largest and smallest value in the data
2. Determine the number of class intervals (k)

Recommended k values from Juran and Gryna:

Number of observations, n    Recommended number of classes, k
20 – 50                      5 or 6
51 – 100                     7
101 – 200                    8
201 – 500                    9
501 – 1000                   10
Over 1000                    11 – 20

Sturges offers a mathematical formula, with the square-root rule as an alternative:
$k = 1 + 3.322 \log_{10} n$ or $k = \sqrt{n}$

3. Determine the approximate class size (c); class size is also known as bin size or class width
$c = \frac{X_{max} - X_{min}}{k}$
4. Determine the lower and upper limits of the class interval
5. Write down the class intervals starting with the decided lower and upper class limits of the first class interval. Add the class size to the lower and upper class limits to obtain the next class interval, and so on.
6. Determine the number of observations falling under each class interval. Find the class frequency.

Example
a. $X_{min} = 10.3$; $X_{max} = 97.6$
b. $k = 5$
c. $c = \frac{97.6 - 10.3}{5} = 17.46 \approx 17.5$

Class Interval
Lower limit   Upper limit   F
10            27.4          1
27.5          44.9          5
45            62.4          8
62.5          79.9          12
80            97.4          3
97.5          114.9         1
Total                       30

Class Boundaries
Are the true class limits. They are necessary so that no value can be observed exactly on a boundary.
$UCB_i = \frac{UL_i + LL_{i+1}}{2}$, where i is the class number

Class Interval                     Class Boundary
Lower limit   Upper limit   F      LCB      UCB
10            27.4          1      9.95     27.45
27.5          44.9          5      27.45    44.95
45            62.4          8      44.95    62.45
62.5          79.9          12     62.45    79.95
80            97.4          3      79.95    97.45
97.5          114.9         1      97.45    114.95

Xi is the midpoint of each interval
$X_i = \frac{LL + UL}{2} = \frac{LCB + UCB}{2}$

Class Interval                     Class Boundary
Lower limit   Upper limit   F      LCB      UCB       Xi
10            27.4          1      9.95     27.45     18.7
27.5          44.9          5      27.45    44.95     36.2
45            62.4          8      44.95    62.45     53.7
62.5          79.9          12     62.45    79.95     71.2
80            97.4          3      79.95    97.45     88.7
97.5          114.9         1      97.45    114.95    106.2
Total                       30
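
The class limits, class boundaries, and midpoints of the grouped example can be generated with a small Python sketch of steps 2 to 5. It assumes the data are recorded to one decimal place (so adjacent limits differ by 0.1) and that the first lower limit is chosen as 10, as in the example table; neither is stated as a rule in the notes. The observations themselves are not listed, so the class frequencies are not computed here.

```python
import math

n = 30                       # number of observations (total frequency in the example)
x_min, x_max = 10.3, 97.6    # smallest and largest values
k = 5                        # number of classes chosen in the example
unit = 0.1                   # ASSUMPTION: data recorded to one decimal place
first_lower = 10.0           # ASSUMPTION: chosen lower limit of the first class

# Sturges' suggestion for k, for comparison with the chosen value.
k_sturges = 1 + 3.322 * math.log10(n)

# Step 3: approximate class size, rounded to one decimal place (17.46 -> 17.5).
c = round((x_max - x_min) / k, 1)

# Steps 4-5: build the classes until the largest value is covered.
classes = []
lower = first_lower
while True:
    upper = lower + c - unit            # upper class limit
    lcb = lower - unit / 2              # lower class boundary
    ucb = upper + unit / 2              # upper class boundary
    midpoint = (lower + upper) / 2      # class mark Xi
    classes.append(tuple(round(v, 2) for v in (lower, upper, lcb, ucb, midpoint)))
    if upper >= x_max:
        break
    lower += c

print(f"Sturges k = {k_sturges:.2f}, class size c = {c}")
print("LL     UL     LCB    UCB    Xi")
for row in classes:
    print("  ".join(f"{v:<5}" for v in row))
```

The loop produces six classes rather than five because the fifth class ends at 97.4, which is below the largest value 97.6; this is why the example table has a sixth row.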
STATISTICAL MEASURES

Measures of the Center of the Data
➢ Measure of Central Tendency or Measure of Central Location
➢ A single number that gives a summary of the characteristics of a given set of data.
➢ The most common measures of central tendency are:

1. Mean (Average)
➢ Average of the measurements
➢ Described as the center of gravity
➢ Characteristics:
a. It always exists. It can be calculated for any set of numerical data.
b. It is unique. A set of numerical data has only one mean.
c. It can be combined with other data sets. It lends itself to further statistical manipulation.
d. It is reliable for inference-making. The mean of many samples drawn from the same population generally does not vary or fluctuate.
e. It considers every data point. It may be affected by extreme values.

$\bar{x} = \frac{\sum X_i f_i}{n}$ (for a sample)
$\mu = \frac{\sum X_i}{N}$ (for a population)

➢ The Law of Large Numbers and the Mean
➢ If you take samples of larger and larger size from any population, the mean x̄ of the sample is very likely to get closer to the population mean μ.

2. Median
➢ Divides the data into two equal parts.
➢ It corresponds to the value of the middle item when the data are arranged in an increasing or decreasing order of magnitude.
➢ Characteristics:
a. It always exists
b. It is unique
c. It is not easily affected by extreme values.
➢ For a sample of size n arranged in increasing order of magnitude, the location of the median is
$\text{location of the median} = \frac{n + 1}{2}$

3. Mode
➢ Occurs at the most frequently observed value of the variable.
➢ The value that occurs with the highest frequency.
➢ Characteristics:
a. It requires no calculations
b. It applies to both qualitative and quantitative data.
c. It may not exist. This happens if all the values are observed with the same frequency.
d. If it exists, it may not be unique.

Measures of Spread

1. Range
➢ It is the spread of data from the lowest to the highest value.
➢ Simplest measure of variability.
$\text{Range} = \text{Max} - \text{Min}$
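
As a quick check of the mean, median, mode, and range, the measures can be computed in Python for the 19-value ungrouped data set used in the frequency-table example. This is a minimal sketch that relies on the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, multimode

data = [3, 2, 4, 5, 6, 8, 2, 5, 8, 7, 9, 8, 8, 8, 11, 10, 12, 11, 9]
n = len(data)
ordered = sorted(data)

# Mean: sum of the values divided by the number of values.
x_bar = mean(data)

# Median: middle item of the ordered data; its location is (n + 1) / 2.
median_location = (n + 1) / 2
med = median(data)

# Mode: most frequent value(s); multimode returns a list because
# the mode may not be unique.
modes = multimode(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

print(f"n = {n}, mean = {x_bar:.2f}")
print(f"median location = {median_location}, median = {med}")
print(f"mode(s) = {modes}, range = {data_range}")
```

Here n = 19 is odd, so the median sits exactly at position 10 of the ordered list. For an even n the location (n + 1)/2 falls between two positions, and `statistics.median` averages the two middle values.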
2. Average Deviation
➢ Provides the average of the variations in a data set.
➢ Measures the distance of each value from the data set’s mean or median.
$\text{Ave. Dev.} = \frac{\sum |X_i - \bar{x}|}{n}$

3. Variance
➢ Measurement of the spread between numbers in a data set.
➢ How far is each number in the set from the mean?
$S^2 = \frac{\sum (X_i - \bar{x})^2}{n}$

4. Standard Deviation
➢ Measures the amount of variation or dispersion of a set of values.
➢ A low standard deviation indicates that the values tend to be close to the mean of the set.
➢ A high standard deviation indicates that the values spread over a wider range.
$S = \sqrt{\frac{\sum (X_i - \bar{x})^2}{n}}$

Given the situation of Brand A, the mean is 30.

X     X − x̄    |X − x̄|    (X − x̄)²    √(X − x̄)²
27    -3       3          9           3
29    -1       1          1           1
30    0        0          0           0
31    1        1          1           1
33    3        3          9           3

Measures of Position
1. Quartiles
➢ Divides the data into 4 equal parts.
2. Percentiles
➢ Divides the data into 100 equal parts.
3. Deciles
➢ Divides the data into 10 equal parts.
$\text{Location of } (P) = \frac{\text{Percentile}}{100}(n + 1)$

Measures of Symmetry
1. Skewness
➢ Measure of the asymmetry of a distribution.

Measures of Kurtosis
➢ Degree of peakedness of a unimodal distribution
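
The Brand A values (27, 29, 30, 31, 33) can be used to check the average deviation, variance, and standard deviation formulas, together with the percentile-location formula. The sketch below follows the formulas as written in these notes, which divide by n; the helper name `percentile_location` is made up for illustration.

```python
from math import sqrt

brand_a = [27, 29, 30, 31, 33]
n = len(brand_a)
x_bar = sum(brand_a) / n                        # mean = 30.0

# Deviation of each value from the mean.
deviations = [x - x_bar for x in brand_a]

# Average deviation: mean of the absolute deviations.
avg_dev = sum(abs(d) for d in deviations) / n

# Variance and standard deviation, dividing by n as in the notes.
variance = sum(d ** 2 for d in deviations) / n
std_dev = sqrt(variance)

def percentile_location(p, n):
    """Position of the p-th percentile in ordered data: (p / 100) * (n + 1)."""
    return p / 100 * (n + 1)

print(f"mean = {x_bar}, average deviation = {avg_dev}")    # 30.0 and 1.6
print(f"variance = {variance}, standard deviation = {std_dev}")  # 4.0 and 2.0
print(f"location of the 50th percentile for n = 19: {percentile_location(50, 19)}")  # 10.0
```

Dividing by n matches `statistics.pvariance` and `statistics.pstdev` in the Python standard library; `statistics.variance` and `statistics.stdev` divide by n − 1 instead and would give larger values for the same data.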
