Revised - Engineering - Data Organization and Visualization - 2024
• This table shows the count of non-missing values for each variable (Valid)
• This table shows the count of missing values for each variable (Missing)
• This table specifies the count of each
category in the variable. (frequency)
• It specifies the percentage of each
category in the variable (percent)
• It specifies the cumulative percent of
each category in the variable
(cumulative percent).
• Note that the counts and percentages
are displayed to one decimal place. The
full value can be seen by double-clicking
on the target cell.
• This frequency view of the variable
enables us to discern how many
categories are available for us to recode.
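The columns described above can be reproduced outside SPSS. The following is a minimal sketch in Python, assuming a small made-up fuel-type column (the values and the `frequency_table` helper are hypothetical, not taken from the dealership dataset). In this sketch the percentages are computed over the valid (non-missing) cases only.

```python
from collections import Counter

# Hypothetical fuel-type column; None marks a missing value.
fuel = ["Petrol", "Diesel", "Petrol", "CNG", None,
        "Diesel", "Petrol", "LPG", "Petrol", "Diesel"]

valid = [v for v in fuel if v is not None]
n_valid = len(valid)                # "Valid" count
n_missing = len(fuel) - n_valid     # "Missing" count

def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for category in sorted(counts):
        running += counts[category]
        rows.append((category,
                     counts[category],
                     round(100 * counts[category] / total, 1),
                     round(100 * running / total, 1)))
    return rows

table = frequency_table(valid)
print(f"Valid: {n_valid}  Missing: {n_missing}")
for category, freq, pct, cum in table:
    print(f"{category:<8}{freq:>5}{pct:>8.1f}{cum:>8.1f}")
```

The last row's cumulative percent is always 100, which is a quick check that no category was dropped.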
Frequency tables for categorical variables (Cont’d)
1. Which is the least common fuel type in the dealership?
2. Which is the most common fuel type in the dealership?
3. What is the combined percentage of Diesel-fuelled and
LPG-fuelled cars in the dealership?
4. How many CNG-fuelled cars are in the dealership?
5. What is the relative frequency of CNG-fuelled cars in the
dealership?
6. Construct a relative frequency column for the dataset.
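As a rough illustration of questions 1, 2, 5 and 6, the sketch below builds a relative frequency column in Python. The counts are invented for the example and are not the dealership's actual figures. The relative frequency of a category is its frequency divided by the total number of cases.

```python
# Hypothetical counts per fuel type (NOT the dealership's actual figures).
counts = {"Petrol": 120, "Diesel": 60, "CNG": 15, "LPG": 5}

total = sum(counts.values())

# Relative frequency = category frequency / total number of cases.
relative = {fuel: count / total for fuel, count in counts.items()}

least_common = min(counts, key=counts.get)
most_common = max(counts, key=counts.get)

for fuel, count in counts.items():
    print(f"{fuel:<8}{count:>6}{relative[fuel]:>8.3f}")
print("Least common:", least_common)
print("Most common:", most_common)
```

Multiplying a relative frequency by 100 gives the corresponding value in the Percent column.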
Frequency tables for grouped quantitative data
• Frequency distributions for continuous data are
usually presented as a grouped frequency table.
• This is a frequency table used when the range
of the data is large and the data must be
grouped into classes that are more than one
unit in width.
Steps: a grouped frequency distribution
1. Click on Transform on the menu bar
2. Click on Visual Binning
3. Click on the scale variable(s) you would like to bin
4. Move them to the Variables to Bin section
5. Click on Continue
6. Enter a name for the new binned variable
7. Select whether you want the upper endpoint to be
included or excluded
8. Click on Make Cutpoints
Steps: a grouped frequency distribution
• Step 9: Find the range
✓ The range is the difference between the
largest value and the smallest value
• Step 10: Find the desired number of bins
✓ Usually between 5 and 20, inclusive. Use the
$2^k$ rule: the number of bins $k$ is the
smallest whole number such that $2^k > n$,
where $n$ is the number of observations/cases
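Steps 9 and 10 can be checked by hand or with a few lines of code. The sketch below is one possible Python version. The function names and the list of model years are assumptions made for illustration only.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

def number_of_bins(n):
    """Smallest whole number k such that 2**k > n (the 2^k rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# Hypothetical model years, for illustration only.
years = [1998, 2001, 2003, 2003, 2005, 2006,
         2008, 2009, 2010, 2012, 2012, 2015]

print(data_range(years))      # 2015 - 1998 = 17
print(number_of_bins(100))    # 2**6 = 64 is not > 100, 2**7 = 128 is, so k = 7
```

For very small or very large datasets the rule can fall outside the usual 5 to 20 range, in which case the result is a starting point rather than a fixed answer.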
Steps: a grouped frequency distribution (Cont’d)
• Step 11: Find the bin width
✓ Divide the range by the number of bins. Round the
answer up to the nearest whole number if there is a
remainder. (Rounding up is different from rounding off: a
number is rounded up if there is any decimal remainder
after dividing. For example, 85 ÷ 6 = 14.167 is rounded
up to 15, and 53 ÷ 4 = 13.25 is rounded up to 14.)
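The two worked divisions above can be reproduced with a ceiling function. This is a small Python sketch, and `bin_width` is a hypothetical helper name.

```python
import math

def bin_width(data_range, bins):
    """Divide the range by the number of bins, rounding up on any remainder."""
    return math.ceil(data_range / bins)

print(bin_width(85, 6))   # 85 / 6 = 14.167, rounded up to 15
print(bin_width(53, 4))   # 53 / 4 = 13.25, rounded up to 14
print(round(85 / 6))      # rounding off would give 14 instead
```

When the division leaves no remainder (for example 20 ÷ 5), the ceiling leaves the result unchanged.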
Steps: a grouped frequency distribution (Cont’d)
• Step 12: Select the starting cut-off point
✓ Select a starting cut-off point by adding the
width to the smallest value
• Step 13: Enter the starting cut-off point
• Step 14: Enter the width
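To see what the cut-off points do to the data, the following Python sketch builds them from the smallest value and the width, then counts the cases in each bin. It assumes the upper endpoint is included in each bin, takes the width as already computed, and uses made-up model years. It is not a description of how SPSS works internally.

```python
import bisect

def cut_points(smallest, largest, width):
    """First cut-off = smallest + width; later cut-offs are one width apart."""
    points = []
    cut = smallest + width
    while cut < largest:
        points.append(cut)
        cut += width
    return points

def grouped_frequencies(values, points):
    """Count values per bin, with each upper endpoint INCLUDED in its bin."""
    freqs = [0] * (len(points) + 1)
    for value in values:
        # Number of cut-offs strictly below the value = index of its bin.
        freqs[bisect.bisect_left(points, value)] += 1
    return freqs

# Hypothetical model years; the width of 5 is assumed to come from Step 11.
years = [1998, 2001, 2003, 2003, 2005, 2006,
         2008, 2009, 2010, 2012, 2012, 2015]
width = 5

points = cut_points(min(years), max(years), width)
freqs = grouped_frequencies(years, points)
print(points)   # [2003, 2008, 2013]
print(freqs)    # [4, 3, 4, 1]
```

Because the upper endpoint is included, a car from 2003 falls in the first bin rather than the second.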
Grouped frequency table
Grouped frequency table (Cont’d)
1. Which interval had the highest number of cars?
2. What total percentage of cars were made from 2001 to
2009?
3. How many cars were made before the year 2004?
4. Construct a cumulative frequency column for the
dataset.
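For question 4, a cumulative frequency column is a running total of the frequency column. A minimal Python sketch follows, using invented intervals and counts rather than the values from the actual grouped table.

```python
from itertools import accumulate

# Hypothetical grouped frequency table (interval label, frequency).
intervals = ["1998-2003", "2004-2008", "2009-2013", "2014-2018"]
frequencies = [4, 3, 4, 1]

# Each cumulative frequency adds the current bin to all bins before it.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(intervals, frequencies, cumulative):
    print(f"{label:<12}{freq:>5}{cum:>5}")
```

The last cumulative frequency equals the total number of cases, which is a useful check on the column.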
Peaks and Outliers
• The peaks show which class or classes have the
most data values compared to the other
classes.
• Extreme values, called outliers, show unusually
large or small data values as compared to
other data values in the dataset.
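Peaks can be read straight off a grouped frequency table. The short Python sketch below uses hypothetical counts and picks out every class that shares the highest frequency. Outliers are not computed here, since no specific rule for them is given above.

```python
# Hypothetical grouped frequency table (class label -> frequency).
frequencies = {"1998-2003": 4, "2004-2008": 3, "2009-2013": 4, "2014-2018": 1}

highest = max(frequencies.values())

# There may be more than one peak if several classes tie for the highest count.
peaks = [label for label, count in frequencies.items() if count == highest]
print(peaks)   # ['1998-2003', '2009-2013']
```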
DATA VISUALIZATION
PIE CHARTS AND BAR GRAPHS
Reasons for graphing
i. Most people, especially those with little or
no statistical knowledge, find it easier to
comprehend the meaning of data when it is
presented graphically.
ii. Graphs are also useful for getting the
audience’s attention in a publication or a
spoken presentation.
Reasons for graphing (Cont’d)