
Revised - Engineering - Data Organization and Visualization - 2024

Uploaded by

Malack Chagwa

PROBABILITY AND STATISTICS

DATA ORGANIZATION AND VISUALIZATION
DATA ORGANIZATION
FREQUENCY TABLES
Data Organization
• In this course, data will be organized in tables called
frequency tables.
• A frequency table organizes data in table form, listing
each value of a variable alongside how often it appears in
the dataset.
Frequency tables for categorical variables
• These tables show how many observations were recorded in each
category of a particular variable.

• This table shows the count of non-missing values for each variable (Valid).
• This table shows the count of missing values for each variable (Missing).
• This table specifies the count of each
category in the variable (Frequency).
• It specifies the percentage of each
category in the variable (Percent).
• It specifies the cumulative percent of
each category in the variable
(Cumulative Percent).
• Note that the displayed values are
rounded to one decimal place. The full
number can be seen after double-clicking
on the target cell.
• This frequency view of the variable
enables us to discern how many
categories are available for us to recode.
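The columns described above can be reproduced by hand. The following is a minimal Python sketch, not the statistics package's own procedure. The fuel-type values are made up for illustration, None stands in for a missing value, and the percentages are assumed to be computed over the non-missing values only.

```python
from collections import Counter

# Made-up fuel types for illustration only; None marks a missing value.
fuel = ["Petrol", "Diesel", "Petrol", "CNG", None,
        "Diesel", "Petrol", "LPG", "Petrol"]

valid = [v for v in fuel if v is not None]   # non-missing values (Valid)
missing = len(fuel) - len(valid)             # missing values (Missing)

counts = Counter(valid)
rows = []
cumulative = 0.0
for category in sorted(counts):
    frequency = counts[category]
    # Assumption: percent is taken over the valid (non-missing) values.
    percent = 100 * frequency / len(valid)
    cumulative += percent
    rows.append((category, frequency, percent, cumulative))

print(f"Valid: {len(valid)}  Missing: {missing}")
print(f"{'Category':<10}{'Freq':>5}{'Pct':>8}{'Cum pct':>9}")
for category, frequency, percent, cum_percent in rows:
    print(f"{category:<10}{frequency:>5}{percent:>8.1f}{cum_percent:>9.1f}")
```

The last cumulative percent should always reach 100, which is a quick check that no category was dropped.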
Frequency tables for categorical variables
(Cont’d)
1. Which is the least seen fuel type in the dealership?
2. Which is the most seen fuel type in the dealership?
3. What total percentage of Diesel-fuelled and LPG-fuelled cars
are in the dealership?
4. How many CNG-fuelled cars are in the dealership?
5. What is the relative frequency of CNG-fuelled cars in the
dealership?
6. Construct a relative frequency column for the dataset
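Questions 5 and 6 rely on the standard definition of relative frequency, written out here for reference. The symbols are notation chosen for this note rather than taken from the slides: f_i is the frequency of category i and n is the total number of valid observations.

```latex
\[
\text{relative frequency of category } i = \frac{f_i}{n},
\qquad
\sum_i \frac{f_i}{n} = 1
\]
```

Multiplying a relative frequency by 100 gives the corresponding entry in the percent column.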
Frequency tables for grouped quantitative
data
• Frequency distributions for continuous data are
usually presented as a grouped frequency table.
• A grouped frequency table is used when the
range of the data is large, so the data must be
grouped into classes that are more than one
unit in width.
Steps: a grouped frequency distribution
1. Click on Transform on the menu bar
2. Click on Visual Binning
3. Click on the scale variable(s) you would like to bin
4. Move them to the Variables to bin section
5. Click on Continue
6. Enter a name for the new binned variable
7. Select if you want the upper endpoint to be included or excluded
8. Click on Make Cutpoints
Steps: a grouped frequency distribution
•Step 9: Find the range
✓The range is the difference between the
largest value and the smallest value
•Step 10: Find the desired number of bins
✓Usually between 5 and 20, inclusive. Use the
$2^k$ rule, i.e. $k$ is the number of bins where
$2^k > n$, and $n$ is the number of observations/cases
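Steps 9 and 10 can be checked with a few lines of Python. This is a sketch under two assumptions: the model years are made up for illustration, and the rule is read as taking the smallest k that satisfies 2^k > n. The function name number_of_bins is hypothetical.

```python
def number_of_bins(n):
    """Return the smallest k with 2**k > n (one reading of the 2^k rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# Made-up model years for illustration only (20 observations).
years = [1996, 1998, 1999, 2001, 2001, 2002, 2003, 2003, 2004, 2005,
         2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2014, 2015]

data_range = max(years) - min(years)   # Step 9: largest minus smallest value
k = number_of_bins(len(years))         # Step 10: desired number of bins

print("range:", data_range)
print("bins:", k)
```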
Steps: a grouped frequency distribution (Cont’d)
• Step 11: Find the bin width
✓Divide the range by the number of bins. Round the
answer up to the nearest whole number if there is a
remainder. (Rounding up is different from rounding off:
a number is rounded up whenever the division leaves any
decimal remainder. For example, 85 ÷ 6 = 14.167 is
rounded up to 15, and 53 ÷ 4 = 13.25 is rounded up
to 14.)
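Rounding up corresponds to the ceiling function. The sketch below reproduces the two worked examples from Step 11. The function name bin_width is hypothetical.

```python
import math

def bin_width(data_range, bins):
    """Divide the range by the number of bins and round up to a whole number."""
    return math.ceil(data_range / bins)

print(bin_width(85, 6))   # 85 / 6 = 14.167, rounded up
print(bin_width(53, 4))   # 53 / 4 = 13.25, rounded up
print(bin_width(20, 5))   # no remainder, so the result is unchanged
```

Note that ordinary rounding off would give 14 and 13 for the first two cases, which is why the distinction matters.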
Steps: a grouped frequency distribution (Cont’d)
• Step 12: Select the starting cut-off point
✓Add the bin width to the smallest value to
get the starting cut-off point
• Step 13: Enter the starting cut-off point
• Step 14: Enter the width
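Steps 11 to 14 can be sketched by hand as follows. This is an illustration under stated assumptions, not the binning dialog's own algorithm: the model years are made up, the number of bins is fixed at 5, and the upper endpoint of each bin is treated as included.

```python
import math

# Made-up model years for illustration only (20 observations).
years = [1996, 1998, 1999, 2001, 2001, 2002, 2003, 2003, 2004, 2005,
         2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2014, 2015]

bins = 5                                              # assumed number of bins
width = math.ceil((max(years) - min(years)) / bins)   # Step 11: bin width
first_cut = min(years) + width                        # Step 12: starting cut-off
cutpoints = [first_cut + i * width for i in range(bins)]

# Assumption: upper endpoint included, so a value equal to a cutpoint
# is counted in the bin that ends at that cutpoint.
frequencies = [0] * len(cutpoints)
for year in years:
    for i, cut in enumerate(cutpoints):
        if year <= cut:
            frequencies[i] += 1
            break

# Running total of the frequencies gives the cumulative frequency column.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

lower = min(years)
for cut, f, c in zip(cutpoints, frequencies, cumulative):
    print(f"{lower}-{cut}: frequency={f}, cumulative={c}")
    lower = cut + 1
```

Because the width is rounded up, the last cutpoint is never below the largest value, so every observation lands in a bin.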
Grouped frequency table
Grouped frequency table (Cont’d)
1. Which interval had the highest number of cars?
2. What total percentage of cars were made from 2001 to
2009?
3. How many cars were made before the year 2004?
4. Construct a cumulative frequency column for the
dataset.
Peaks and Outliers
•The peaks show which class or classes have the
most data values compared to the other
classes.
•Extreme values, called outliers, show unusually
large or small data values as compared to
other data values in the dataset.
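Both ideas can be sketched in Python. The class frequencies and raw values below are made up for illustration. The slides do not state a rule for flagging outliers, so the 1.5 × IQR fences used here are an assumption: they are one common convention, not the course's definition.

```python
import statistics

# Made-up grouped frequencies (class label -> frequency), illustration only.
frequencies = {"1996-2000": 3, "2001-2004": 6, "2005-2008": 5,
               "2009-2012": 4, "2013-2016": 2}

# Peaks: the class or classes with the highest frequency.
highest = max(frequencies.values())
peaks = [label for label, f in frequencies.items() if f == highest]

# Made-up raw values; 100 is deliberately extreme.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# Assumption: flag values more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print("peaks:", peaks)
print("outliers:", outliers)
```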
DATA VISUALIZATION
PIE CHARTS AND BAR GRAPHS
Reasons for graphing
i. Most people find data easier to comprehend
when it is presented graphically, especially
if they have little or no statistical
knowledge.
ii. Graphs are also useful in getting the
audience’s attention in a publication or a
spoken presentation.
Reasons for graphing (Cont’d)

iii. They can be used to discuss an issue and
reinforce a critical point.
iv. They can also be used to discover a
trend or pattern in a situation over a
period of time.
Pie Charts
• A pie chart is a circle that is divided into sections or
wedges according to the percentage of frequencies in
each category of the distribution.
• The purpose of the pie chart is to show the relationship
of the parts to the whole by visually comparing the sizes
of the sections.
• Percentages or proportions can be used.
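The size of each wedge follows directly from its share of the total: a category with proportion p takes up p × 360 degrees of the circle. The sketch below computes both figures. The counts are made up for illustration.

```python
# Made-up category counts for illustration only.
counts = {"Petrol": 4, "Diesel": 2, "CNG": 1, "LPG": 1}
total = sum(counts.values())

wedges = {}
for category, frequency in counts.items():
    percent = 100 * frequency / total   # share of the whole, in percent
    angle = 360 * frequency / total     # wedge angle, in degrees
    wedges[category] = (percent, angle)
    print(f"{category:<8}{percent:>6.1f}%{angle:>8.1f} degrees")
```

The angles add up to 360 degrees, just as the percentages add up to 100.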
Pie Charts (Cont’d)
1. What percentage was taken up
by LPG-fuelled cars?
Bar Graphs

•A bar graph is a graphical device for depicting
categorical data summarized in a frequency,
relative frequency, or percent frequency
distribution.
•A bar graph represents the data by using
vertical or horizontal bars whose heights or
lengths represent the frequencies of the data.
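As a rough illustration of bar length standing for frequency, the sketch below draws a horizontal bar graph in plain text. The counts are made up and the function name bar_lines is hypothetical. A plotting library would normally be used instead.

```python
def bar_lines(counts, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    label_width = max(len(category) for category in counts)
    return [f"{category:<{label_width}} | {symbol * frequency} {frequency}"
            for category, frequency in counts.items()]

# Made-up category counts for illustration only.
counts = {"Petrol": 4, "Diesel": 2, "CNG": 1, "LPG": 1}

for line in bar_lines(counts):
    print(line)
```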
Types of bar charts

•Simple bar charts
•Multiple/compound/grouped bar charts
Simple bar chart
• Frequencies are plotted for the levels of a single
categorical variable.
Multiple/Compound/Grouped Bar Chart

•It is an extension of a simple bar graph
where frequencies are plotted for levels of
two categorical variables instead of one.
•It has a primary categorical variable and a
secondary categorical variable.
• A primary categorical variable dictates axis locations
for each bar cluster
• A secondary categorical variable dictates the number
of bars to plot in each cluster
• NOTE that the shading and positioning of the bars are
consistent within each cluster.
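The counts behind a grouped bar chart come from cross-tabulating the two variables. The sketch below is an illustration with made-up data: fuel type is assumed as the primary variable and transmission as a hypothetical secondary variable. Each cluster lists its bars in the same order, which is what keeps the positioning consistent.

```python
from collections import Counter

# Made-up (primary, secondary) pairs for illustration only:
# fuel type is the primary variable, transmission the secondary one.
cars = [("Petrol", "Manual"), ("Petrol", "Automatic"), ("Petrol", "Manual"),
        ("Diesel", "Manual"), ("Diesel", "Automatic"), ("Diesel", "Automatic")]

pair_counts = Counter(cars)
primary_levels = sorted({primary for primary, _ in cars})
secondary_levels = sorted({secondary for _, secondary in cars})

# One cluster per primary level; one bar per secondary level, in a fixed order.
clusters = {primary: [pair_counts[(primary, secondary)]
                      for secondary in secondary_levels]
            for primary in primary_levels}

print("bar order within each cluster:", secondary_levels)
for primary, bars in clusters.items():
    print(primary, bars)
```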
Multiple/Compound/Grouped Bar Chart
(Cont’d)
• Create a write-up explaining the visualization.
Pro-tip: Explore IELTS Questions and Answers on
graphs for great examples on interpretation
Histogram
•The histogram is a graph that displays the
data by using contiguous vertical bars (unless
the frequency of a class is 0) of various
heights to represent the frequencies of the
classes.
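The definition can be pictured with a plain-text sketch. The classes and frequencies below are made up for illustration, and the third class is deliberately empty to show the one case where neighbouring bars do not touch.

```python
# Made-up classes as (lower limit, upper limit, frequency), illustration only.
# The third class has frequency 0, so it leaves a gap.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 0), (30, 40, 3)]

tallest = max(frequency for _, _, frequency in classes)

# Build the picture from the top row down; each bar is four characters wide
# with no spacing between bars, so adjacent non-empty classes touch.
rows = []
for level in range(tallest, 0, -1):
    rows.append("".join("####" if frequency >= level else "    "
                        for _, _, frequency in classes))

for row in rows:
    print(row)
```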
Histogram (Cont’d)
• Comment on the histogram
Thank You.
