
02 Exploratory Data Analysis

Statistics

Uploaded by

sofiazanders4

Exploratory Data Analysis in Excel

Zeren Lucky L. Cabanayan


STAT2100 – Statistical Analysis with Software Application
1st Semester, 2024-2025

CENTRAL LUZON STATE UNIVERSITY


DEPARTMENT of STATISTICS

Learning Outcomes
After completing this chapter, the student must be able to
• Create a qualitative frequency distribution table in Excel.
• Create a bar chart in Excel.
• Create a pie chart in Excel.
• Create other charts for qualitative data in Excel.
• Skillfully compute the different measures of location in Excel.
• Skillfully compute the different measures of central tendency in Excel.
• Skillfully compute the different measures of variability in Excel.
• Create a line chart in Excel.
• Create a histogram in Excel.
• Create a boxplot in Excel.
Exploratory Data Analysis in Excel | 2
Different techniques of presenting qualitative datasets
Qualitative variable/dataset: the outcomes of the variable are expressed non-numerically, that is, as categories.

Examples:
sex or gender, religion, cellphone number, eye color, marital status, etc.

A qualitative dataset can be presented through
• a qualitative frequency distribution table
• graphical presentations such as
o bar graphs
o pie graphs


Frequency Distribution Table


➢ a tabular summary of data showing the number (frequency) of items in
each of several nonoverlapping classes
➢ organization of data in tabular form, using classes (or intervals) and
frequencies

frequency – number of times the value occurs in the data set

Advantages
o Easier to detect the smallest and largest values
o Easier to find the measures of position and the frequencies

Qualitative FDT: represents data that can be placed in specific categories, such as gender, hair color, or religious affiliation.

Example 1:
The year levels of 24 randomly selected students are given below. Summarize the data using an FDT.

2nd year   3rd year   4th year   2nd year
3rd year   4th year   4th year   4th year
2nd year   1st year   1st year   2nd year
1st year   1st year   1st year   4th year
3rd year   2nd year   2nd year   1st year
3rd year   1st year   3rd year   1st year

Table 1. FDT of Year Levels of 24 Students
Year Level   Frequency
1st year     8
2nd year     6
3rd year     5
4th year     5
Total        24
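As an independent cross-check outside Excel, the frequency column of Table 1 can be reproduced in Python. This is a minimal sketch, assuming the 24 responses are stored as the text labels shown above (read row by row); `collections.Counter` does the job that one COUNTIF per category does in the Excel steps.

```python
from collections import Counter

# The 24 year levels from Example 1, read row by row
year_levels = [
    "2nd year", "3rd year", "4th year", "2nd year",
    "3rd year", "4th year", "4th year", "4th year",
    "2nd year", "1st year", "1st year", "2nd year",
    "1st year", "1st year", "1st year", "4th year",
    "3rd year", "2nd year", "2nd year", "1st year",
    "3rd year", "1st year", "3rd year", "1st year",
]

# Count how often each category occurs (one COUNTIF per category in Excel)
frequency = Counter(year_levels)

categories = ["1st year", "2nd year", "3rd year", "4th year"]
for category in categories:
    print(f"{category:<10}{frequency[category]:>4}")
print(f"{'Total':<10}{sum(frequency.values()):>4}")
```

The printed table should match Table 1: 8, 6, 5, and 5 students, for a total of 24.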

Example 1:
The year levels of the 24 randomly selected
students in the previous example are
coded as

1 = 1st year
2 = 2nd year
3 = 3rd year
4 = 4th year

Step 1: Set up the data in Excel.


Step 2: Create a frequency table


Insert ⟶ Table ⟶ Create Table ⟶ OK

Input the range where you want to put your table.
Note: consider the number of columns and rows you need in your table.


Step 3: Customize the frequency table.

OPTION 1

Step 4: Compute the frequency using the function


=COUNTIF(range, criteria) ⟶ input the range ⟶ input the criteria ⟶ press Enter
Note: repeat the process for each category, up to the last one.

Here the range is the range of the data, and the criterion for 1st year is its code, 1:

=COUNTIF(B2:B25,1)
=COUNTIF(B2:B25,2)
=COUNTIF(B2:B25,3)
=COUNTIF(B2:B25,4)

OPTION 2

Step 4: Compute the frequency using the function
=COUNTIF(range, criteria) ⟶ input the range ⟶ input the criteria ⟶ press Enter
Note: repeat the process for each category, up to the last one.

Here the data are encoded as text labels, so the criterion for 1st year is the label itself, "1st year".


Note: If the values are non-numeric, enclose the criterion you want to count in double quotes (" ").

The final output:

OPTION 1
=COUNTIF(B2:B25,1)
=COUNTIF(B2:B25,2)
=COUNTIF(B2:B25,3)
=COUNTIF(B2:B25,4)
=SUM(G5:G8)

OPTION 2
=COUNTIF(B2:B25,"1st year")
=COUNTIF(B2:B25,"2nd year")
=COUNTIF(B2:B25,"3rd year")
=COUNTIF(B2:B25,"4th year")
=SUM(G5:G8)

Note: The ranges in the formulas depend on how you encoded your data. In my case, the data were in B2:B25, the frequencies were in G5:G8, and their total was in G9. (Summing G5:G9 inside G9 itself would create a circular reference.)


How to get the relative frequency (percentage):

Note: The formula for the relative frequency is
$RF = \frac{\text{frequency}}{\text{total}} \times 100$

Step 1: Insert a new column in your frequency table.
Right-click (mouse or laptop touchpad) ⟶ Insert ⟶ Table Columns to the Left (or Right)


Step 2: Apply the formula for relative frequency in Excel.

Note:
• The formula for the relative frequency is $RF = \frac{\text{frequency}}{\text{total}} \times 100$.
• In Excel, / denotes division and * denotes multiplication.

Use the Increase Decimal and Decrease Decimal buttons (Home tab) to increase or decrease the number of decimal places.


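Step 3: The final output will be

=(G5/G9)*100
=(G6/G9)*100
=(G7/G9)*100
=(G8/G9)*100
=SUM(H5:H8)

Note: The ranges in the formulas depend on how you encoded your data. In my case, the frequencies were in G5:G8 with their total in G9, and the relative frequencies were in H5:H8. Writing the total as an absolute reference, as in =(G5/$G$9)*100, lets you fill the formula down.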
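The same relative frequencies can be checked with a short Python sketch. It assumes the frequencies 8, 6, 5, and 5 from Table 1 and applies the formula above: each frequency is divided by the total and multiplied by 100, just as =(G5/G9)*100 does for one category.

```python
# Frequencies from Table 1 (Example 1)
frequencies = {"1st year": 8, "2nd year": 6, "3rd year": 5, "4th year": 5}
total = sum(frequencies.values())  # 24, the value of the Total cell

# RF = frequency / total * 100, one value per category
relative_frequency = {
    category: freq / total * 100 for category, freq in frequencies.items()
}

for category, rf in relative_frequency.items():
    print(f"{category:<10}{rf:6.2f}")
print(f"{'Total':<10}{sum(relative_frequency.values()):6.2f}")
```

The printed column (33.33, 25.00, 20.83, 20.83) sums to 100.00, which is the check the last row of the Excel table performs.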

Qualitative Frequency Distribution Table

Note: If the values are non-numeric, enclose the criterion you want to count in double quotes (" ").

The argument separator varies with the Excel version and regional settings: some use a semicolon (;) and others a comma (,).

Example 2:

Note: The formula for the relative frequency is
$RF = \frac{\text{frequency}}{\text{total}} \times 100$

In Excel, / denotes division and * denotes multiplication.

Use the Increase Decimal and Decrease Decimal buttons to increase or decrease the number of decimal places.

Graphical Presentation for Qualitative Data Set

1. Bar Graph - consists of a series of rectangular bars, one for each category, where the length of each bar represents the quantity or frequency of that category. The bars may be arranged vertically or horizontally.

2. Pie Graph - a circular graph that is useful in showing how a total quantity is distributed among a group of categories

Graphical Presentation

Step 1: Highlight the cells that contain the data you want to use in the chart.
Step 2: Click the Insert tab on the ribbon.
Step 3: Select the type of chart you want to create. To see a description of each chart type, point the cursor at its icon.


Step 4: You'll see many options when you click this button, such as 2-D and 3-D columns as well as 2-D and 3-D bars. For this example, we select a 2-D column chart. To see a preview of the bar graph, point the cursor at the chart icons.

Vertical Bar Graph

Click the chart to display the chart tools, then use them to edit the design and format of the chart.
Use the buttons beside the chart to edit the chart elements, styles, and filters.

Horizontal Bar Graph

To see a preview of the bar graph, point the cursor at the chart icons.


Pie Chart
Step 1: Highlight the cells that contain the data you want to use in the chart. For this graph, use the relative frequency instead of the frequency.
Step 2: Click the Insert tab on the ribbon.
Step 3: Select the pie chart you want to create.


Step 4: The chart will appear. Customize the pie chart through Chart Design, Format, and Quick Layout.

Other charts in Excel: Doughnut

Other charts in Excel: Treemap

Different techniques of presenting quantitative datasets

• Quantitative variable/dataset: the outcomes of the variable are expressed numerically, and the numbers are meaningful or indicate some sort of amount.

Examples:
age, allowance, number of classrooms, weight, height, etc.

• A quantitative dataset can be presented through
a. Numerical summary measures
b. Graphical presentations such as
1. Histogram
2. Box plot

Measures of Location
1. Percentile
• Percentiles are one type of quantiles (or fractiles), which partition data into groups with roughly the same number of values in each group.
• Percentiles are measures of location, denoted $P_1, P_2, \ldots, P_{99}$, which divide a set of data into 100 groups with about 1% of the values in each group.
• The percentile that corresponds to a particular data value x is given by
$\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100$
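The percentile-of-a-value formula translates directly into code. This is a minimal sketch on a small hypothetical data set (not from this chapter); the function name `percentile_of_value` is illustrative. Multiplying before dividing keeps the arithmetic exact for small counts.

```python
def percentile_of_value(data, x):
    """(number of values less than x) / (total number of values) * 100."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)


# Hypothetical scores, for illustration only
scores = [12, 15, 18, 20, 22, 25, 27, 30, 33, 40]
print(percentile_of_value(scores, 27))  # 6 of the 10 values are below 27
```

Here 27 sits at the 60th percentile. Note that this counting rule and the interpolating PERCENTILE function in the next examples are related but are not exact inverses of each other.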

2. Quartiles
• Quartiles are measures of location, denoted by $Q_1, Q_2, Q_3$, which divide a set of data into four groups with about 25% of the values in each group.
$Q_1$ (First quartile): Separates the bottom 25% of the sorted values from the top 75%. (To be more precise, at least 25% of the sorted values are less than or equal to $Q_1$, and at least 75% of the values are greater than or equal to $Q_1$.)
$Q_2$ (Second quartile): Same as the median; separates the bottom 50% of the sorted values from the top 50%.
$Q_3$ (Third quartile): Separates the bottom 75% of the sorted values from the top 25%. (To be more precise, at least 75% of the sorted values are less than or equal to $Q_3$, and at least 25% of the values are greater than or equal to $Q_3$.)

Example: The following data show the total sales per person (43 observations).

Total sales per person
189.05    539.40    299.85    131.34    9.03
999.50    449.10    479.04    479.04    151.24
179.64    57.71     86.43     68.37     1139.43
539.73    1619.19   1183.26   719.20    18.06
167.44    174.65    413.54    625.00    54.89
299.40    250.00    1305.00   309.38    1879.06
149.25    255.84    19.96     686.95    139.72
449.10    251.72    139.93    1005.90
63.68     575.36    249.50    825.00

First, put the data into an array, that is, arrange them from the smallest to the largest value.
Highlight the data ⟶ Sort & Filter ⟶ Sort Smallest to Largest


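Example 1: Computation of a percentile in Excel
=PERCENTILE(array, k)

where
array is the data, sorted from lowest to highest (Excel also sorts the values internally, so an unsorted range gives the same result)
k is the percentile being computed, written as a decimal (e.g., 0.90 for $P_{90}$)

The output appears in the selected cell.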


Example 2: Computation of a percentile in Excel
=PERCENTILE(array, k)
where array and k are as defined in Example 1.

The output appears in the selected cell.
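PERCENTILE (which gives the same result as PERCENTILE.INC) does not simply count values: it interpolates linearly between neighboring sorted values, at position (n − 1)·k counted from 0. The sketch below reproduces that rule under the assumption that 0 ≤ k ≤ 1; the helper name `excel_percentile` and the data are hypothetical.

```python
import math


def excel_percentile(data, k):
    """Inclusive percentile with linear interpolation (the PERCENTILE / PERCENTILE.INC rule)."""
    if not 0 <= k <= 1:
        raise ValueError("k must be between 0 and 1")
    values = sorted(data)              # the array, arranged from lowest to highest
    position = (len(values) - 1) * k   # 0-based position of the k-th percentile
    lower = math.floor(position)
    upper = min(lower + 1, len(values) - 1)
    fraction = position - lower
    # Move part of the way from the lower neighbor to the upper neighbor
    return values[lower] + fraction * (values[upper] - values[lower])


# Hypothetical data, for illustration only
data = [15, 20, 35, 40, 50]
print(excel_percentile(data, 0.40))
```

With k = 0.40 the position is 4 × 0.40 = 1.6, so the result lies 60% of the way from 20 to 35, that is 29 (printed with a tiny floating-point tail).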


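Example 3: Computation of a quartile in Excel
=QUARTILE(array, quart)

where
array is the data, sorted from lowest to highest
quart is the quartile being computed: 1 for $Q_1$, 2 for $Q_2$ (the median), and 3 for $Q_3$ (0 and 4 return the minimum and maximum)

The output appears in the selected cell.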


Example 4: Computation of a quartile in Excel
=QUARTILE(array, quart)
where array and quart are as defined in Example 3.

The output appears in the selected cell.
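For the total-sales data, the three quartiles can be cross-checked in Python. This is a minimal sketch, assuming Python 3.8 or later for `statistics.quantiles`; with `method="inclusive"` it uses the same (n − 1) interpolation rule as QUARTILE/QUARTILE.INC, so the results should agree with the worksheet.

```python
import statistics

# Total sales per person (43 observations) from the example above
sales = [
    189.05, 539.40, 299.85, 131.34, 9.03,
    999.50, 449.10, 479.04, 479.04, 151.24,
    179.64, 57.71, 86.43, 68.37, 1139.43,
    539.73, 1619.19, 1183.26, 719.20, 18.06,
    167.44, 174.65, 413.54, 625.00, 54.89,
    299.40, 250.00, 1305.00, 309.38, 1879.06,
    149.25, 255.84, 19.96, 686.95, 139.72,
    449.10, 251.72, 139.93, 1005.90,
    63.68, 575.36, 249.50, 825.00,
]

# Cut points that split the sorted data into four groups (same rule as QUARTILE.INC)
q1, q2, q3 = statistics.quantiles(sales, n=4, method="inclusive")
print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")
```

With n = 43, $Q_2$ is the 22nd sorted value (299.40), while $Q_1$ and $Q_3$ fall halfway between neighbors: (139.93 + 149.25)/2 = 144.59 and (575.36 + 625.00)/2 = 600.18.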


Measures of Central Tendency
• A number that is meant to convey the idea of "centralness" for the data set
• A value about which the set of observations tends to cluster
• The typical/average value of the data set

Three measures of central tendency:
1. Mean
2. Median
3. Mode


1. Mean
- the arithmetic average, obtained by adding up all the data values and dividing by the total number of observations

Population mean: $\mu = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

where:
$x_i$ = value of the ith observation
N = number of observations in the population
n = number of observations in the sample


Example: Computation of the mean in Excel (using the mean formula)
=SUM(number1, [number2], ...)/COUNT(value1, [value2], ...)
where (number1, [number2], ...) and (value1, [value2], ...) are the range of data

The output appears in the selected cell.


Example: Computation of the mean in Excel (using the AVERAGE function)
=AVERAGE(number1, [number2], ...)
where (number1, [number2], ...) is the range of data

The output appears in the selected cell.
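Both Excel routes, =SUM(...)/COUNT(...) and =AVERAGE(...), compute the same arithmetic mean. A minimal Python sketch of the two routes on a small hypothetical data set (the allowance values are invented for illustration):

```python
import statistics

# Hypothetical allowances, for illustration only
allowances = [150, 200, 180, 220, 250]

mean_by_formula = sum(allowances) / len(allowances)   # like =SUM(range)/COUNT(range)
mean_by_function = statistics.mean(allowances)         # like =AVERAGE(range)

print(mean_by_formula, mean_by_function)
```

Both give 200. In Excel the two routes agree as long as the range holds only numbers, since COUNT counts numeric cells only.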


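2. Median
- denoted by $\tilde{X}$ or Md
- the value that divides an array of observations into two equal parts, so that half of the cases are above it and half are below it
- the middle value, or the average of the two middle values, in an array of observations

In symbols (first check that the data are in an array, that is, sorted):
$$\tilde{X} = \begin{cases} X_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\ \dfrac{X_{\frac{n}{2}} + X_{\frac{n+2}{2}}}{2} & \text{if } n \text{ is even} \end{cases}$$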
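The odd/even rule above can be written directly in code. This is a minimal sketch; the function name `median` and the two small data sets are illustrative. The subscripts in the formula count from 1, so the code subtracts 1 to index a Python list.

```python
def median(values):
    """Median following the odd/even rule (the data are put in an array first)."""
    x = sorted(values)  # arrange the observations in an array
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]  # X_{(n+1)/2}
    return (x[n // 2 - 1] + x[(n + 2) // 2 - 1]) / 2  # average of X_{n/2} and X_{(n+2)/2}


print(median([7, 3, 9, 1, 5]))      # n = 5 is odd: the 3rd sorted value
print(median([7, 3, 9, 1, 5, 11]))  # n = 6 is even: average of the 3rd and 4th
```

The first call returns 5; the second averages 5 and 7 to give 6.0.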


Example: Computation of the median in Excel
=MEDIAN(number1, [number2], ...)
where (number1, [number2], ...) is the range of data

The output appears in the selected cell.


3. Mode
- the value (quantitative) or category (qualitative) with the largest frequency (or percentage) in the distribution
- denoted by $\hat{X}$ or Mo
- locates the point where the observation values occur with the greatest density
- generally a less popular measure than the mean or the median
- determined by counting the frequency of each value and finding the value with the highest frequency of occurrence

Example: 2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2
Answer: To find the mode, find the frequency of each observation. The value 2 occurs 13 times, more often than any other value (1, 3, and 5 occur twice each, and 4 occurs once). Therefore, the mode is 2.
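The counting procedure in the answer can be sketched with `collections.Counter`, using the 20 observations from the example:

```python
from collections import Counter

# Data from the example above
data = [2, 5, 2, 3, 5, 2, 1, 4, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2]

counts = Counter(data)                    # frequency of each observation
mode, highest = counts.most_common(1)[0]  # value with the highest frequency
print(f"mode = {mode} (occurs {highest} times)")
```

If two values tie for the highest frequency, `most_common(1)` reports only one of them, so inspect `counts` directly when the data may be multimodal.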


Example: Computation of the mode in Excel
=MODE(number1, [number2], ...)
where (number1, [number2], ...) is the range of data

The output appears in the selected cell.


Another example: Computation of the mode in Excel
=MODE(number1, [number2], ...)
where (number1, [number2], ...) is the range of data

The output appears in the selected cell.

Measures of Variability
• A measure of variability indicates how the observations in a data set are scattered about an average.
• It is a quantity that measures the spread or variability of the observations in a given population.
Common measures of variability:
1. Range
2. Variance
3. Standard deviation
4. Standard error of the mean
5. Coefficient of variation

1. Range
– measures how far the highest value is from the lowest value
– a rough measure of dispersion
– difference between the highest value (HV) and the lowest value (LV) in the
population
– it uses only the extreme values
– it fails to communicate any information about the clustering or the lack of
clustering of the values between the extremes
– a weakness is that an outlier can greatly alter its value

$R = HV - LV = \text{max} - \text{min}$
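The weakness noted above, that a single outlier can greatly alter the range, is easy to see in a small sketch. The data values and the helper name `value_range` are hypothetical.

```python
def value_range(values):
    """R = HV - LV, the highest value minus the lowest value."""
    return max(values) - min(values)


# Hypothetical data, for illustration only
data = [4, 7, 8, 9, 10]
with_outlier = data + [50]  # the same data plus one extreme value

print(value_range(data))          # uses only 10 and 4
print(value_range(with_outlier))  # the single outlier dominates
```

The range jumps from 6 to 46 even though five of the six values are unchanged, because R looks only at the two extremes.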


Example: Computation of the range in Excel

Minimum: =MIN(number1, [number2], ...)

Maximum: =MAX(number1, [number2], ...)

Range = maximum - minimum

The output appears in the selected cell.

2. Variance
– the average squared difference of the observations from the mean
– comes in the square of the unit of measure of the given set of values

Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$

Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ or $s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$

Note: Denominator is n if n ≥ 30 and n - 1 if n < 30.

Characteristics of the Variance
• Always non-negative
• A large variance corresponds to a highly dispersed set of values
• Easy to manipulate for further mathematical computations
• Makes use of all the observations in the data
• Comes in a unit that is the square of the unit in the data
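The definitional and computational (shortcut) formulas for $s^2$ give the same number, and dividing by N instead of n − 1 gives the population version. A minimal sketch on a small hypothetical data set, cross-checked against Python's `statistics.variance` (n − 1, like VAR.S) and `statistics.pvariance` (N, like VAR.P):

```python
import statistics

# Hypothetical sample, for illustration only
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
mean = sum(x) / n

# Definitional formula: sum of squared deviations from the mean
ss = sum((xi - mean) ** 2 for xi in x)
sample_var = ss / (n - 1)   # s^2, like =VAR.S(range)
population_var = ss / n     # sigma^2 if x were the whole population, like =VAR.P(range)

# Computational (shortcut) formula for s^2
shortcut_var = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

print(sample_var, shortcut_var, statistics.variance(x))
print(population_var, statistics.pvariance(x))
```

Here the squared deviations sum to 32, so $s^2 = 32/7 \approx 4.571$ and $\sigma^2 = 32/8 = 4$.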


Example: Computation of the variance in Excel

Population variance:
=VAR.P(number1, [number2], ...)

Sample variance:
=VAR.S(number1, [number2], ...)

The output appears in the selected cell.

3. Standard Deviation
- the positive square root of the variance
- has the same units of measurement (such as minutes, grams, or dollars) as the original data values
- describes the typical distance between the individual scores in the distribution and the mean of the distribution
- Values close together have a small standard deviation, while values with much more variation have a larger standard deviation.
- It is affected by the value of every observation, so it may be distorted by a few extreme values.

Population: $\sigma = \sqrt{\sigma^2}$    Sample: $s = \sqrt{s^2}$


Example: Computation of the standard deviation in Excel

Population standard deviation:
=STDEV.P(number1, [number2], ...)

Sample standard deviation:
=STDEV.S(number1, [number2], ...)

The output appears in the selected cell.
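A short sketch confirming that each standard deviation is the positive square root of the matching variance. It uses a small hypothetical data set; `statistics.stdev` and `statistics.pstdev` play the roles of STDEV.S and STDEV.P.

```python
import math
import statistics

# Hypothetical sample, for illustration only
x = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(x)        # sample standard deviation, like =STDEV.S(range)
sigma = statistics.pstdev(x)   # population standard deviation, like =STDEV.P(range)

# Each standard deviation is the positive square root of the matching variance
assert math.isclose(s, math.sqrt(statistics.variance(x)))
assert math.isclose(sigma, math.sqrt(statistics.pvariance(x)))
print(round(s, 4), sigma)
```

For these values $\sigma = \sqrt{4} = 2$ and $s = \sqrt{32/7} \approx 2.1381$, in the same units as the data.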

4. Standard Error of the Mean
– a measure of the statistical accuracy of an estimate
– an indication of the reliability of the mean
– a small SE indicates that the sample mean is a more accurate reflection of the true (population) mean
– the standard deviation of the sampling distribution of the mean

Population: $\sigma_{\mu} = \frac{\sigma}{\sqrt{N}}$    Sample: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$

where: σ = population standard deviation, N = population size, s = sample standard deviation, n = sample size


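Example: Computation of the standard error of the mean in Excel
Note: $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, where s is the standard deviation and n is the sample size.

OPTION 1: divide the standard deviation by SQRT(43), where 43 is the sample size of the data and SQRT means square root.
OPTION 2: divide the standard deviation by SQRT(COUNT(data range)), where COUNT gives the number of entries in a data range.

The output appears in the selected cell.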
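The two Excel options differ only in whether n is typed in or counted from the data. A minimal sketch on a small hypothetical sample (n = 8 here, not the 43 of the sales data):

```python
import math
import statistics

# Hypothetical sample, for illustration only
x = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(x)             # sample standard deviation
se_fixed_n = s / math.sqrt(8)       # OPTION 1 style: sample size typed in directly
se_counted = s / math.sqrt(len(x))  # OPTION 2 style: sample size counted from the data

print(round(se_counted, 4))
```

Counting n from the data (OPTION 2) is the safer habit: the formula stays correct if rows are later added or removed.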

5. Coefficient of Variation (CV)
- defined as the ratio of the standard deviation to the mean, expressed in percent
- unitless; useful for comparing two data sets with different units of measurement

Population: $CV = \frac{\sigma}{\mu} \times 100\%$    Sample: $CV = \frac{s}{\bar{X}} \times 100\%$

where: σ = population standard deviation, μ = population mean, s = sample standard deviation, $\bar{X}$ = sample mean


Example: Computation of the coefficient of variation in Excel
Note: $CV = \frac{s}{\bar{x}} \times 100$, where s is the standard deviation and $\bar{x}$ is the sample mean.

The output appears in the selected cell.
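Because CV is unitless, it can compare spread across variables measured in different units. A minimal sketch with hypothetical heights (cm) and weights (kg) of the same five students; the function name `coefficient_of_variation` is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV = s / x-bar * 100 (in percent)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical measurements of the same five students, for illustration only
heights_cm = [150, 155, 160, 165, 170]
weights_kg = [45, 50, 55, 60, 65]

print(f"CV of heights: {coefficient_of_variation(heights_cm):.2f}%")
print(f"CV of weights: {coefficient_of_variation(weights_kg):.2f}%")
```

Both lists have the same standard deviation (about 7.91), yet the weights vary more relative to their mean (CV about 14.37% versus 4.94%).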

Numerical Summary Measures using Data Analysis in Excel
1. Click the Data tab on the ribbon, then click Data Analysis (available once the Analysis ToolPak add-in is enabled).
2. Choose Descriptive Statistics, then click OK.

Descriptive Statistics
3. In Input Range, enter the range that contains the data.
4. Under Output options, choose where the results go: Output Range (a range of cells where the tool will place its output) or New Worksheet Ply (the name of the worksheet where the tool will place its output). Check Summary statistics, then click OK.
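As a cross-check of the Summary statistics output, the sketch below recomputes several of its rows for the total-sales data. It assumes the row labels Excel uses (Mean, Standard Error, and so on). It leaves out Mode, Kurtosis, and Skewness: these data have two values (449.10 and 479.04) that each occur twice, and Excel's kurtosis and skewness follow their own specific formulas.

```python
import statistics

# Total sales per person (43 observations) from the percentile example
sales = [
    189.05, 539.40, 299.85, 131.34, 9.03,
    999.50, 449.10, 479.04, 479.04, 151.24,
    179.64, 57.71, 86.43, 68.37, 1139.43,
    539.73, 1619.19, 1183.26, 719.20, 18.06,
    167.44, 174.65, 413.54, 625.00, 54.89,
    299.40, 250.00, 1305.00, 309.38, 1879.06,
    149.25, 255.84, 19.96, 686.95, 139.72,
    449.10, 251.72, 139.93, 1005.90,
    63.68, 575.36, 249.50, 825.00,
]

n = len(sales)
s = statistics.stdev(sales)  # sample standard deviation, as in STDEV.S

# A subset of the rows reported under "Summary statistics"
summary = {
    "Mean": statistics.mean(sales),
    "Standard Error": s / n ** 0.5,
    "Median": statistics.median(sales),
    "Standard Deviation": s,
    "Sample Variance": statistics.variance(sales),
    "Range": max(sales) - min(sales),
    "Minimum": min(sales),
    "Maximum": max(sales),
    "Sum": sum(sales),
}

for label, value in summary.items():
    print(f"{label:<20}{value:>12.2f}")
print(f"{'Count':<20}{n:>12}")
```

The Range row (1879.06 − 9.03 = 1870.03) is the value used later when building the histogram classes.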

Graphical Presentation of Quantitative Data

• Line Chart

• Histogram

• Boxplot

Line Chart
- a graphical representation of data that is especially useful for showing trends over a period of time

Figure 1. Line Chart on Enrollees (in 1000), with one line each for 2008, 2009, and 2010 across CHSI, CAS, Ced, Cag, and CF.

Example: Line chart in Excel
Step 1: Highlight the cells that contain the data you want to use in the chart


Step 2: Click the Insert tab on the ribbon.
Step 3: Select the type of chart you want to create.

Step 4: You'll see many options when you click this button, such as 2-D and 3-D line charts. For this example, we select a 2-D line chart.


Step 5: The chart will appear. Customize the line chart through Chart Design, Format, and Quick Layout.


Step 6: The final output will be

Figure: line chart titled "2020 Sales", with FREQUENCY on the vertical axis and PERIOD (Jan to Dec) on the horizontal axis.

Histogram

➢ A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

➢ A histogram visually represents the distribution of a continuous variable.

Steps in Constructing a Quantitative FDT
1. Put the data in an array (sorted order), because it is then easier to detect the smallest and largest values and to find the frequencies and measures of position.
2. Determine the range (R):
$R = \text{highest value} - \text{lowest value}$
3. Solve for the number of classes or class intervals (k), rounded to a whole number:
$k = \sqrt{N}$ or $k = \sqrt{n}$
where N (or n) is the number of observations.
4. Determine the class size (c):
$c = \frac{R}{k}$
Note: Round off c to the same number of decimal places as the raw data.
5. Determine and enumerate the classes. Each class is an interval of values defined by its lower and upper class limits. As a rule, the lowest value in the data becomes the lower class limit (LL) of the first class interval. Each succeeding lower limit is obtained by adding c to the lower class limit of the preceding class interval. Upper class limits (UL) are obtained using the following formula:
UL = LL + c - 1 unit of measure

Data                            One unit of measure
0.34, 24.56, 119.02, 3.67       0.01
0.4, 5.5, 123.8, 2.7, 12.3      0.1
2, 17, 29, 6, 176               1

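In Excel:
Range (R): =MAX(data range) - MIN(data range)
Number of classes (k): =SQRT(43) or =SQRT(COUNT(data range)), which gives about 6.56, rounded to 7
Class size (c): =R/k (=1870.03/7)
Lower class limit of the first class: =MIN(data range)
Upper class limit: =LL + c - 1 unit of measure (see step 5)
Bin: the upper class limits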
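The five steps can be sketched in Python for the total-sales data. This is a minimal sketch under two stated assumptions. First, k is rounded up with `math.ceil`; for n = 43 this gives the same 7 that ordinary rounding gives. Second, the values are converted to whole cents so that "1 unit of measure" (0.01) is exactly 1 and repeated additions of c do not drift.

```python
import math

# Total sales per person (43 observations)
sales = [
    189.05, 539.40, 299.85, 131.34, 9.03,
    999.50, 449.10, 479.04, 479.04, 151.24,
    179.64, 57.71, 86.43, 68.37, 1139.43,
    539.73, 1619.19, 1183.26, 719.20, 18.06,
    167.44, 174.65, 413.54, 625.00, 54.89,
    299.40, 250.00, 1305.00, 309.38, 1879.06,
    149.25, 255.84, 19.96, 686.95, 139.72,
    449.10, 251.72, 139.93, 1005.90,
    63.68, 575.36, 249.50, 825.00,
]

SCALE = 100  # one unit of measure is 0.01, so work in whole cents
cents = [round(value * SCALE) for value in sales]

k = math.ceil(math.sqrt(len(cents)))  # number of classes: sqrt(43) = 6.56 -> 7 (rounding up is an assumption)
r = max(cents) - min(cents)           # range R, in cents
c = round(r / k)                      # class size, rounded to the data's precision

classes = []
lower = min(cents)                    # LL of the first class = lowest value
for _ in range(k):
    upper = lower + c - 1             # UL = LL + c - 1 unit of measure
    frequency = sum(1 for v in cents if lower <= v <= upper)
    classes.append((lower / SCALE, upper / SCALE, frequency))
    lower += c                        # next LL = previous LL + c

for lo, hi, f in classes:
    print(f"{lo:.2f}-{hi:.2f}  {f}")
```

The printed limits (9.03-276.17 up to 1611.93-1879.07, with c = 267.15) and frequencies (21, 10, 4, 3, 3, 0, 2) should match the Excel histogram output.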

1. Click the Data tab on the ribbon, then click Data Analysis.
2. Choose Histogram, then click OK.

3. In Input Range, enter the range that contains the data.
4. In Bin Range, enter the range that contains the bins (the upper class limits).
5. Under Output options, choose where you want to place the output, then select Chart Output.


This will be the output:


Click the graph, then change its layout into a histogram.
Change the Bin column into classes and delete the "More" row from the bin.


The final output will be

Classes            Frequency
9.03-276.17        21
276.18-543.32      10
543.33-810.47      4
810.48-1077.62     3
1077.63-1344.77    3
1344.78-1611.92    0
1611.93-1879.07    2

Figure: histogram titled "Histogram", with the TOTAL SALES classes on the horizontal axis and FREQUENCY on the vertical axis.

Boxplot
Boxplots give us information about the distribution and spread of the data.
Procedure for Constructing a Boxplot
1. Find the 5-number summary, consisting of the minimum value, $Q_1$, the median ($Q_2$), $Q_3$, and the maximum value.
2. Construct a scale with values that include the minimum and maximum data values.
3. Construct a box (rectangle) extending from $Q_1$ to $Q_3$ and draw a line in the box at the median value.
4. Draw lines extending outward from the box to the minimum and maximum data values.
*In a boxplot, a data value is an outlier if it is above $Q_3 + 1.5 \times IQR$ or below $Q_1 - 1.5 \times IQR$, where $IQR = Q_3 - Q_1$.
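Step 1 and the outlier rule can be checked numerically before drawing anything. This is a minimal sketch on the total-sales data, assuming Python 3.8 or later; `statistics.quantiles` with `method="inclusive"` follows the same rule as QUARTILE.

```python
import statistics

# Total sales per person (43 observations)
sales = [
    189.05, 539.40, 299.85, 131.34, 9.03,
    999.50, 449.10, 479.04, 479.04, 151.24,
    179.64, 57.71, 86.43, 68.37, 1139.43,
    539.73, 1619.19, 1183.26, 719.20, 18.06,
    167.44, 174.65, 413.54, 625.00, 54.89,
    299.40, 250.00, 1305.00, 309.38, 1879.06,
    149.25, 255.84, 19.96, 686.95, 139.72,
    449.10, 251.72, 139.93, 1005.90,
    63.68, 575.36, 249.50, 825.00,
]

minimum, maximum = min(sales), max(sales)
q1, median, q3 = statistics.quantiles(sales, n=4, method="inclusive")  # same rule as QUARTILE
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = sorted(v for v in sales if v < lower_fence or v > upper_fence)

print("5-number summary:", minimum, round(q1, 2), round(median, 2), round(q3, 2), maximum)
print("fences:", round(lower_fence, 2), round(upper_fence, 2))
print("outliers:", outliers)
```

With IQR = 600.18 − 144.59 = 455.59, the upper fence is about 1283.57 and the lower fence is negative. This flags 1305.00, 1619.19, and 1879.06 as outliers, consistent with the long right tail in the histogram.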

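Figure: parts of a boxplot. The box spans the interquartile range (IQR) from $Q_1$ (25th percentile) to $Q_3$ (75th percentile), with a line at the median. The whiskers extend to the minimum/lower fence ($Q_1 - 1.5 \times IQR$) and to the maximum/upper fence ($Q_3 + 1.5 \times IQR$).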


1. Compute the minimum value, 1st quartile, 2nd quartile, 3rd quartile, and maximum value:
Minimum value: =MIN(data range)
1st Quartile: =QUARTILE(array,1)
2nd Quartile: =QUARTILE(array,2)
3rd Quartile: =QUARTILE(array,3)
Maximum value: =MAX(data range)


2. Create a scatter plot using the 5 computed values and the column of 1's.


Change the maximum bound of the Y axis.


3. Construct a box (rectangle) extending from $Q_1$ to $Q_3$ and draw a line in the box at the median value.


Contingency Tables
• Contingency tables (also called crosstabs) are useful as a rudimentary
tool to analyze the relationship between two variables.
• In a contingency table, one variable is presented in the columns and the
other in the rows.
• By looking at the distribution of one variable across categories of the
other, we are able to gain preliminary insight into the association among
variables.
• Contingency tables are most useful when variables have a limited
number of response categories.


Step 1: Identify the two variables you wish to analyze in a contingency table. Decide which variable you consider to be the independent variable and which the dependent variable.


Step 2: Click anywhere within your data on your spreadsheet, then create a contingency table using the PivotTable feature in the Insert tab.


Step 3: Select the table or range of data.


Step 5: A menu called "PivotTable Fields" will appear on the right side of the screen. The top of this menu lists all of the variables in your data set. The bottom part of the menu has a grid with four fields: Filters, Rows, Columns, and Values. Drag the variable sex into the Columns box, drag the variable major into the Rows box, and drag the variable students into the Values box.

Note: If Values first appears as a sum (for example, "Sum of Order Number"), click the dropdown arrow and select Value Field Settings. Then choose Count and click OK.


The frequency values will automatically be populated in the contingency table.
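What the PivotTable does with "Count of students" amounts to counting each (row category, column category) pair. A minimal Python sketch of that idea, using a handful of hypothetical student records (sex, major); the records are invented for illustration and are not the chapter's data.

```python
from collections import Counter

# Hypothetical student records (sex, major), for illustration only
students = [
    ("Female", "Econ"), ("Male", "Econ"), ("Female", "Math"),
    ("Male", "Math"), ("Male", "Math"), ("Female", "Politics"),
    ("Female", "Politics"), ("Male", "Politics"), ("Female", "Econ"),
]

# Count each (major, sex) pair, as a PivotTable with Count in Values does
counts = Counter((major, sex) for sex, major in students)

sexes = ["Female", "Male"]            # columns
majors = ["Econ", "Math", "Politics"]  # rows

print(f"{'Major':<10}" + "".join(f"{s:>8}" for s in sexes) + f"{'Total':>8}")
for major in majors:
    row = [counts[(major, s)] for s in sexes]
    print(f"{major:<10}" + "".join(f"{v:>8}" for v in row) + f"{sum(row):>8}")
```

Each cell is the number of students with that major and sex, and the row totals recover the frequency distribution of major alone.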


• To create a graphical presentation of the contingency table, highlight the counts (frequencies), then go to the Charts group on the Insert tab and select Column ➞ 2-D Clustered Column.

Figure: clustered column chart of the counts for Female and Male in each major (Econ, Math, Politics).
