0% found this document useful (0 votes)
48 views

Descriptive Statistics Using Microsoft Excel

The document discusses descriptive statistics and how to perform them in Microsoft Excel. It explains how to analyze a dataset using Excel's descriptive statistics tool to calculate measures like mean, median, mode, standard deviation, variance, minimum, maximum and more. These statistics can help understand the distribution of data and identify any outliers.

Uploaded by

Vasant bhoknal
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
48 views

Descriptive Statistics Using Microsoft Excel

The document discusses descriptive statistics and how to perform them in Microsoft Excel. It explains how to analyze a dataset using Excel's descriptive statistics tool to calculate measures like mean, median, mode, standard deviation, variance, minimum, maximum and more. These statistics can help understand the distribution of data and identify any outliers.

Uploaded by

Vasant bhoknal
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

Descriptive Statistics Using Microsoft Excel

These tutorials briefly explain the use and interpretation of standard statistical analysis techniques. The
examples include how-to instructions for Excel. Although there are different version of Excel in use, these should
work about the same for most recent versions. They also assume that you have installed the Excel Analysis Pak
which is free and comes with Excel (Go to Tools, Addins... if it is not already installed in your version of Excel.)

See www.stattutorials.com/EXCELDATA for files mentioned in this tutorial, � TexaSoft, 2008

Descriptive Statistics in Excel


Usually when you create or acquire a set of numbers you will want to examine the data to learn more about
its distribution and to discover information such as the minimum and maximum values and to determine if
there are outliers. This is an important step in any analysis since it helps you understand if you data meet
assumptions required by other analyses such as t-tests and regression.

For this example, we will look at the data set called EXAMPLE.XLS. The first few records are shown here:

GRO AG TIM TIM TIM TIM STAT


UP E E1 E2 E3 E4 US
A 12 22.3 25.3 28.2 30.6 5
A 11 22.8 27.5 33.3 35.8 5
B 12 22.8 30.0 32.8 31.0 4
A 12 18.5 26.0 29.0 27.9 5
B 9 19.5 25.0 25.3 26.6 5
B 11 23.5 28.8 34.2 35.6 5
C 8 22.6 26.7 28.0 33.4 3
B 8 21.0 26.7 27.5 29.5 5

As an initial example, we will examine the variable AGE.

1. In Excel, select Tools/Data Analysis/Descriptive Statistics. (If the Data Analysis option is
not on your Tools menu, you must first install it using Tools/Add ins)

2. Select the input range for the AGE variable. In this case it is $B$2:$B$51.

3. Be sure to select the check boxes Summary Statistics and Confidence level for mean (95% is
okay). The output created is shown here:

Column1

Mean 10.46
Standard Error 0.343107052
Median 11
Mode 12
Standard Deviation 2.42613323
Sample Variance 5.886122449
Kurtosis -0.261061479
Skewness -0.511921947
Range 11
Minimum 4
Maximum 15
Sum 523
Count 50
Confidence
Level(95.0%) 0.689499422
Information you should notice includes:

1. Search for outliers: Look at the Minimum and Maximum values to see if these values fall
within your expected range for these data. If a value is unexpectedly small or large, you
should examine your original data to see if it was miscoded. If there are corrections that need
to be made, make them before continuing. If you have values that are unexpectedly large are
small, but are actual values, it may indicate that your data are not normally distributed. In that
case you may consider using nonparametric procedures in further analyses. It may also
indicate that the mean is not the best value to report to describe the central tendency of this
data set.

2. Symmetry: Another measure that helps you decide normality is Skewness and Kurtosis. The
Skewness measure indicates the level of non-symmetry. If the distribution of the data are
symmetric then skewness will be close to 0 (zero). The further from 0, the more skewed the
data. A negative value indicates a skew to the left. How do you tell if the skewness is large
enough to case concern. Excel doesn�t give you this value, but a measure of the standard
error of skewness can be calculated as =SQRT(6/N) or =SQRT(6/50) which is 0.346. If the
skewness is more than twice this amount, then it indicates that the distribution of the data is
non-symmetric. In this case 0.346 * 2 = 0.69. The skewness reported by Excel is -0.512, so
the data can be assumed to be fairly symmetric (although somewhat marginally so.)
However, this does NOT indicate that the data are normally distributed.

3. Kurtosis is a measure of the peakedness of the data. Again, for normally distributed data the
kurtosis is 0 (zero). As with skewness, if the value of kurtosis is too big or too small, there is
concern about the normality of the distribution. In this case, a rough formula for the standard
error for kurtosis is =SQRT(24/N) = 0.692. Twice this amount is 1.39. Since the value of
kurtosis falls within two standard errors (-0.26) the data may be considered to meet the
criteria for normality by this measure. These measures of skewness and kurtosis are one
method of examining the distribution of the data. However, they are not definitive in
concluding normality. You should also examine a graph (histogram) of the data and consider
performing other tests for normality such as the Shapiro-Wilk or Kolmogorov-Smirnov test
(not provided by Excel).

4. Estimate of central tendency: For normally distributed data the mean (arithmetic average)
is the typical value to use in a report or journal article. However, this point estimate is of
limited value without some estimate of the varibility of the data. Therefore you should at least
report three values � the mean, the standard error of the mean, and the sample size. In this
case, those values are 10.5, 0.34 and 50. (Typically report means to one more decimal place
than what was measured and the standard error to two decimal places beyond the unit
measure.) Many journals also prefer that you report a confidence limit on the mean. This is
calculated (estimated) as MEAN + or � t(.05,N-1) * SEM. The value of t is reported by Excel as
the confidence level = 0.6895. Thus, the lower bound for the 95 % CI is given as = 10.46 �
0.6895 = 9.77 and the upper limit � 10.46 + 0.6895 = 11.15. The median is another measure
of central tendency and is usually reported when the data are not normally distributed. The
mode, or the most frequent value, is a third measure of central tendency. In the case of the
median or mode, the range is often given as a measure of variability, although a better
measure is the interquartile range (not reported by Excel).

5. Measures of variability: The measures of variability reported in Excel�s descriptive


statistics include the standard error, the standard deviation and the variance as well as the
range. The first three measures are related in the following ways:

Standard Deviation = SQRT(Variance)

Standard Error = Standard Deviation / SQRT(N)


6. Coefficient of Variation: Another measure often reported is the coefficient of variation. This
measure provides a unitless measure of the variation of the sate by translating it into a
percentage of the mean value. This measure not provided by Excel, but is easily calculated by
the formula:

CV = (Standard Deviation / Mean) * 100

For this data, the value, expressed as a percentage, is

CV = (0.343/10.46) * 100 = 3.279

This CV value can be used when comparing two samples that have different means and
standard deviations. When the mean is close to 0, the CV value becomes of little use.

7. Visualizing your data: It is always a good idea to examine your data visually to understand
what�s going on. To produce a histogram of the AGE variable, select Tools/Data
Analysis/Histogram. Select the range for the AGE variable, $B$2:$B$51, and check the
option Chart output. This produces the following table and chart:

4 1
5.57142
9 0
7.14285
7 6
8.71428
6 4
10.2857
1 11
11.8571
4 8
13.4285
7 17
More 3
This is not the best looking histogram because Excel selects its own bin sizes (class intervals)
on the X-axis, which in this case are less than desirable. You can improve on the plot by
creating a list of bin sizes such as in the table below:

Bin
Sizes
5.00
7.00
9.00
11.00
13.00
15.00

Redo the histogram, this time indicating the bin sizes as the �Bin Range.� This produces
the following histogram:
This plot is much cleaner looking and suitable for reporting. It shows that the data are
visually not symmetric with more values appearing to the right of the mean (10.46) than we
would expect in normally distributed data.

This tutorial shows ways for you to examine a variable for normality, symmetry, and to
visually inspect the distribution of your data. These are important beginning exercises you
should perform on your data before using it any other analysis.

You might also like