
Descriptive Statistics

MODULE 2
Data – descriptive facts and figures collected, analyzed, and summarized for presentation and
interpretation.

Data mining – analytical techniques to better understand patterns and relationships


◦ data collection,
◦ cleaning,
◦ exploratory data analysis (e.g. data query),
◦ identifying variables for further analysis,
◦ model building (regression, decision trees, etc.),
◦ pattern discovery (predictions),
◦ presentation and integration.
4 Vs of Big Data
Volume – how do we store data?

Velocity – how do we keep data real-time/up-to-date?

Variety – how to analyze different data formats? (text, audio, video)

Veracity – how much uncertainty is in the data?


Variables vs. Observation
Population vs. Sample
Quantitative vs. Categorical Data – categorical data are summarized with frequency tables; quantitative data also allow measures of location (arithmetic can be performed).
Cross-sectional vs. time-series data – same point in time vs. several time periods.
Modifying Data
MODULE 2
Sorting and Filtering
Conditional Formatting
Quick Analysis tool

In your respective fields, how can we utilize this function in order to make data analysis manageable?


How can we apply this function in your respective fields? What data query can you come up with?
Creating Distributions from Data
Frequency Distribution – summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes (bins).

Percentage frequency distribution – summary of data that shows the percentage of observations in each bin.

Distributions help in estimating the probability distribution that characterizes a variable's variability.
Creating Distributions from Data
SPSS – Frequency Tables

Data transformation – Variable View (bottom left) set the Values and Measure
Creating Distributions from Data
Frequency Table for Quantitative/Numerical Data
What numerical data can be grouped into bins/categories?
◦ 1. Determine the number of nonoverlapping bins.
◦ 2. Determine the width of each bin: (largest data value – smallest data value) / number of bins.
◦ 3. Determine the bin limits (upper and lower limit).

Excel: =COUNTIFS with >= and <= criteria, or the Histogram tool, may be used.
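The three steps above can be sketched in Python as a rough illustration. The function name `frequency_table` and the `ages` data are hypothetical, and the sketch assumes equal-width bins that include their lower limit and exclude their upper limit (except the last bin, which includes the maximum). It also assumes the data are not all identical, since the bin width would then be zero.

```python
def frequency_table(values, num_bins):
    """Count observations in num_bins equal-width, nonoverlapping bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # step 2: (largest - smallest) / number of bins
    counts = [0] * num_bins
    for v in values:
        # Index of the bin containing v; the maximum value goes in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    # Step 3: lower and upper limit of each bin.
    limits = [(lo + i * width, lo + (i + 1) * width) for i in range(num_bins)]
    return limits, counts


# Hypothetical data, for illustration only.
ages = [21, 25, 29, 33, 34, 38, 41, 45, 52, 60]
limits, counts = frequency_table(ages, 4)

# Percentage frequency distribution: share of observations in each bin.
percent = [100 * c / len(ages) for c in counts]

for (lower, upper), c, p in zip(limits, counts, percent):
    print(f"{lower:6.2f} to {upper:6.2f}: {c} ({p:.0f}%)")
```

Dividing each count by the number of observations turns the frequency distribution into a percentage frequency distribution.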


Measures of Central Tendency
Mean – Average
◦ Arithmetic Mean – simple average (additive data, linear relationship)
◦ Geometric Mean – for growth rates/ratios or other variables that compound over time (multiplicative data)
◦ Mean rate of change over several successive periods.
◦ Compounded Annual Growth Rate (CAGR) = Rate of Return = (Ending value / Beginning value)^(1/n) - 1
Example: Annual growth rates for years 1, 2, and 3 are as follows: 15%; -10%; 21%.
Excel: =RRI(nper, present value, future value), which already returns (future value / present value)^(1/nper) - 1
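As a worked sketch of the example above, the three annual growth rates can be compounded into a single ending-to-beginning ratio and then converted to a CAGR. The variable names are illustrative only.

```python
import math

rates = [0.15, -0.10, 0.21]  # annual growth rates for years 1, 2, 3

# Each year multiplies the value by (1 + rate); the product is
# ending value / beginning value over the whole period.
growth_factors = [1 + r for r in rates]
ending_over_beginning = math.prod(growth_factors)

n = len(rates)
cagr = ending_over_beginning ** (1 / n) - 1  # geometric mean growth rate
arithmetic_mean = sum(rates) / n             # simple average, for comparison

print(f"Ending/Beginning: {ending_over_beginning:.5f}")
print(f"CAGR: {cagr:.4%}  Arithmetic mean: {arithmetic_mean:.4%}")
```

The CAGR comes out at roughly 7.8%, below the arithmetic mean of about 8.67%, which is why the geometric mean is the better summary for values that compound.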

Median – middle value when arranged in ascending order.
Excel: =MEDIAN(data range)

Mode – value that occurs most frequently.
Excel: =MODE(data range)

What is the importance of understanding measures of central tendency/location in the context of management?
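A minimal sketch of the three measures, using Python's standard `statistics` module and a hypothetical list of scores with one extreme value:

```python
import statistics

# Hypothetical data, for illustration only; 120 is an extreme value.
scores = [70, 75, 75, 80, 85, 90, 120]

mean = statistics.mean(scores)      # arithmetic mean
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

Here the mean (85) is pulled above the median (80) by the extreme value, while the mode (75) simply reports the most common score.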
Measures of Variability/Dispersion
Range – difference between the maximum and minimum values; sensitive to outliers or extreme values.

Excel: =MAX(data range)-MIN(data range)

Variance (S²) – variability based on the deviations from the mean: S² = ∑(xi – x̄)² / (n – 1) (unbiased estimate of the population variance)

Excel: =VAR.S(data range)

Standard Deviation = √S² (square root of the variance) ; measured in the same units as the
original data.

Excel: =STDEV.S(data range)

What is the importance of understanding variance and deviations in the management context?
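The range, sample variance, and sample standard deviation can be computed step by step as a rough check on the Excel functions. The data below are hypothetical, and the sketch assumes the sample (n – 1) versions, matching VAR.S and STDEV.S.

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical data, for illustration only

data_range = max(data) - min(data)

n = len(data)
mean = sum(data) / n
# Sum of squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sample_sd = sample_variance ** 0.5  # same units as the original data

# The standard library gives the same sample statistics.
assert abs(sample_variance - statistics.variance(data)) < 1e-12
assert abs(sample_sd - statistics.stdev(data)) < 1e-12

print(data_range, sample_variance, round(sample_sd, 4))
```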
Measures of Variability/Dispersion
Z-Scores – measure the relative location of a value in the data set. Also referred to as the “standardized value”. Z = (X − μ) / σ
*Compute the mean and SD first.
Excel: =STANDARDIZE(data point, mean, SD)
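A minimal sketch of the z-score calculation, assuming hypothetical ticket prices and the sample standard deviation (swap in `statistics.pstdev` if the data are treated as the full population):

```python
import statistics

prices = [100, 120, 140, 160, 180]  # hypothetical ticket prices

mean = statistics.mean(prices)
sd = statistics.stdev(prices)  # sample SD (an assumption; pstdev gives the population SD)

# z = (x - mean) / sd: how many standard deviations x lies from the mean.
z_scores = [(x - mean) / sd for x in prices]

for x, z in zip(prices, z_scores):
    print(f"price {x}: z = {z:+.2f}")
```

A z-score of 0 marks a value equal to the mean; negative values lie below the mean and positive values above it.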

Z-Scores as used in management:
- Competitiveness (example: price competitiveness)
- Consumer Behavior (seasonality, customer satisfaction)
- Demand and Capacity (optimal z-scores for maximizing revenue)
Empirical Rule

How do we interpret the z-values of ticket prices with reference to the standard deviation?
How do we interpret the Box plot presented here? What insights can we generate from the visualization?
Other Visualizations
Data-Ink Ratio
◦ Maximize the use of ink to represent and communicate actual data, minimizing non-data ink.
◦ Data-ink ratio = Ink used to display data / Total ink used in the graphic

Maximizing Data-Ink Ratio:
1. Remove non-essential ink.
2. Simplify labels.
3. Minimize chartjunk (decorative elements).
4. Maximize data density (e.g. use of small multiples through shared axes).
Other Visualizations
Ask yourself these questions:

•Who is my audience?

•What questions do they have?

•What answers am I finding for them?

•What am I trying to say?

•What other questions will my visualization inspire or what conversations may result?
Other Visualizations
Scatter Chart – relationship between two quantitative variables.
- Check if the trend line fits the data.
- R² can be used in assessing how well the trendline fits the data.
- For example, an R² of 0.30 means that 30% of the variability observed in the target variable is explained by the regression model.
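To make the R² interpretation concrete, here is a minimal sketch that fits a least-squares trend line and computes R² as 1 minus the ratio of unexplained to total variability. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y):
    """R-squared of the least-squares line fitted to y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left unexplained by the line vs. total variability in y.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical data, for illustration only.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r2 = r_squared(ad_spend, sales)
print(f"R-squared: {r2:.2f}")
```

For this data the result is 0.60, meaning the trend line explains about 60% of the variability in the target variable.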
Other Visualizations
Line Chart
◦ Useful for time series data (e.g. sales performance over the past 12 months).
◦ Can be used for comparative and trend analysis, with numerical/quantitative data on the Y axis and sequential data on the X axis.

Bar Chart – provides graphical summary of categorical data.

Bubble Chart – used for three-variable visualization in a single plot. The size of each bubble represents the magnitude of the third variable.
Measures of Shape
Normal Distribution

Detection of outliers.

Making probabilistic statements about population parameters.

It simplifies interpretation and makes it easier to draw meaningful conclusions from statistical
analyses.
Measures of Shape
Skewness – measure of symmetry or, more precisely, the lack of symmetry. A dataset is symmetric if it looks the same to the left and to the right of the center point.
Pearson’s coefficient of skewness = 3 × (Mean – Median) / Standard Deviation
- Between -0.5 and +0.5 (nearly symmetrical)
- Between -1 and -0.5 (negatively skewed); between +0.5 and +1 (positively skewed)
- Lower than -1 or higher than +1 (extremely skewed)
Kurtosis – measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
- Mesokurtic; Leptokurtic; Platykurtic
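A rough sketch of the skewness calculation, assuming the common textbook form of Pearson's second (median) coefficient, 3 × (mean – median) / standard deviation, and the sample standard deviation. The function name and data are hypothetical.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample SD (an assumption)
    return 3 * (mean - median) / sd


# Hypothetical data, for illustration only.
symmetric = [1, 2, 3, 4, 5]        # mean equals median
right_tailed = [1, 2, 2, 3, 12]    # one large value pulls the mean up

print(pearson_median_skewness(symmetric))
print(pearson_median_skewness(right_tailed))
```

The symmetric data give a coefficient of 0, while the long right tail gives a positive value because the mean sits above the median.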
Measures of Associations
Nominal/Categorical Data
Crosstabulations/Contingency Tables
•The chi-square statistic is a measure of the difference between the observed and expected frequencies. A larger chi-square value indicates a greater deviation from expected values.
•The null hypothesis assumes independence, and a significant chi-square test suggests that the variables are not independent.
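The chi-square statistic for a crosstabulation can be sketched as follows. Expected frequencies under independence are (row total × column total) / grand total. The function name and the 2×2 table are hypothetical.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical crosstab: rows = customer segment, columns = purchased (yes, no).
table = [[30, 10],
         [20, 40]]
chi2 = chi_square_statistic(table)
print(f"chi-square = {chi2:.3f}")
```

For this table the statistic is about 16.67 with 1 degree of freedom, well above the 5% critical value of 3.841, so the independence hypothesis would be rejected.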
Measures of Associations
Continuous/numerical variables

Covariance – descriptive measure of the linear association between two continuous variables.

- The magnitude of the covariance is difficult to interpret. If the value is > 0, the variables are positively related; if < 0, they are negatively related; if = 0, there is no linear relationship.

=COVARIANCE.S(x datarange; y datarange)

Correlation Coefficient – measures the relationship between 2 continuous variables; units of measurement do not affect the calculation, so the magnitude can be interpreted. Value is between -1 and +1.

= CORREL(xdatarange;ydatarange)
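A minimal sketch of both measures, using hypothetical data and the sample (n – 1) versions to match COVARIANCE.S. The last lines rescale one variable to illustrate that the covariance changes with the units while the correlation coefficient does not.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Correlation coefficient: covariance divided by both standard deviations."""
    sd_x = sample_covariance(x, x) ** 0.5
    sd_y = sample_covariance(y, y) ** 0.5
    return sample_covariance(x, y) / (sd_x * sd_y)


# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
output = [2, 4, 5, 4, 5]

cov = sample_covariance(hours, output)
r = correlation(hours, output)

# Same data with hours expressed in minutes: covariance scales, correlation does not.
minutes = [60 * h for h in hours]
cov_minutes = sample_covariance(minutes, output)
r_minutes = correlation(minutes, output)

print(cov, round(r, 4), cov_minutes, round(r_minutes, 4))
```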
Summary Tables
