Eco2061 Week 2

Chapter 1 of ECO 2061: Statistics introduces the importance of statistics in business decision-making, differentiating between descriptive and inferential statistics. It covers data collection methods, types of data, and the significance of organizing data through tables and graphs. Additionally, it outlines the basic vocabulary of statistics, including variables, populations, samples, and different types of variables and their measurements.

Uploaded by

yagizzguloglu
ECO 2061: Statistics

Chapter 1:

*Slides are based on Statistics for Managers: Using Microsoft Excel, 6th Edition, by David M. Levine, Mark L.
Berenson, Timothy C. Krehbiel, and David F. Stephan; Statistics for Business and Economics, 8th ed., by Newbold, P.,
Carlson, W.L., and Thorne, B., Pearson; and Statistics for Business and Economics, 3rd ed., by Anderson, Sweeney,
Williams, Freeman, and Shoesmith; slides from
Outline
• Why learn statistics
• How business uses statistics: turning data into information to facilitate decision making
• Why collect data?
• Sources of data
• The difference between descriptive and inferential statistics
• The basic vocabulary of statistics
• The types of data used in business
• How to develop tables and charts for categorical and numerical data
• Presenting graphs
Why Learn Statistics

Make better sense of the world
• Internet articles / reports
• Magazine articles
• Newspaper articles
• Television & radio reports

Make better business decisions
• Business memos
• Business research
• Technical journals
• Technical reports

In Business, Statistics Has Many Important Uses

• To summarize business data
• To draw conclusions from business data
• To make reliable forecasts about business activities
• To improve business processes


Why Collect Data?

• A marketing research analyst needs to assess the effectiveness of a new television advertisement.
• An auditor wants to review the financial transactions of a company in order to determine whether the company follows generally accepted accounting principles.
A Step-by-Step Process For Examining & Concluding From Data Is Helpful

• Define the variables for which you want to reach conclusions
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts (and in later chapters by using other statistical methods) to reach conclusions
Two Different Branches Of Statistics Are Used In Business

Statistics
The branch of mathematics that transforms data into useful information for decision makers.

Descriptive Statistics
Collecting, summarizing, presenting and analyzing data

Inferential Statistics
Using data collected from a small group to draw conclusions about a larger group
Descriptive Statistics

• Collect data
    - e.g., Survey
• Present data
    - e.g., Tables and graphs
• Characterize data
    - e.g., The sample mean
Inferential Statistics
• Estimation
    - e.g., Estimate the population mean weight using the sample mean weight
• Hypothesis testing
    - e.g., Test the claim that the population mean weight is 120 pounds

Drawing conclusions about a large group of individuals based on a smaller group.
Sources of Data

• Primary Sources: The data collector is the one using the data for analysis
    - Data from a political survey
    - Data collected from an experiment
    - Observed data
• Secondary Sources: The person performing data analysis is not the data collector
    - Analyzing census data
    - Examining data from print journals or data published on the internet.
Sources of data fall into four categories

• Data distributed by an organization or an individual
• A designed experiment
• A survey
• An observational study
Examples Of Data Distributed By Organizations or Individuals

• Financial data on a company provided by investment services.
• Industry or market data from market research firms and trade associations.
• Stock prices, weather conditions, and sports statistics in daily newspapers.
Examples of Data From A Designed Experiment

• Consumer testing of different versions of a product to help determine which product should be pursued further.
• Material testing to determine which supplier’s material should be used in a product.
• Market testing on alternative product promotions to determine which promotion to use more broadly.
Examples of Survey Data

• Political polls of registered voters during political campaigns.
• People being surveyed to determine their satisfaction with a recent product or service experience.
Examples of Data From Observational Studies

• Market researchers utilizing focus groups to elicit unstructured responses to open-ended questions.
• Measuring the time it takes for customers to be served in a fast-food establishment.
• Measuring the volume of traffic through an intersection to determine if some form of advertising at the intersection is justified.
Basic Vocabulary of Statistics

VARIABLES
Variables are characteristics of an item or individual and are what you analyze when you use a statistical method.

DATA
Data are the different values associated with a variable.

OPERATIONAL DEFINITIONS
Data values are meaningless unless their variables have operational definitions: universally accepted meanings that are clear to all associated with an analysis.
Basic Vocabulary of Statistics

POPULATION
A population consists of all the items or individuals about which you want to draw a conclusion. The population is the “large group.”

SAMPLE
A sample is the portion of a population selected for analysis. The sample is the “small group.”

PARAMETER
A parameter is a numerical measure that describes a characteristic of a population.

STATISTIC
A statistic is a numerical measure that describes a characteristic of a sample.
Population vs. Sample

Population
Measures used to describe the population are called parameters

Sample
Measures used to describe the sample are called statistics
Random Sampling

Simple random sampling is a procedure in which
• each member of the population is chosen strictly by chance,
• each member of the population is equally likely to be chosen,
• every possible sample of n objects is equally likely to be chosen

The resulting sample is called a random sample
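
As an illustration, simple random sampling can be sketched in Python. The population of 72 numbered items, the sample size, and the seed are all assumptions chosen for the example; `random.sample` draws without replacement, so every possible sample of n items is equally likely.

```python
import random

# Hypothetical population: items numbered 1..72 (an assumption for illustration).
population = list(range(1, 73))
n = 9  # desired sample size

rng = random.Random(2061)            # arbitrary seed, only to make the run repeatable
sample = rng.sample(population, n)   # n distinct items chosen strictly by chance

# A parameter describes the population; a statistic describes the sample.
population_mean = sum(population) / len(population)  # parameter
sample_mean = sum(sample) / len(sample)              # statistic

print("Random sample:", sorted(sample))
print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", sample_mean)
```

The sample mean will usually differ somewhat from the population mean; how to reason about that difference is the subject of inferential statistics.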


Systematic Sampling

For systematic sampling,
• Ensure that the population is arranged in a way that is not related to the subject of interest
• Select every jth item from the population, where j is the ratio of the population size to the sample size, j = N/n
• Randomly select a number from 1 to j for the first item selected

The resulting sample is called a systematic sample


Systematic Sampling

Example:
Suppose you wish to sample n = 9 items from a population of N = 72.

j = N / n = 72 / 9 = 8

Randomly select a number from 1 to 8 for the first item to include in the sample; suppose this is item number 3.

Then select every 8th item thereafter (items 3, 11, 19, 27, 35, 43, 51, 59, 67)
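
The example above can be reproduced with a short Python sketch. The function name `systematic_sample` and its parameters are hypothetical, and the sketch assumes N is an exact multiple of n, as it is here.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Return the 1-based item numbers of a systematic sample (hypothetical helper)."""
    j = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        start = rng.randint(1, j)  # random first item between 1 and j, inclusive
    return [start + k * j for k in range(n)]

# Fix the start at 3 to match the slide's example.
items = systematic_sample(N=72, n=9, start=3)
print(items)  # [3, 11, 19, 27, 35, 43, 51, 59, 67]
```

Leaving `start` unset picks the first item at random, which is what the procedure calls for in practice.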
Types of Variables

• Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”
• Numerical (quantitative) variables have values that represent quantities.
    - Discrete variables arise from a counting process
    - Continuous variables arise from a measuring process
Types of Variables

Variables
• Categorical (Qualitative): defined categories
    - Examples: Marital Status, Political Party, Eye Color
• Numerical (Quantitative)
    - Discrete (counted items). Examples: Number of Children, Defects per hour
    - Continuous (measured characteristics). Examples: Weight, Voltage
Levels of Measurement
Categorical Variables

A nominal scale classifies data into distinct categories in which no ranking is implied.

Categorical Variable           Categories
Personal Computer Ownership    Yes / No
Type of Stocks Owned           Growth / Value / Other
Internet Provider              Microsoft Network / AOL / Other


Levels of Measurement
Categorical Variables

An ordinal scale classifies data into distinct categories in which ranking is implied

Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Satisfied, Neutral, Unsatisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                    A, B, C, D, F
Levels of Measurement:
Numerical Variables

• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity, but the measurements do not have a true zero point.
• A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
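
One way to see the difference is temperature, a common textbook illustration that is assumed here rather than taken from these slides. Celsius has no true zero, so only differences are meaningful; Kelvin has a true zero, so ratios are meaningful too.

```python
# Celsius is an interval scale: differences are meaningful, but 0 °C is not a true zero.
low_c, high_c = 10.0, 20.0
difference_c = high_c - low_c      # 10 degrees: a meaningful quantity

# Kelvin is a ratio scale: 0 K is a true zero, so ratios are meaningful.
low_k, high_k = low_c + 273.15, high_c + 273.15
difference_k = high_k - low_k      # still 10 degrees (up to floating-point rounding)

ratio_celsius = high_c / low_c     # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_kelvin = high_k / low_k      # about 1.035: the physically meaningful ratio

print(difference_c, round(difference_k, 6), ratio_celsius, round(ratio_kelvin, 4))
```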
Interval and Ratio Scales
Numerical Variables
Graphical Presentation of Data

• Data in raw form are usually not easy to use for decision making
• Some type of organization is needed
    - Table
    - Graph
• The type of graph to use depends on the variable being summarized
Graphical Presentation of Data

Categorical Variables       Numerical Variables
• Frequency distribution    • Frequency distribution
• Cross table               • Line chart (time series)
• Bar chart                 • Histogram and ogive
• Pie chart                 • Stem-and-leaf display
• Pareto diagram            • Scatter plot
Tables and Graphs for Categorical Variables

Categorical Data
• Tabulating Data
    - Frequency Distribution Table
• Graphing Data
    - Bar Chart
    - Pie Chart
    - Pareto Diagram
Categorical Data Are Organized By Utilizing Tables

Categorical Data
• Tallying Data
    - One Categorical Variable: Summary Table
    - Two Categorical Variables: Contingency Table

Data are recorded in a tally chart: each tally mark counts one occurrence, so the tallies give the frequency of each category.
Organizing Categorical Data: Summary Table

• A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories.

Summary Table From A Survey of 1000 Banking Customers

Banking Preference?                Percent
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%
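
A summary table like this one can be built by tallying raw responses. The sketch below is illustrative only: the slides do not give the raw survey data, so the response list is reconstructed from the percentages and the stated sample size of 1000.

```python
from collections import Counter

# Hypothetical raw responses, reconstructed so the counts match the
# percentages reported for the 1000 surveyed customers.
responses = (
    ["ATM"] * 160
    + ["Automated or live telephone"] * 20
    + ["Drive-through service at branch"] * 170
    + ["In person at branch"] * 410
    + ["Internet"] * 240
)

counts = Counter(responses)      # tally: category -> frequency
total = sum(counts.values())
summary = {cat: 100 * freq / total for cat, freq in counts.items()}

for cat in sorted(summary):
    print(f"{cat:<35}{counts[cat]:>5}{summary[cat]:>7.0f}%")
```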
Bar and Pie Charts

• Bar charts and Pie charts are often used for qualitative (categorical) data
• Height of bar or size of pie slice shows the frequency or percentage for each category
Visualizing Categorical Data:
The Bar Chart
• In a bar chart, each category is shown by a bar whose length represents the amount, frequency, or percentage of values falling into that category, taken from the summary table of the variable.

Banking Preference?                %
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%

[Figure: horizontal bar chart "Banking Preference", one bar per category; horizontal axis from 0% to 45%]


Visualizing Categorical Data:
The Pie Chart
• The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.

Banking Preference?                %
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%

[Figure: pie chart "Banking Preference" with one slice per category, labeled 16%, 2%, 17%, 41%, and 24%]
A Contingency Table Helps Organize Two or More Categorical Variables

• Used to study patterns that may exist between the responses of two or more categorical variables
• Cross tabulates or tallies jointly the responses of the categorical variables
• For two variables the tallies for one variable are in the rows and the tallies for the second variable are in the columns
Cross Tables

• Cross Tables (or contingency tables) list the number of observations for every combination of values for two categorical variables
• If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table
Cross Table Example

• 3 x 3 Cross Table for Investment Choices by Investor (values in $1000’s)

Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46            55            27            128
Bonds                  32            44            19            95
Cash                   15            20            33            68
Total                  93            119           79            291
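
The row and column totals of this cross table can be checked with a small Python sketch. The nested-dictionary layout is just one convenient representation, chosen for illustration.

```python
# Investment amounts in $1000's, copied from the cross table.
table = {
    "Stocks": {"Investor A": 46, "Investor B": 55, "Investor C": 27},
    "Bonds":  {"Investor A": 32, "Investor B": 44, "Investor C": 19},
    "Cash":   {"Investor A": 15, "Investor B": 20, "Investor C": 33},
}
investors = ["Investor A", "Investor B", "Investor C"]

# Row totals: one per investment category.
row_totals = {cat: sum(row.values()) for cat, row in table.items()}
# Column totals: one per investor.
col_totals = {inv: sum(table[cat][inv] for cat in table) for inv in investors}
grand_total = sum(row_totals.values())

print(row_totals)   # {'Stocks': 128, 'Bonds': 95, 'Cash': 68}
print(col_totals)   # {'Investor A': 93, 'Investor B': 119, 'Investor C': 79}
print(grand_total)  # 291
```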


Graphing Multivariate Categorical Data
• Side by side horizontal bar chart
Graphing Multivariate Categorical Data
• Stacked bar chart
Vertical Side-by-Side Chart Example
• Sales by quarter for three sales territories:

         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9
Visualizing Categorical Data:
The Pareto Chart

• Used to portray categorical data
• A bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is often shown in the same graph
• Used to separate the “vital few” from the “trivial many”
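
The two ingredients of a Pareto chart, the descending order and the cumulative percentages, can be computed as in the following sketch. It uses the banking preference percentages from the summary table; the variable names are illustrative.

```python
percentages = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

# Bars: categories in descending order of percentage.
ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)

# Cumulative polygon: running total of the ordered percentages.
cumulative = []
running = 0
for category, pct in ordered:
    running += pct
    cumulative.append((category, pct, running))

for category, pct, cum in cumulative:
    print(f"{category:<35}{pct:>4}%{cum:>6}%")
```

Here the first two categories (in person at branch, Internet) already account for 65% of responses, which is the kind of "vital few" pattern the chart is meant to expose.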
Visualizing Categorical Data:
The Pareto Chart (cont.)

Pareto Chart For Banking Preference

Banking Preference?                %
ATM                                16%
Automated or live telephone        2%
Drive-through service at branch    17%
In person at branch                41%
Internet                           24%

[Figure: Pareto chart. Bars (% in each category) in descending order: In person at branch, Internet, Drive-through service at branch, ATM, Automated or live telephone. A line graph shows the cumulative %. Both vertical axes run from 0% to 100%.]
Graphical Presentation of Data

Categorical Variables       Numerical Variables
• Frequency distribution    • Frequency distribution
• Cross table               • Line chart (time series)
• Bar chart                 • Histogram and ogive
• Pie chart                 • Stem-and-leaf display
• Pareto diagram            • Scatter plot
Graphs to Describe Time-Series Data

• A line chart (time-series plot) is used to show the values of a variable over time
• Time is measured on the horizontal axis
• The variable of interest is measured on the vertical axis
Line Chart Example

[Figure: line chart of Price ($), vertical axis from 0.000 to 5.000, against year on the horizontal axis, 1975 to 2010]
Visualizing Numerical Data By Using Graphical Displays

Numerical Data
• Ordered Array
    - Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions
    - Histogram
    - Ogive
Organizing Numerical Data:
Ordered Array

• An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.
• Shows range (minimum value to maximum value)
• May help identify outliers (unusual observations)

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45
Stem-and-Leaf Display

• A simple way to see how the data are distributed and where concentrations of data exist

METHOD: Separate the sorted data series into leading digits (the stems) and the trailing digits (the leaves)
Organizing Numerical Data:
Stem and Leaf Display
• A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45

Age of College Students

Day Students           Night Students
Stem   Leaf            Stem   Leaf
1      67788899        1      8899
2      0012257         2      0138
3      28              3      23
4      2               4      15
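
The display above can be generated with a short sketch. The helper `stem_and_leaf` is hypothetical and assumes two-digit values, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digits are the leaves."""
    groups = defaultdict(list)
    for v in sorted(values):          # sort first so leaves appear in order
        groups[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(groups.items())}

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]

day_display = stem_and_leaf(day)
night_display = stem_and_leaf(night)

for stem, leaves in day_display.items():
    print(stem, "|", leaves)
```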
Organizing Numerical Data:
Frequency Distribution
• A frequency distribution is a way to summarize data, containing
    - class groupings
    - and the corresponding frequencies with which data fall within each class or category
• The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data
Class Intervals and Class Boundaries
• The frequency distribution is a summary table in which the data are arranged into numerically ordered classes.
• Each class grouping has the same width
• Determine the width of each interval by

  w = interval width = (largest number - smallest number) / (number of desired intervals)

• Use at least 5 but no more than 15-20 intervals
• Intervals never overlap
• Round up the interval width to get desirable interval endpoints
Organizing Numerical Data:
Frequency Distribution Example

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Organizing Numerical Data:
Frequency Distribution Example

• Sort raw data in ascending order:
  12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute class interval (width): 46 / 5 = 9.2, rounded up to 10
• Determine class boundaries (limits):
    - Class 1: 10 to less than 20
    - Class 2: 20 to less than 30
    - Class 3: 30 to less than 40
    - Class 4: 40 to less than 50
    - Class 5: 50 to less than 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
Organizing Numerical Data: Frequency Distribution Example

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Midpoint    Frequency
10 but less than 20    15          3
20 but less than 30    25          6
30 but less than 40    35          5
40 but less than 50    45          4
50 but less than 60    55          2
Total                              20
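
The whole procedure, from range to class counts, can be sketched in Python. Two choices in the sketch are assumptions made to reproduce the "desirable endpoints" of the example: the width is rounded up to the next multiple of 10, and the first class starts at a multiple of the width.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)         # 58 - 12 = 46
raw_width = data_range / num_classes         # 9.2
width = math.ceil(raw_width / 10) * 10       # assumption: round up to a multiple of 10
first_lower = (min(temps) // width) * width  # assumption: start at a multiple of the width

# Classes are half-open: "lower but less than upper", so intervals never overlap.
classes = [(first_lower + i * width, first_lower + (i + 1) * width)
           for i in range(num_classes)]
frequencies = [sum(lo <= t < hi for t in temps) for lo, hi in classes]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lo} but less than {hi}   midpoint {mid:g}   frequency {freq}")
```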
Organizing Numerical Data: Relative & Percent Frequency Distribution Example

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency    Relative Frequency    Percentage
10 but less than 20    3            .15                   15
20 but less than 30    6            .30                   30
30 but less than 40    5            .25                   25
40 but less than 50    4            .20                   20
50 but less than 60    2            .10                   10
Total                  20           1.00                  100
Organizing Numerical Data: Cumulative Frequency Distribution Example

Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency    Percentage    Cumulative Frequency    Cumulative Percentage
10 but less than 20    3            15%           3                       15%
20 but less than 30    6            30%           9                       45%
30 but less than 40    5            25%           14                      70%
40 but less than 50    4            20%           18                      90%
50 but less than 60    2            10%           20                      100%
Total                  20           100%
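
The relative, percentage, and cumulative columns all follow from the class frequencies. A minimal sketch, with illustrative variable names, is:

```python
from itertools import accumulate

class_labels = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
                "40 but less than 50", "50 but less than 60"]
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)

relative = [f / total for f in frequencies]           # proportion of observations per class
percentages = [100 * f / total for f in frequencies]  # the same, expressed in percent
cumulative_freq = list(accumulate(frequencies))       # running total of frequencies
cumulative_pct = [100 * c / total for c in cumulative_freq]

for row in zip(class_labels, frequencies, percentages, cumulative_freq, cumulative_pct):
    print("{:<22}{:>3}{:>6.0f}%{:>4}{:>6.0f}%".format(*row))
```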
Frequency Distributions:
Some Tips

• Different class boundaries may provide different pictures for the same data (especially for smaller data sets)
• Shifts in data concentration may show up when different class boundaries are chosen
• As the size of the data set increases, the impact of alterations in the selection of class boundaries is greatly reduced
• When comparing two or more groups with different sample sizes, you must use either a relative frequency or a percentage distribution
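
The last tip can be illustrated with the day (18 students) and night (12 students) age data from the ordered array example. The helper name and the class boundaries of 10, 20, 30, 40, 50 are assumptions made for this sketch.

```python
def percentage_distribution(values, boundaries):
    """Percentage of values in each half-open class [lower, upper)."""
    total = len(values)
    return [100 * sum(lo <= v < hi for v in values) / total
            for lo, hi in zip(boundaries, boundaries[1:])]

day = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 25, 27, 32, 38, 42]
night = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]
boundaries = [10, 20, 30, 40, 50]  # assumed class boundaries

day_pct = percentage_distribution(day, boundaries)
night_pct = percentage_distribution(night, boundaries)

print([round(p, 1) for p in day_pct])    # [44.4, 38.9, 11.1, 5.6]
print([round(p, 1) for p in night_pct])  # [33.3, 33.3, 16.7, 16.7]
```

Both groups have exactly two students in their thirties, but that is 11.1% of day students and 16.7% of night students, a difference that raw frequencies would hide.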
Visualizing Numerical Data:
The Histogram

• A vertical bar chart of the data in a frequency distribution is called a histogram.
• In a histogram there are no gaps between adjacent bars.
• The class boundaries (or class midpoints) are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• The height of each bar represents the frequency, relative frequency, or percentage.
Visualizing Numerical Data:
The Histogram

Class                  Frequency    Relative Frequency    Percentage
10 but less than 20    3            .15                   15
20 but less than 30    6            .30                   30
30 but less than 40    5            .25                   25
40 but less than 50    4            .20                   20
50 but less than 60    2            .10                   10
Total                  20           1.00                  100

[Figure: "Histogram: Daily High Temperature"; vertical axis Frequency (0 to 7); horizontal axis from 0 to 60; bar heights 3, 6, 5, 4, 2 for the five classes]

(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class)
How Many Class Intervals?

• Many (Narrow class intervals)
    - may yield a very jagged distribution with gaps from empty classes
    - can give a poor indication of how frequency varies across classes

[Figure: histogram of Temperature with narrow classes; upper class endpoints 4, 8, ..., 60, More; vertical axis Frequency, 0 to 3.5]

• Few (Wide class intervals)
    - may compress variation too much and yield a blocky distribution
    - can obscure important patterns of variation.

[Figure: histogram of Temperature with wide classes; upper class endpoints 0, 30, 60, More; vertical axis Frequency, 0 to 12]

(X axis labels are upper class endpoints)
Visualizing Numerical Data:
The Polygon

• A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
• The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis, and the cumulative percentages along the Y axis.
• Useful when there are two or more groups to compare.


Visualizing Numerical Data:
The Frequency Polygon

Class                  Class Midpoint    Frequency
10 but less than 20    15                3
20 but less than 30    25                6
30 but less than 40    35                5
40 but less than 50    45                4
50 but less than 60    55                2

[Figure: "Frequency Polygon: Age Of Students"; horizontal axis Class Midpoints (5 to 65); vertical axis Frequency (0 to 7)]

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
Visualizing Numerical Data:
The Ogive

Class                  Lower class boundary    % less than lower boundary
10 but less than 20    10                      0
20 but less than 30    20                      15
30 but less than 40    30                      45
40 but less than 50    40                      70
50 but less than 60    50                      90
(upper boundary)       60                      100

[Figure: "Ogive: Age Of Students"; horizontal axis Lower Class Boundary (10 to 60); vertical axis Cumulative Percentage (0 to 100)]

In an ogive the percentage of the observations less than each lower class boundary is plotted versus the lower class boundaries.
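
The points of the ogive can be computed from the class frequencies, as in this sketch (variable names are illustrative). No observation lies below the first boundary, so the curve starts at 0% and reaches 100% at the upper boundary of the last class.

```python
from itertools import accumulate

boundaries = [10, 20, 30, 40, 50, 60]  # class boundaries of the temperature example
frequencies = [3, 6, 5, 4, 2]          # observations in each class
total = sum(frequencies)

# Percentage of observations below each boundary; 0% lie below the first one.
cum_pct = [0.0] + [100 * c / total for c in accumulate(frequencies)]
ogive_points = list(zip(boundaries, cum_pct))

for boundary, pct in ogive_points:
    print(f"less than {boundary}: {pct:.0f}%")
```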
Visualizing Two Numerical Variables: The Scatter Plot
• Scatter plots are used for numerical data consisting of paired observations taken from two numerical variables
• One variable is measured on the vertical axis and the other variable is measured on the horizontal axis
• Scatter plots are used to examine possible relationships between two numerical variables
Scatter Plot Example

Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

[Figure: scatter plot "Cost per Day vs. Production Volume"; horizontal axis Volume per Day (20 to 70); vertical axis Cost per Day (0 to 250)]
Data Presentation Errors

• Unequal histogram interval widths
• Compressing or distorting the vertical axis
• Providing no zero point on the vertical axis
• Failing to provide a relative basis in comparing data between groups
