01 Data & Statistics

Statistics can refer to numerical facts about data or the science of collecting and analyzing data. There are different scales of measurement for data including nominal, ordinal, interval, and ratio scales. Descriptive statistics summarize data in tables, graphs, and numerical measures in order to describe patterns in the data. Common descriptive statistics include frequency distributions, measures of central tendency like the mean, and graphical displays like histograms and bar charts.

Uploaded by

Wahyu Adil

Data & Statistics

W. Rofianto
1. Data & Statistics
2. Scales of Measurement
3. Summarizing Data for a Categorical Variable
4. Summarizing Quantitative Data
What is Statistics?

• The term statistics can refer to numerical facts such as averages, medians,
percentages, and maximums that help us understand a variety of business and
economic situations.
• Statistics can also refer to the art and science of collecting, analyzing,
presenting, and interpreting data.
Applications in Business and Economics
• Accounting - Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
• Economics - Economists use statistical information in making
forecasts about the future of the economy or some aspect of it.
• Finance - Financial advisors use price-earnings ratios and
dividend yields to guide their investment advice.
• Marketing - Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of marketing
research applications.
• Production - A variety of statistical quality control charts are used
to monitor the output of a production process.
• Information Systems - A variety of statistical information helps
administrators assess the performance of computer networks.
Basic Terms
• Data are the facts and figures collected, analyzed, and summarized for
presentation and interpretation.
• All the data collected in a particular study are referred to as the data set
for the study.

• Elements are the entities on which data are collected.


• A variable is a characteristic of interest for the elements.
• The set of measurements obtained for a particular element is called an
observation.
Scales of Measurement
• Scales of measurement include
• Nominal
• Ordinal
• Interval
• Ratio

• The scale determines the amount of information contained in the data.
• The scale indicates the data summarization and statistical analyses
that are most appropriate.
Nominal scale
• Data are labels or names used to identify an attribute of the
element.
• A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by the school in which they are
enrolled using a nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for the school variable
(e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes
Education, and so on).
Ordinal scale
• The data have the properties of nominal data and the order or
rank of the data is meaningful.
• A nonnumeric label or numeric code may be used.

Example
Students of a university are classified by their class standing using a
nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing
variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so
on).
Interval scale
• The data have the properties of ordinal data, and the interval
between observations is expressed in terms of a fixed unit of
measure.
• Interval data are always numeric.

Example
Melissa has a TOEFL score of 550, while Kevin has a TOEFL score
of 500. Melissa scored 50 points more than Kevin.

[Figure: labels SD, SMP, SMA shown with the values 450, 475, 550]


Ratio scale
• Data have all the properties of interval data and the ratio of
two values is meaningful.
• Zero value is included in the scale.
• Ratio data are always numerical.

Example:
The price of a book at a retail store is $200, while the price of the
same book sold online is $100. The ratio property shows that the
retail store charges twice the online price.

[Figure: Book A $200 vs. Book B $100; TOEFL A 600 vs. TOEFL B 400]
Categorical and Quantitative Data
• Data can be further classified as being categorical or quantitative.
• The statistical analysis that is appropriate depends on whether the data for the
variable are categorical or quantitative.
• In general, there are more alternatives for statistical analysis when the data are
quantitative.

Categorical Data
• Labels or names are used to identify an attribute of each element
• Often referred to as qualitative data
• Use either the nominal or ordinal scale of measurement
• Can be either numeric or nonnumeric
• Appropriate statistical analyses are rather limited

Quantitative Data
• Quantitative data indicate how many or how much.
• Quantitative data are always numeric.
• Ordinary arithmetic operations are meaningful for quantitative data.
Scales of Measurement
Cross-Sectional vs Time Series Data
Cross-sectional data are collected at the same or approximately the same
point in time.
Example
Data detailing the number of building permits issued in November 2018 in each of
the cities of Indonesia.

Time series data are collected over several time periods.


Example
Data detailing the number of building permits issued in Jakarta in each of the last
36 months.

Graphs of time series data help analysts
• understand what happened in the past,
• identify any trends over time, and
• project future levels for the time series.
Data Sources
Existing Sources
• Internal company records
• Business database services
• Government agencies
• Industry associations
• Internet

Statistical Studies
• Observational or Survey
• Experiment
Descriptive Statistics
• Most of the statistical information in newspapers, magazines, company
reports, and other publications consists of data that are summarized and
presented in a form that is easy to understand.
• Such summaries of data, which may be tabular, graphical, or numerical,
are referred to as descriptive statistics.
Example
The manager of Hudson Auto would like to have a better understanding of the
cost of parts used in the engine tune-ups performed in her shop. She examines 50
customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar,
are listed below.
Descriptive Statistics
Frequency and Percent Frequency

Graphical Summary: Histogram

Numerical Descriptive Statistics


• The most common numerical descriptive statistic is the mean (or average).
• Hudson's mean cost of parts, based on the 50 tune-ups studied, is $79
(found by summing the 50 cost values and then dividing by 50).
Statistical Inference
Statistical inference: The process of using data obtained from a
sample to make estimates and test hypotheses about the
characteristics of a population.

• Population: The set of all elements of interest in a particular study.


• Sample: A subset of the population.
• Census: Collecting data for the entire population.
• Sample survey: Collecting data for a sample.
Summarizing Data for a Categorical Variable
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Chart
• Pie Chart
Frequency, Relative Frequency & % Frequency Distribution
• A frequency distribution is a tabular summary of data showing the
number (frequency) of observations in each of several non-overlapping
categories or classes. The objective is to provide insights about the data
that cannot be quickly obtained by looking only at the original data.
• A relative frequency distribution is a tabular summary of data showing
the relative frequency for each class. The relative frequency of a class is
the fraction or proportion of the total number of data items belonging
to the class.
• A percent frequency distribution is a tabular summary of a set of data
showing the percent frequency for each class. The percent frequency of
a class is the relative frequency multiplied by 100.
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average, below
average, or poor. The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average
Frequency, Relative Frequency & % Frequency Distribution

Rating           Frequency   Relative Frequency   Percent Frequency
Poor                 2              .10                  10
Below Average        3              .15                  15
Average              5              .25                  25
Above Average        9              .45                  45
Excellent            1              .05                   5
Total               20             1.00                 100
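The table above can be reproduced with a short script. The following is a minimal Python sketch, not part of the original slides: the function name frequency_table and the variable names are illustrative assumptions, and the ratings are typed in from the Marada Inn sample.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example, in reading order.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Ordinal order of the rating categories, from worst to best.
categories = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]


def frequency_table(data, order):
    """Return rows of (class, frequency, relative frequency, percent frequency)."""
    counts = Counter(data)
    n = len(data)
    return [(c, counts[c], counts[c] / n, 100 * counts[c] / n) for c in order]


table = frequency_table(ratings, categories)
for category, freq, rel, pct in table:
    print(f"{category:<15}{freq:>3}{rel:>7.2f}{pct:>6.0f}")
```

Each relative frequency is the class count divided by the sample size, and each percent frequency is the relative frequency multiplied by 100, as defined earlier.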
Bar Chart
• A bar chart is a graphical display for depicting qualitative data.
• On one axis (usually the horizontal axis), we specify the labels that
are used for each of the classes.
• A frequency, relative frequency, or percent frequency scale can be
used for the other axis (usually the vertical axis).
Pie Chart
• The pie chart is a commonly used graphical display for presenting
relative frequency and percent frequency distributions for categorical
data.
• First draw a circle; then use the relative frequencies to subdivide the
circle into sectors that correspond to the relative frequency for each
class. Since there are 360 degrees in a circle, a class with a relative
frequency of .25 would consume .25(360) = 90 degrees of the circle.
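The sector calculation can be sketched in Python as follows. This is an illustration only: the function name sector_angles is an assumption, and the relative frequencies are those of the Marada Inn ratings.

```python
# Relative frequencies of the Marada Inn ratings.
relative_frequencies = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}


def sector_angles(rel_freqs):
    """Convert each relative frequency into a pie-chart sector angle in degrees."""
    return {label: rel * 360 for label, rel in rel_freqs.items()}


angles = sector_angles(relative_frequencies)
for label, degrees in angles.items():
    print(f"{label:<15}{degrees:>6.1f} degrees")
```

Because the relative frequencies sum to 1, the sector angles sum to 360 degrees.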
Summarizing Quantitative Data
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Histogram
• Cumulative Distributions
• Scatter diagram
Frequency Distribution
• The three steps necessary to define the classes for a frequency
distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

Guidelines for Determining the Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of
classes. Smaller data sets usually require fewer classes.
• The goal is to use enough classes to show the variation in the data, but not so
many classes that some contain only a few data items.
Frequency Distribution
Guidelines for Determining the Width of Each Class
• Use classes of equal width to reduce the chance of inappropriate
interpretations.
• Approximate Class Width =
(Largest data value - Smallest data value) / Number of classes

Note on Number of Classes and Class Width


• In practice, the number of classes and the appropriate class width are
determined by trial and error.
• Ultimately, the analyst uses judgment to determine the combination of
the number of classes and class width that provides the best frequency
distribution for summarizing the data.
Frequency Distribution
Guidelines for Determining the Class Limits
• Class limits must be chosen so that each data item belongs to one and
only one class.
• The lower class limit identifies the smallest possible data value assigned
to the class. The upper class limit identifies the largest possible data
value assigned to the class.
• The appropriate values for the class limits depend on the level of
accuracy of the data.
• An open-end class requires only a lower class limit or an upper class
limit.

Class Midpoint
• In some cases, we want to know the midpoints of the classes in a frequency
distribution for quantitative data.
• The class midpoint is the value halfway between the lower and upper class
limits.
Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups

If we choose six classes:
Approximate Class Width = (109 - 50)/6 = 9.83 ≈ 10
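The class-width calculation, and the class limits and midpoints that follow from it, can be sketched in Python. This is a hedged illustration, not the slides' own procedure: the function name class_limits is hypothetical, the first class is assumed to start at the smallest value (50), and the data are assumed to be whole dollars, so each upper limit is one less than the next lower limit.

```python
import math


def class_limits(smallest, largest, num_classes):
    """Return (width, classes) using the approximate class width rounded up.

    Assumes whole-number data, so a class runs from lower to lower + width - 1.
    """
    width = math.ceil((largest - smallest) / num_classes)
    classes = []
    lower = smallest
    for _ in range(num_classes):
        upper = lower + width - 1
        classes.append((lower, upper))
        lower += width
    return width, classes


# Hudson Auto Repair: smallest cost 50, largest cost 109, six classes.
width, classes = class_limits(50, 109, 6)

# Class midpoint: the value halfway between the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

for (lo, hi), mid in zip(classes, midpoints):
    print(f"{lo}-{hi}  midpoint {mid}")
```

If the range divides evenly by the number of classes, the largest value would fall just outside the last class under this sketch, which is one reason the number of classes and the class width are settled by trial and error in practice.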
Relative Frequency and Percent Frequency
Distributions
Histogram
Cumulative Distributions

• The last entry in a cumulative frequency distribution always equals the total
number of observations.
• The last entry in a cumulative relative frequency distribution always equals 1.
• The last entry in a cumulative percent frequency distribution always equals 100.
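The three properties above can be checked with a small Python sketch. The class frequencies below are hypothetical values chosen only for illustration (they are not the Hudson Auto data, which is not reproduced here), and the variable names are assumptions.

```python
from itertools import accumulate

# Hypothetical frequency distribution, for illustration only.
class_upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [4, 10, 15, 9, 7, 5]  # 50 observations in total

total = sum(frequencies)

# Running totals: number of observations less than or equal to each upper limit.
cumulative_frequency = list(accumulate(frequencies))
cumulative_relative = [cf / total for cf in cumulative_frequency]
cumulative_percent = [100 * cf / total for cf in cumulative_frequency]

for limit, cf, cr, cp in zip(class_upper_limits, cumulative_frequency,
                             cumulative_relative, cumulative_percent):
    print(f"<= {limit:>3}  {cf:>3}  {cr:.2f}  {cp:.0f}")
```

Pairing each upper class limit with its cumulative frequency also gives the points that are plotted in an ogive.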
Ogive
In statistics, an ogive is a free-hand graph showing the curve of a cumulative
distribution function. The points plotted are the upper class limit and the
corresponding cumulative frequency.
Scatter Diagram and Trendline
• A scatter diagram is a graphical presentation of the relationship between
two quantitative variables.
• One variable is shown on the horizontal axis and the other variable is
shown on the vertical axis.
• The general pattern of the plotted points suggests the overall
relationship between the variables.
• A trendline provides an approximation of the relationship.

[Figures: A Positive Relationship; A Negative Relationship]


Scatter Diagram

Attendance Level    Final Score
       71               80
       78               82
       86               85
       78               77
       93               90
      100               95
       78               85
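A trendline for the attendance and final score pairs can be sketched in Python. The slides do not say how the trendline is computed, so the ordinary least-squares fit used here is an assumption, as are the function and variable names. The pairs are read row by row from the scatter diagram data.

```python
# Attendance level and final score pairs from the scatter diagram data.
attendance = [71, 78, 86, 78, 93, 100, 78]
final_score = [80, 82, 85, 77, 90, 95, 85]


def least_squares_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(attendance, final_score)
print(f"final_score = {intercept:.2f} + {slope:.2f} * attendance")
```

A positive slope corresponds to a positive relationship between the two variables, and a negative slope to a negative relationship.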
