Lecture 7 Review Basic Statistics

The document provides a comprehensive review of basic statistics, covering key concepts such as definitions of data, types of data, sampling techniques, data collection methods, organization of data, and data analysis including measures of central tendency and dispersion. It explains qualitative and quantitative data, various sampling methods like simple random and stratified sampling, and outlines methods for data collection such as interviews and questionnaires. Additionally, it discusses how to analyze data through measures like mean, median, mode, variance, and standard deviation.

Uploaded by

pkoralyostevelyn
Copyright
© © All Rights Reserved

Review on Basic Statistics

Historically, statistics is state arithmetic. In very early times the state or government kept records to aid
in the collection of taxes and the provision of military service.

Objectives:

Since this course covers basic statistics only as a review of previous knowledge, the main
objective is to consolidate understanding of the following important areas:

o Definition of data
o Different types of data
o Sampling techniques
o Different methods of data collection
o Organization of data
o Analysis of Data
   - Measures of central tendency
   - Measures of dispersion

1. Data

The word data refers to a collection of related facts gathered via one of the four data collection
methods (see Section 5). "Data" is the plural form; the singular is "datum".

2. Types of Data

Data collected or gathered may fall into two categories – qualitative or quantitative.

Qualitative Data: refers to data that cannot be expressed numerically but only by some code or label.
For example, data gathered on sex (male/female) or color (red/blue/yellow/pink).

Quantitative Data: refers to data that can be expressed numerically, such as students' height and age,
or the number of vehicles passing through a certain spot. For example, Peter's height could be
expressed as 1.78 m and his age as 19 years, and the total number of vehicles passing through a
certain point, say the University Circle, could be recorded as 25 vehicles.

Further divisions are created. For example, quantitative data can further be divided into either discrete
or continuous.

Discrete Data: refers to data that can be expressed using only whole (or counting) numbers, that is,
non-negative integers. Whole numbers are used to represent counts of objects, such as the total
number of vehicles or people.

Continuous Data: refers to data that can be expressed using real (decimal) numbers. Variables like
height or distance can assume any value on the real number line, including decimal values.

3. Collecting Data

In any survey, the surveyor must first identify the quantity to measure. Secondly, based on the type of
measure, he/she engages an appropriate method of selecting the subjects for the survey.

The survey can be performed on two sizes of data.

(i) Census - Where all individual elements in the population become subjects of the survey
or data collection exercise. Sampling is therefore not an option.

(ii) Sample - Where a representative portion of the population is selected to be subjects of
the survey. Here different sampling techniques can be used, depending on the subject on
which the measurement is to be conducted.

4. Sampling Techniques.

a) Simple Random Sampling – the subject is picked at random from the population
where each member of the population has an equal chance of being selected.

Also called representative or proportionate sampling since all groups of the population
should be proportionately represented in the sample.

b) Stratified sampling – the population is separated into strata whose members share the same
characteristics, and simple random sampling (SRS) is then applied within each stratum.

Example: in surveying views on Equal Rights Amendments, use gender as the basis for creating
two strata. Use SRS to collect data from each stratum.

c) Systematic sampling – the surveyor selects every kth element of the population as a subject
for the measure.

Example: use a telephone directory of 10,000 names from which we choose 200 names, so that
k = 10,000/200 = 50. This is done by randomly selecting one of the first 50 names and then
choosing every 50th name after the randomly selected name.

d) Cluster sampling – first divide the population into sections (clusters) and then randomly
select a few of those sections. The surveyor can then choose all the members from the
selected sections.

Example: in conducting a pre-election poll for the governor of Central Province, we could
randomly select 15 wards and then survey 50 people from each of the selected wards.
This would be much more efficient than selecting one person from each of the
available wards in Central Province.
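
The four sampling techniques above can be sketched in code. This is a minimal illustration rather than a prescribed procedure: the directory, the gender-tagged population, the ward sizes and all function names are hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random


def simple_random_sample(population, n, rng):
    """Pick n members so that every member has an equal chance of selection."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then apply SRS inside each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]


def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly select clusters, then survey members of the selected clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


rng = random.Random(7)  # fixed seed (assumption) so the sketch is repeatable

# Hypothetical directory of 10,000 entries; 200 wanted, so k = 10000 / 200 = 50.
directory = list(range(10_000))
srs = simple_random_sample(directory, 200, rng)
systematic = systematic_sample(directory, k=50, rng=rng)

# Hypothetical population tagged by gender, used as the stratum label.
people = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]
stratified = stratified_sample(people, lambda p: p[0], 20, rng)

# Hypothetical province: 40 wards of 300 residents each.
wards = {f"ward_{w}": [f"w{w}_p{i}" for i in range(300)] for w in range(40)}
clustered = cluster_sample(wards, n_clusters=15, n_per_cluster=50, rng=rng)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

Note that the cluster example follows the ward poll in the text, where 50 people are surveyed in each selected ward instead of every resident.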
5. Methods of Data Collection

Four ways of collecting data:

- Interviews
- Questionnaires
- Observation
- Records review

Review each data collection method and summarize.

Methods of data collection

INTERVIEW
A pre-planned meeting between the information gatherer and one or more subjects.
Two types: structured and non-structured interviews.
Advantages:
- Easier to interpret information from how the information is relayed (body language).
- The subject is more forthright when answering in spoken form.
- Open-minded and flexible questions prompt individuality in the manner that the subject replies.
Disadvantages:
- Expensive in terms of time and money.

OBSERVATION
The observer watches, or walks through, the actual process associated with the subject of interest.
Advantages:
- Data are of high quality because the actual event is observed.
- Data are of real-time value.
- Data are highly believable.
Disadvantages:
- The observer is required to be on the scene when the event occurs.

QUESTIONNAIRE
A special-purpose document that requests specific information from respondents.
Advantages:
- High-volume responses, inexpensive to administer, fast and efficient.
Disadvantages:
- No flexibility of questions, no probing questions, no follow-up questions.
- May have a low return of questionnaires.

RECORDS REVIEW
A search through old records to extract data.
Advantages:
- Very inexpensive method of data collection.
- If records exist, then all the information is accessible.
Disadvantages:
- Records may not be available.

6. Organization of Data

Tables
- an array of data as in the fire engine call array of data
- the popular use of the frequency distribution, in table form
- the use of the frequency distribution table to include cumulative frequency
Graphical methods
- The ogive, a graph of the cumulative frequency against scores.
- Frequency histogram – a vertical bar graph of frequencies against scores
- Frequency polygon – constructed by joining the midpoints of the tops of the
frequency columns.
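
As a small illustration of the tabular methods listed above, the sketch below builds a frequency distribution table with a cumulative frequency column from raw scores. The scores are hypothetical and the layout is only one possible choice.

```python
from collections import Counter
from itertools import accumulate

scores = [2, 3, 3, 1, 4, 3, 2, 5, 3, 4, 2, 3]  # hypothetical raw scores

counts = Counter(scores)                 # score -> frequency
values = sorted(counts)                  # distinct scores in increasing order
freqs = [counts[v] for v in values]      # frequency of each score
cum_freqs = list(accumulate(freqs))      # running total of the frequencies

print(f"{'x':>3} {'f':>3} {'cum f':>6}")
for x, f, cf in zip(values, freqs, cum_freqs):
    print(f"{x:>3} {f:>3} {cf:>6}")
```

Plotting the cumulative frequencies against the scores would give the ogive, and plotting the frequencies as vertical bars would give the frequency histogram.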
7. Data Analysis

a) Measures of central Tendency

In most frequency distributions, the majority of cases or scores tend to cluster about some central value
with a few cases at the upper end and a few cases at the lower end. Of the many measures of central
tendency used to describe this state of affairs, we shall consider the mode, the mean and the median.

The mode is the score which occurs the most. It is the score with the highest frequency.
The mean is the average of all the scores, the sum of all the scores divided by the number of scores.
The median is the middle score when the scores are arranged in order of size from smallest to largest.
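
For a short list of raw scores, the three definitions above can be checked with Python's standard `statistics` module. The scores below are hypothetical and chosen only for illustration.

```python
import statistics

scores = [4, 7, 5, 7, 9, 3, 7]  # hypothetical scores

mode = statistics.mode(scores)      # score with the highest frequency
mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # middle score once the scores are sorted

print("mode:", mode, "mean:", mean, "median:", median)
```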

Read on the method of evaluation of each of these measures and perform the relevant exercises in your
CLN Text

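The mean

For simple data (a countable number of observations):

\[ \bar{x} = \frac{\sum x}{n} \]

From an FDT (single scores):

\[ \bar{x} = \frac{\sum fx}{\sum f} \]

From an FDT (grouped data):

\[ \bar{x} = \frac{\sum fx}{\sum f} \]

where $x$ in the product $fx$ is the class centre.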

The median

The middle observation when all scores are arranged in increasing order
When n is odd the median is one value. When n is even, the median is the average of the two middle
observations.

The median is easily read from a cumulative frequency distribution, where it corresponds to the
50th percentile of the cumulative frequencies.

In grouped data the median class contains the median value.


Example: Determine the median score from the following FDT.

Number of magazines (x)    Frequency (f)    Cumulative frequency
0                          2                2
1                          12               14
2                          49               63
3                          64               127
4                          43               170
5                          20               190
6                          9                199
7                          1                200

∑f = 200

The 63rd score is the last observation of 2, the 64th is the first observation of 3 and the
127th score is the last observation of 3.

Hence, the 100th score is a 3 and the 101st score is also a 3.

Median = middle score
       = (100th score + 101st score)/2
       = (3 + 3)/2
       = 3
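
The same reading of the cumulative frequency column can be sketched in code. The table is the magazines FDT above; the helper name `score_at` is hypothetical. The mean is not part of the worked example and is included only to show the formula $\bar{x} = \sum fx / \sum f$ on the same table.

```python
from itertools import accumulate

# Magazines FDT from the example: score x -> frequency f
fdt = {0: 2, 1: 12, 2: 49, 3: 64, 4: 43, 5: 20, 6: 9, 7: 1}


def score_at(position, table):
    """Return the score at a 1-based position in the ordered observations."""
    values = sorted(table)
    cumulative = accumulate(table[v] for v in values)
    for x, cum_f in zip(values, cumulative):
        if position <= cum_f:
            return x
    raise ValueError("position exceeds the number of observations")


n = sum(fdt.values())  # total frequency
if n % 2 == 1:
    median = score_at((n + 1) // 2, fdt)
else:
    # n is even: average the two middle observations
    median = (score_at(n // 2, fdt) + score_at(n // 2 + 1, fdt)) / 2

mean = sum(f * x for x, f in fdt.items()) / n  # sum of fx divided by sum of f

print(n, median, mean)
```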

The mode is by comparison, the easiest to evaluate from a set of data. It is the score or observation that
occurs most often, or simply that with the highest frequency. In grouped data, the modal class is the class
with the highest frequency.

b) Measures of Dispersion

The second stage in analyzing the numerical information is to investigate the variability of the data; that
is the spread or scatter of values from the ‘average’.

In this course the following measures will be reviewed:

(a) the range,
(b) the mean deviation (left for the student to review),
(c) the variance, and
(d) the standard deviation.

The Range -

This is the difference between the highest and lowest score. It is not an ideal measure of dispersion; its
weakness is that it is based only on two extreme values of the distribution and thus may not give
sufficient detail about the scatter of all the scores from the mean.

The range is important mainly in reporting scenarios where the extremes themselves matter. An
example is weather statistics, where the extremes of atmospheric temperature, rainfall or
pressure must be reported.

The variance is a widely used measure of dispersion but it has the disadvantage that it is expressed in
units which are the squares of the original units. For many purposes it is desirable that a measure of
dispersion be expressed in the same units as the original data and their mean. Such a measure of
dispersion is the standard deviation which is obtained by simply taking the square root of the variance.

Variance and Standard Deviation


The steps:
- Find the deviation of each score from the mean, $x - \mu$.
- Square the deviations from the mean, $(x - \mu)^2$.
- Take the average of the squared deviations from the mean. This average, called the variance,
  is used as a measure of dispersion.

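Variance and standard deviation formulas

For simple data:

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{n}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \]

From an FDT (large data size):

\[ \sigma^2 = \frac{\sum f(x - \mu)^2}{\sum f}, \qquad \sigma = \sqrt{\frac{\sum f(x - \mu)^2}{\sum f}} \]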

The same formulas apply to grouped data, except that $x$ is the class centre. The class centre
is used to calculate the individual deviations from the mean.
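
For simple data, the steps can be followed one at a time in code. This is a minimal sketch with hypothetical scores; it uses the population form (division by $n$) to match the formulas in these notes, and `statistics.pvariance` from the standard library serves only as a cross-check.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

data_range = max(scores) - min(scores)     # highest score minus lowest score

mu = sum(scores) / len(scores)             # the mean
deviations = [x - mu for x in scores]      # step 1: x - mu
squared = [d ** 2 for d in deviations]     # step 2: (x - mu)^2
variance = sum(squared) / len(scores)      # step 3: average squared deviation
std_dev = math.sqrt(variance)              # square root of the variance

# Cross-check against the standard library's population variance
assert statistics.pvariance(scores) == variance

print(data_range, mu, variance, std_dev)
```

The variance here is in squared units of the scores, while the standard deviation is back in the original units.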
Example:

Question:
Complete the table and determine the variance and standard deviation of the
distribution.

Class      f
21 - 25    5
26 - 30    11
31 - 35    4
36 - 40    13
41 - 45    7
46 - 50    10

Solution

Steps

1. Calculate the class centres by averaging the lower and upper values of each class.

2. Evaluate $fx$, the product of each score (class centre) with the corresponding frequency.

3. Calculate the sum of $fx$, $\sum fx$.

4. Calculate the mean, $\mu = \frac{\sum fx}{\sum f} = \frac{1830}{50} = 36.6$.

5. Calculate the individual deviations from the mean, $x - \mu$.

6. Square the deviations, $(x - \mu)^2$.

7. Evaluate the product of frequency and squared deviation, $f(x - \mu)^2$.

8. Evaluate the sum, $\sum f(x - \mu)^2$.

Use the appropriate formula to calculate the variance and the standard deviation.

Variance:

\[ \sigma^2 = \frac{\sum f(x - \mu)^2}{\sum f} = \frac{3402}{50} = 68.04 \]

Standard deviation:

\[ \sigma = \sqrt{\frac{\sum f(x - \mu)^2}{\sum f}} = \sqrt{68.04} \approx 8.2 \]
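
The Frequency Distribution Table.

\[ \text{Mean} = \mu = \frac{\sum fx}{\sum f} = \frac{1830}{50} = 36.6 \]

\[ \sigma^2 = \frac{\sum f(x - \mu)^2}{\sum f} = \frac{3402}{50} = 68.04 \]

\[ \sigma = \sqrt{\frac{\sum f(x - \mu)^2}{\sum f}} = \sqrt{68.04} \approx 8.2 \]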
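
The columns of the completed table can be generated by following steps 1 to 8 in code. This is a sketch of that calculation using the class limits and frequencies from the question; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each class in the question
classes = [(21, 25, 5), (26, 30, 11), (31, 35, 4),
           (36, 40, 13), (41, 45, 7), (46, 50, 10)]

centres = [(lo + hi) / 2 for lo, hi, _ in classes]      # step 1
freqs = [f for _, _, f in classes]
fx = [f * x for f, x in zip(freqs, centres)]            # step 2
sum_f = sum(freqs)
sum_fx = sum(fx)                                        # step 3
mu = sum_fx / sum_f                                     # step 4
deviations = [x - mu for x in centres]                  # step 5
squared = [d ** 2 for d in deviations]                  # step 6
f_squared = [f * s for f, s in zip(freqs, squared)]     # step 7
sum_f_squared = sum(f_squared)                          # step 8

variance = sum_f_squared / sum_f
std_dev = math.sqrt(variance)

print(f"{'class':>9} {'f':>3} {'x':>5} {'fx':>6} {'x-mu':>6} "
      f"{'(x-mu)^2':>9} {'f(x-mu)^2':>10}")
for (lo, hi, f), x, p, d, s, fs in zip(classes, centres, fx,
                                       deviations, squared, f_squared):
    print(f"{lo:>4}-{hi:<4} {f:>3} {x:>5.0f} {p:>6.0f} {d:>6.1f} "
          f"{s:>9.2f} {fs:>10.2f}")
print(f"mean = {mu:.1f}, variance = {variance:.2f}, std dev = {std_dev:.1f}")
```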

Identify:

a) the median class, and
b) the modal class.
