
STA121 Lecture Note

The document is a set of lecture notes for STA121: Descriptive Statistics I, authored by Dr. Emmanuel Torsen at Modibbo Adama University. It covers various topics including definitions of statistics, types and sources of statistical data, methods of data collection, and data presentation techniques. The notes also include exercises and examples to illustrate the concepts discussed.

STA121: DESCRIPTIVE STATISTICS I 2 UNITS

(LECTURE NOTES)

Dr. EMMANUEL TORSEN

(Senior Lecturer and Head, Department of Statistics)

DEPARTMENT OF STATISTICS, FACULTY OF PHYSICAL

SCIENCES, MODIBBO ADAMA UNIVERSITY, YOLA

February 13, 2025


Table of Contents

Table of Contents

List of Tables

List of Figures

1 COURSE CONTENTS

2 STATISTICAL DATA
2.1 Definition of Statistics
2.2 Levels of Measurement
2.2.1 Areas of Application
2.3 Uses of Statistical Data
2.4 Types of Statistical Data
2.5 Sources of Statistical Data
2.5.1 Primary Sources of Statistical Data
2.5.2 Secondary Sources of Statistical Data
2.6 Methods of Data Collection
2.6.1 Surveys and Questionnaires Method
2.6.1.1 Advantages of Surveys and Questionnaires Method
2.6.1.2 Disadvantages of Surveys and Questionnaires Method
2.6.2 Interview Method
2.6.2.1 Advantages of Interview Method
2.6.2.2 Disadvantages of Interview Method
2.6.3 Observation (Experiment) Method

Dept. of Statistics: STA121 (Descriptive Statistics) Lecture Note by Dr. E. Torsen

2.6.3.1 Advantages of Observation Method
2.6.4 Documentary Method
2.6.4.1 Advantages of Documentary Method
2.6.4.2 Disadvantages of Documentary Method

3 DATA PRESENTATION: DESCRIPTIVE STATISTICS
3.1 Introduction
3.2 Frequency Distribution
3.3 Basic Definitions
3.4 Construction of Grouped Frequency Distribution
3.5 Graphical Description of Data
3.5.1 Pie Chart
3.5.2 Bar Chart
3.5.3 Component and Multiple Bar Charts
3.5.4 Histogram
3.5.5 Frequency Polygon
3.5.6 Ogive
3.6 Exercises

4 NUMERICAL DESCRIPTION OF DATA
4.1 Measures of Central Tendency (Location)
4.1.1 The Mean
4.1.2 The Median
4.1.3 The Mode
4.1.3.1 Obtaining the Mode from Histogram
4.2 Measures of Dispersion
4.2.1 The Range
4.2.2 Mean Deviation
4.2.3 Variance
4.2.4 Standard Deviation
4.2.5 Coefficient of Variation (CV)

List of Tables

3.1 Scores obtained by 40 students from a mathematics examination
3.2 Frequency distribution table
3.3 Frequency distribution table showing class boundaries
3.4 Primary school enrollment for 4 LGAs in Adamawa state
3.5 Number of students enrolled for different Department in an University of Education

List of Figures

3.1 Pie Chart showing primary school enrollment in 4 LGAs
3.2 Bar Chart showing primary school enrollment in 4 LGAs
3.3 Component bar chart showing enrollment of male and female students
3.4 Multiple bar chart showing enrollment of male and female students
3.5 A typical Histogram
3.6 A typical Frequency Polygon
3.7 A typical Ogive
4.1 Histogram showing how to obtain mode

Chapter 1

COURSE CONTENTS
(1) Definition of Statistics.

(2) Statistical data:

(i) Types,

(ii) Sources, and

(iii) Methods of data collection.

(3) Areas of Statistical applications.

(4) Presentation of data:

(i) Tables,

(ii) Charts and graphs: Bar charts, pie charts, histogram, polygon and curves.

(5) Frequency and cumulative distributions.

(6) Numerical description of data:

(i) Measures of central tendency (location).

(ii) Measures of partition.

(iii) Measures of dispersion.

(iv) Measures of skewness and kurtosis.

Chapter 2

STATISTICAL DATA

2.1 Definition of Statistics

Statistics can be defined as a scientific method of collecting, organizing, summarizing,
presenting and analyzing data, as well as drawing valid conclusions based on the analysis
carried out.

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting
data to assist in making more effective decisions.

2.2 Levels of Measurement

(i) Nominal: Data that represent categories without any specific order (e.g., types
of schools: public, private).

(ii) Ordinal: Data that represent categories with a meaningful order but no consistent
intervals (e.g., class ranks).

(iii) Interval: Numeric data with meaningful intervals but no true zero point (e.g.,
SAT scores).

(iv) Ratio: Numeric data with a true zero, allowing for meaningful comparisons
(e.g., attendance rates).

2.2.1 Areas of Application

Statistics involves manipulating and interpreting numbers. The numbers are intended
to represent information about the subject to be investigated. The science of statistics
deals with the gathering of information, the condensation and presentation of such
information in a compact form, and the study and measurement of variation and of the
relation between two or more similar or identical phenomena. It also involves estimation
of the characteristics of a population from a sample, the design of experiments and
surveys, and the testing of hypotheses about populations. Statistics is concerned with
the analysis of information collected by a process of sampling in which variability is
likely to occur in one or more outcomes.

Statistics can be applied in any field in which there is extensive numerical data. Examples
include engineering, sciences, medicine, accounting, business administration and
public administration. Some major areas where statistics is widely used are discussed
below.

(a) Industry: Making decisions in the face of uncertainty is a problem commonly faced
by businessmen and industrialists. Analysis of historical data enables businessmen
to prepare well in advance for the uncertainties of the future. Statistics has
been applied in market and product research, feasibility studies, investment
policies, quality control of manufactured products, selection of personnel, the
design of experiments, economic forecasting, auditing and several other areas.

(b) Biological Science: Statistics is used in the analysis of the yield of varieties of crops
in different environmental conditions using different fertilizers. Animal response
to different diets in different conditions could also be studied statistically to ensure
optimum application of resources. Recent advancement in medicine and
public health has been greatly enhanced by statistical principles.

(c) Physical Sciences: Statistical methodology has been used to aid findings in astronomy,
chemistry, geology, meteorology, and oil exploration. Samples of mineral
resources discovered in a particular environment are taken to examine their essential
and natural features before a decision is made on likely investment in their
exploration and exploitation. Laboratory experiments are conducted using statistical
principles.

(d) Government: A large volume of data is collected by government at all levels on
a continuous basis to enhance effective decision making. Government requires
up-to-date knowledge of expenditure patterns, revenue estimates, human population,
health, defense and internal issues. Government is the most important
user and producer of statistical data.

(e) Education:

(f) Health:

(g) Finance:

(h) Judiciary:

(i) and many more

2.3 Uses of Statistical Data

The following are uses of statistical data:

(i) Statistics summarizes a large body of numerical data by deriving from it
some representative quantities such as the mean, standard deviation, variance and
coefficient of variation.

(ii) It permits reasonable deductions and enables us to draw general conclusions
under certain conditions.

(iii) Planning is absurd without statistics. Statistics enables us to plan for the future
based on analysis of historical data.

(iv) Statistics reveals the nature and pattern of the variations of a phenomenon through
numerical measurement.

(v) It makes data representation easy and clear.

2.4 Types of Statistical Data

There are two types of statistical data:

(i) Qualitative (Categorical) Data: Describes characteristics or attributes (e.g.,
student gender, level of education).

(ii) Quantitative (Numerical) Data: Numerical measures (e.g., test scores, graduation
rates).

Quantitative data can be further classified into:

(i) Discrete Data: Countable items, such as the number of students in a class.

(ii) Continuous Data: Data that can take any value within a range, such as the time
spent studying for an examination or the weight of students in a class.

2.5 Sources of Statistical Data

Statistical data can be obtained from various sources. These sources can be broadly
categorized into primary and secondary sources, and they serve different research,
policy-making, and evaluation purposes.

(i) The primary data, and

(ii) The secondary data

2.5.1 Primary Sources of Statistical Data

Primary data refers to information collected directly from the field for a specific
research purpose.
When the researcher decides to obtain statistical information by going to the origin
of the source, we say that such data are primary data. This happens when there is
no existing reliable information on the topic of interest. The first-hand collection of
statistical data is one of the most difficult and important tasks a statistician carries
out. The acceptance and reliability of the data so collected will depend on the method
employed, how timely the collection was, and the caliber of people employed for the
exercise.

Advantages of Primary Sources of Statistical Data

(i) The investigator has confidence in the data collected

(ii) The investigator appreciates the problems involved in data collection since he or
she is involved at every stage.

(iii) The report of such a survey is usually comprehensive.

(iv) Definition of terms and units are usually included.

(v) It normally includes a copy of the schedule used to collect the data.

Disadvantages of Primary Sources of Statistical Data

(i) The method is time consuming.

(ii) It is very expensive.

(iii) It requires considerable manpower.

(iv) Sometimes the data may be obsolete at the time of publication.

2.5.2 Secondary Sources of Statistical Data

Secondary data refers to data that has already been collected and published by other
researchers, institutions, or governments. These sources are typically used when
conducting large-scale analyses or historical comparisons.
Sometimes statistical data may be obtained from existing published or unpublished sources,
such as statistical divisions in various ministries, banks, insurance companies, print
media, and research institutions. In all these areas data are collected and kept as part
of routine work, and no particular importance may be attached to the data collected.
Thus, figures on vehicle license renewals and new registrations of vehicles can first
be obtained from the Board of Internal Revenue through their daily records. The
investigator interested in studying the types of new vehicles brought into the country in a
particular year will start with the data from the customs department or the Board of
Internal Revenue.

Advantages of Secondary Sources of Statistical Data

(i) They are cheap to collect.

(ii) Data collection is less time consuming as compared to primary source.

(iii) The data are easily available.

Disadvantages of Secondary Sources of Statistical Data

(i) It could be misused, misrepresented or misinterpreted.

(ii) Some data may not be easily obtained because of official protocol.

(iii) The information may not conform to the investigator's needs.

(iv) It may not be possible to determine the precision and reliability of the data,
because the method used to collect the data is usually not known.

(v) It may contain mistakes due to errors in transcription from the primary source.

2.6 Methods of Data Collection

Data are generally collected to provide useful and meaningful information about
the observation under study. The entire planning and execution of a survey depends on
data availability, which is greatly influenced by the method of data collection. The
choice of the method of collection should be arrived at after careful consideration
of the aims and objectives of the survey, the nature of the information needed, the
population under study, the degree of accuracy required, practicality, time and cost.
The following are methods of data collection:

(a) Surveys and Questionnaire method

(b) Interview method

(c) Observation method

(d) Documentary method

2.6.1 Surveys and Questionnaires Method

This method involves the use of a questionnaire (statistical format) to collect the needed
statistical information. Questionnaires are specially designed forms meant to extract
available information from respondents (i.e. persons, groups of persons, organizations or
institutions). A questionnaire comprises logically arranged questions which
are to be answered by the respondent. After answering the questions, the respondent
is expected to send the form back to the source. This method is widely
used in collecting information because of certain advantages it has.

2.6.1.1 Advantages of Surveys and Questionnaires Method

(i) Wide coverage: Questionnaires can be distributed to a large number of individuals,
groups or institutions.

(ii) Saves time: Questionnaires can be distributed, filled in and returned to the source
within a very short time.

(iii) Less cost: The cost incurred in using a questionnaire to collect information
is relatively small; e.g. a questionnaire can be sent to an institute in London and
returned at little expense.

(iv) Different types of information can be collected using the questionnaire method.

2.6.1.2 Disadvantages of Surveys and Questionnaires Method

(i) Problem of designing the questionnaire itself: Care should be taken in its design
to remove ambiguity, repetition and irrelevant questions.

(ii) Non-response: The respondent may not respond to a question because of a low
level of awareness or knowledge of the importance of the questionnaire.

(iii) Incomplete or inaccurate responses: The respondent may not complete
the questionnaire or give accurate answers to the questions.

(iv) Wrong or false information: The respondents may deliberately give wrong or
false information without the investigator's knowledge.

2.6.2 Interview Method

The method involves personal contact between the interviewer and the respondent
(interviewee), during which the interviewer asks the respondent a series of questions
concerning the subject matter and the respondent is expected to answer. It is an oral
interaction between the two parties. An interview can be conducted either face to face
or by telephone.

2.6.2.1 Advantages of Interview Method

(i) It allows free face-to-face interaction between the interviewer and the respondent.

(ii) It allows more detailed information to be collected with full explanation.

(iii) It allows the interviewer to guide the respondent accurately in completing
the information.

(iv) False information can be checked or corrected when noticed by the interviewer.

2.6.2.2 Disadvantages of Interview Method

(i) Only a small area is covered because it is not possible to interview a large number
of persons.

(ii) It is time consuming and costly when covering a large number of individuals, since
the interviewer has to visit the respondents one after the other at their respective
locations.

(iii) An appointment should be booked in advance with the respondent before conducting
the interview.

(iv) The respondent's personal feelings may influence the accuracy of the information
given.

2.6.3 Observation (Experiment) Method

This is a method of systematic and scientific enquiry used to collect data in almost
all disciplines and for controlled experiments such as biological, social, economic and
physical laboratory experiments. It is the on-the-spot watching of an event as it takes
place. It is widely used in experiments because of its high degree of accuracy
and efficiency in providing the needed information.

2.6.3.1 Advantages of Observation Method

(i) Information is collected directly from real-life events, since events are recorded
as they happen.

(ii) There is contact between the observer and the subject matter being observed.

(iii) The observer can guide, control and influence the process to obtain the most
accurate information.

(iv) The information recorded through this method is highly reliable, since the recorded
information is what the observer has seen with his or her own eyes.

2.6.4 Documentary Method

The use of already existing reports (documents) of past or present events to predict future
events or phenomena is known as the documentary method of data collection. Documents
may be either official or unofficial: if information is collected from an
official document, the data are referred to as official; otherwise they are unofficial.
This method makes use of books, newspapers, reports, and bulletins from official sources.

2.6.4.1 Advantages of Documentary Method

(i) Saves time: information can be extracted from an existing document within a short
time.

(ii) Provides old information: records of past events can be retrieved.

(iii) Less labor is involved in collecting information through documents.

(iv) Less cost: the cost incurred by this method is small.

(v) Provides official information.

2.6.4.2 Disadvantages of Documentary Method

(i) If the information was wrong from the initial stage, inaccurate or incomplete
information will be collected.

(ii) If false information is recorded in an official document, there is
practically no room to correct such information.

These documentary sources include, but are not limited to:

(A) Government Publications and Reports

Governments frequently collect and publish statistical data. These publications
often contain a wealth of demographic, enrollment, and performance-related
data. Examples are:

(i) UNESCO Institute for Statistics (UIS): Provides global and regional
data on education (e.g., enrollment rates, literacy rates).

(ii) National Bureau of Statistics (NBS): In countries like Nigeria, NBS publishes
reports on educational attainment, school enrollment, and examination
performance.

(iii) Ministry of Education Reports: Most countries have annual or periodic
reports that cover various educational statistics, such as student-teacher ratios,
school infrastructure, and examination outcomes.

(B) International Organizations

Several international organizations collect, compile, and disseminate educational
data across countries, often for comparison and policy formulation purposes.
Examples are:

(i) OECD (Organisation for Economic Co-operation and Development):
Publishes the PISA (Programme for International Student Assessment) data,
which compares student performance in mathematics, reading, and science
across countries.

(ii) World Bank Education Statistics: Contains global data on education,
including school participation and government spending on education.

(iii) UNICEF Education Reports: Provides data on access to education, equity,
and the impact of policies on children’s learning outcomes.

(C) Institutional Records

Schools, universities, and other educational institutions maintain various records
and data systems that can be valuable sources of data. Examples are:

(i) Student attendance records.

(ii) Graduation and dropout rates.

(iii) Grade and performance reports.

(iv) Admission and enrollment statistics.



(D) Educational Databases

Numerous databases compile data on different aspects of education, from student
outcomes to teacher effectiveness. These databases are often used for research,
benchmarking, and policymaking. Examples are:

(i) NCES (National Center for Education Statistics): The primary federal
entity for collecting and analyzing data related to education in the United
States. It conducts surveys such as the National Assessment of Educational
Progress (NAEP).

(ii) EdStats (World Bank): A comprehensive source of data on education
systems worldwide, providing access to data and metadata for more than
200 countries.

(iii) ERDC (Educational Research Data Center): Provides data on students,
schools, and the overall education system in several countries.

(E) Academic Journals and Research Papers

Researchers and scholars frequently publish studies that involve the collection
and analysis of educational data. These studies are a rich source of secondary
data, especially for specific topics or specialized areas of research. Examples
are:

(i) Published articles in journals like the Journal of Educational Psychology,
Educational Researcher, and Review of Educational Research.

(ii) Dissertations and theses available in university repositories that provide
statistical analysis of educational phenomena.

(F) Online Learning Platforms

Online education platforms collect vast amounts of data on student learning behaviors,
course completion rates, and instructional effectiveness. Examples are:

(i) Data from Massive Open Online Course (MOOC) platforms like Coursera
and edX.

(ii) Learning analytics from learning management systems such as Blackboard
or Canvas.

(G) Large-Scale Educational Surveys

Several large-scale surveys and assessments are designed to collect educational
data across different countries, regions, or populations.

(H) National Educational Surveys

Many countries conduct nationwide surveys to collect data on the state of education,
student performance, and school infrastructure. Examples are:

(i) Annual School Census: Conducted in countries like Nigeria to collect
school-level data on enrollment, staff, and facilities.

(ii) National Education Surveys: These are periodic surveys to assess literacy
rates, educational access, and quality across different demographic groups.

(I) International Educational Assessments

These assessments collect and analyze data on student performance globally,
providing insights into educational systems’ effectiveness. Examples are:

(i) PISA (Programme for International Student Assessment): Administered
by the OECD, PISA measures 15-year-old students’ abilities in reading,
mathematics, and science across the world.

(ii) TIMSS (Trends in International Mathematics and Science Study): Conducted
by the International Association for the Evaluation of Educational
Achievement (IEA), TIMSS assesses mathematics and science knowledge
of fourth and eighth-grade students globally.

(iii) PIRLS (Progress in International Reading Literacy Study): A global
assessment measuring reading comprehension at the fourth-grade level.

(J) Social Media and Web-Based Data

In recent years, social media platforms and educational websites have become
valuable sources of informal educational data. Examples are:

(i) Data from platforms like Twitter (X), Facebook, and Reddit can be used
to analyze public opinions on education policies or learning trends.

(ii) Educational websites like Google Scholar and ResearchGate offer access
to peer-reviewed articles and research papers that provide data or analysis
on various educational topics.
Chapter 3

DATA PRESENTATION: DESCRIPTIVE

STATISTICS

3.1 Introduction

Data arise from measurements taken on variables. Variables are characteristics
that can assume different values, e.g. height, weight, income, age, etc. Some of these
variables assume values which can be stated in numbers and are said to be numerical
or quantitative variables, e.g. age, temperature, etc. Other variables cannot assume
numerical values and can only be designated by names or classified; these are called
qualitative variables.

3.2 Frequency Distribution

Raw data are collected data which have not been organized or arranged in any form.
When raw data have been arranged in either ascending or descending order of magnitude,
the result is known as an array. A tabular arrangement of data showing the frequency
of the measurements (or the frequency of the different class intervals for grouped data)
is called a frequency distribution or frequency table.
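
As a minimal sketch of these definitions, the snippet below arranges a small set of raw scores into an array and then counts how often each value occurs. The scores are made up for illustration and do not come from these notes.

```python
from collections import Counter

# Hypothetical raw data (not from the lecture notes): scores of 10 students.
raw_data = [7, 3, 5, 7, 9, 3, 7, 5, 9, 7]

# An array: the raw data arranged in ascending order of magnitude.
array = sorted(raw_data)

# A frequency distribution: each distinct value paired with its frequency.
frequency_table = sorted(Counter(raw_data).items())

print("Array:", array)
print("Value  Frequency")
for value, frequency in frequency_table:
    print(f"{value:>5}  {frequency:>9}")
```

The frequencies always add up to the number of observations, which is a quick check that no value was missed.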

3.3 Basic Definitions

(a) Class: A group of observations or measurements.

(b) Class Limits:

(i) Lower class limit: gives the lower bound of a class.

(ii) Upper class limit: gives the upper bound of a class.

This means that values higher than the upper class limit cannot belong to that
class; similarly, values lower than the lower class limit cannot belong to that class.

(c) Class Interval: This indicates the range of values that can be classified as members
of the class, e.g. the interval 20 − 29 takes care of values from 20 up to
29. An interval with both upper and lower limits is called a closed interval. An interval
where either the upper limit or the lower limit is not indicated is called an open
interval, e.g. 20 and above, less than 50, etc.

(d) Class Boundaries: The upper class boundary of a class is obtained by adding the upper
class limit of that class to the lower class limit of the next higher class and dividing
the sum by 2. Similarly, the lower class boundary of a class is obtained by adding the
lower class limit of that class to the upper class limit of the next lower class and
dividing the sum by 2. Class boundaries are also called true class limits.

(e) Class Size: This is the difference between the upper class boundary and the
lower class boundary of any class. It is also called the class width. It can also be
obtained by subtracting the lower class limit of a class from the lower limit of the
next higher class.

(f) Class Marks or Midpoints: These are the averages of the upper and lower class
limits (or boundaries).

(g) Frequency: This is the number of observations in a class.

(h) Relative Frequency: This is the ratio of a class frequency to the sum of the
frequencies of all the classes. Sometimes it is expressed in percentages.

(i) Cumulative Frequency: The cumulative frequency of a given class is obtained
by the addition of all class frequencies up to that class starting from the lowest
class.

(j) Cumulative Frequency Distribution: This is a tabular arrangement of data
which shows the number of observations that are less than a given upper class
boundary.
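
In symbols, the definitions above can be summarized as follows. The notation is an assumption introduced here for illustration and is not used elsewhere in these notes: $L_i$ and $U_i$ denote the lower and upper class limits of the $i$-th class, $f_i$ its frequency, and $k$ the number of classes.

```latex
\[
\begin{aligned}
\text{upper class boundary of class } i &= \frac{U_i + L_{i+1}}{2}, \\
\text{lower class boundary of class } i &= \frac{U_{i-1} + L_i}{2}, \\
\text{class mark of class } i &= \frac{L_i + U_i}{2}, \\
\text{relative frequency of class } i &= \frac{f_i}{\sum_{j=1}^{k} f_j}, \\
\text{cumulative frequency of class } i &= \sum_{j=1}^{i} f_j .
\end{aligned}
\]
```

For instance, for two neighbouring classes 20 − 29 and 30 − 39, the upper class boundary of the first class is $(29 + 30)/2 = 29.5$ and its class mark is $(20 + 29)/2 = 24.5$.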

3.4 Construction of Grouped Frequency Distribution

(a) From the raw data, find the range. This is the difference between the largest and
the lowest values.

(b) Divide the range into a convenient number of non-overlapping class intervals,
making sure that the lowest and the highest classes can accommodate the least
and largest values respectively. Note that the size of a class interval is approximately
equal to the range divided by the number of classes.

(c) Determine the number of observations falling into each class interval by tallying.
For ease of counting, tallies are grouped in bundles of 5.

Note that there is nothing sacrosanct about constructing a frequency table; the steps
given above are just a guide. The important thing is to have a convenient number of
non-overlapping class intervals that will accommodate the data under consideration.
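
The steps above can be sketched in code. This is only one possible way to carry them out, under stated assumptions: the data are whole numbers, the first class starts at the least value, and the class size is the range divided by the number of classes, rounded up. The function name `grouped_frequency_table` and the sample scores are hypothetical and are not the data of Table 3.1.

```python
import math

def grouped_frequency_table(data, num_classes):
    """Return a list of (lower limit, upper limit, frequency) tuples.

    Assumes whole-number data, so classes such as 22-30 and 31-39
    do not overlap.
    """
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                      # step (a)
    class_size = math.ceil(data_range / num_classes)   # step (b), rounded up
    # Widen the classes if the last one would miss the largest value.
    if lowest + num_classes * class_size - 1 < highest:
        class_size += 1
    table = []
    for k in range(num_classes):
        lower = lowest + k * class_size
        upper = lower + class_size - 1
        # Step (c): count the observations falling into this class.
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, frequency))
    return table

# Hypothetical scores, chosen only to illustrate the procedure.
scores = [22, 35, 41, 47, 52, 60, 63, 68, 70, 74, 81, 93]
table = grouped_frequency_table(scores, 8)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

As the note above says, other choices (for example a different starting limit for the first class) are equally acceptable as long as the classes do not overlap and every observation falls into exactly one class.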
Example 2.1: Consider the following marks obtained by students from a mathematics
examination.

Table 3.1: Scores obtained by 40 students from a mathematics examination

Construct a frequency table of 8 classes.


Solution: The size of the class intervals is approximately equal to the range divided by
the number of classes:
\[
\frac{\text{Range}}{\text{Number of classes}} = \frac{71}{8} = 8.875 \approx 9
\]
So, we are going to have class intervals of size 9 as follows:

Table 3.2: Frequency distribution table

Class Interval    Tally         Frequency
22-30             ||||/         5
31-39             ||            2
40-48             ||||/ |       6
49-57             ||||          4
58-66             ||||/ ||      7
67-75             ||||/ |||     8
76-84             |||           3
85-93             ||||/         5
Total                           40

Note: The lower class limit of the first class need not be
the least value in the data, and the upper class limit of the last class need not be
the largest value in the data. The important thing is that the first class should be able
to absorb the least value and the last class should be able to absorb the largest value.
From Table 3.2, we can have the following.

Table 3.3: Frequency distribution table showing class boundaries
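
As a hedged illustration of how such a table can be derived, the sketch below computes class boundaries, class marks, relative frequencies and cumulative frequencies from the class limits and frequencies of Table 3.2. It assumes that the boundaries of the first and last classes are obtained by extending the limits by half of the gap between neighbouring classes (here 0.5); the variable names are chosen for illustration only.

```python
# Class limits and frequencies copied from Table 3.2.
classes = [(22, 30, 5), (31, 39, 2), (40, 48, 6), (49, 57, 4),
           (58, 66, 7), (67, 75, 8), (76, 84, 3), (85, 93, 5)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]   # 31 - 30 = 1

total = sum(freq for _, _, freq in classes)
rows = []
cumulative = 0
for lower, upper, freq in classes:
    cumulative += freq
    rows.append({
        "boundaries": (lower - gap / 2, upper + gap / 2),
        "class_mark": (lower + upper) / 2,
        "frequency": freq,
        "relative_frequency": freq / total,
        "cumulative_frequency": cumulative,
    })

for row in rows:
    low, high = row["boundaries"]
    print(f"{low:>5.1f} - {high:<5.1f} mark={row['class_mark']:>4.1f} "
          f"f={row['frequency']} rel={row['relative_frequency']:.3f} "
          f"cum={row['cumulative_frequency']}")
```

The cumulative frequency of the last class equals the total number of observations (40), which serves as a check on the arithmetic.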

3.5 Graphical Description of Data

3.5.1 Pie Chart

This kind of chart is most suitable for qualitative data. It is a circle divided into sectors
with sector angles proportional to the frequencies of the categories they represent. The
sector angles are obtained by multiplying the relative frequency of each category or
class by 360°.
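
The rule for sector angles can be sketched as a small function. The category names and figures below are hypothetical and are not the values of Table 3.4; the function name `sector_angles` is likewise chosen only for illustration.

```python
def sector_angles(frequencies):
    """Map each category to its sector angle in degrees.

    Each angle is the relative frequency of the category times 360.
    """
    total = sum(frequencies.values())
    return {category: freq * 360 / total
            for category, freq in frequencies.items()}

# Hypothetical enrollment figures (in thousands) for four categories.
enrollment = {"LGA A": 30, "LGA B": 45, "LGA C": 60, "LGA D": 45}
angles = sector_angles(enrollment)

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

Whatever the data, the sector angles should add up to 360 degrees, which is a useful check before drawing the chart.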
Example 2.2: The table below shows primary school enrollment (in thousands) for four
LGAs in Adamawa State for the year 2023.

Table 3.4: Primary school enrollment for 4 LGAs in Adamawa state

Represent the information on a pie chart.


Solution: First we obtain the sector angles

Figure 3.1: Pie Chart showing primary school enrollment in 4 LGAs

3.5.2 Bar Chart

This consists of separated vertical bars (rectangles) of equal width in which the height
of each bar is proportional to the frequency of the class it represents. The gap separat-
ing the bars must be equal. The horizontal axis represents the classes and the vertical
axis represents the frequency. Having vertical bars, we could use horizontal bars and
the choice of vertical or horizontal representation is largely a matter of taste. In bar
chart, there are no set of rules to be observed in drawing bar charts. The following
consideration will be quite useful.
Note that:

(i) A bar chart is applicable only to discrete, categorical, nominal and ordinal data.

(ii) Bars should be neither too short nor very long and narrow.

(iii) Bars should be separated by spaces which are about one and a half times the width
of a bar.

(iv) The length of each bar should be proportional to the frequency of the category it represents.

(v) A guide note should be provided to ease the reading of the chart.

Bar charts are used for making comparisons among categories. In the simple form
several items are presented graphically by horizontal or vertical bars of uniform width,
with lengths proportional to the values they represent.
Example 2.3: Using the table in Example 2.2 we have the following Bar Chart.

Figure 3.2: Bar Chart showing primary school enrollment in 4 LGAs

3.5.3 Component and Multiple Bar Charts

Component and Multiple Bar Charts could be used to present information where there
are sub-classes in the class.

Figure 3.3: Component bar chart showing enrollment of male and female students

Figure 3.4: Multiple bar chart showing enrollment of male and female students

3.5.4 Histogram

This is essentially a bar graph where the bars are erected on the class boundaries of
a grouped frequency distribution. The heights of the bars are determined by the class
frequencies. The width of the bars corresponds to the size of the class and the centre
of any bar is the midpoint of that class. Unlike in a bar chart, the bars of a histogram
are not separated; they are joined together.

Figure 3.5: A typical Histogram

Class Work 2.1: Represent the information on a Histogram.

3.5.5 Frequency Polygon

This is a line graph of class frequencies against the corresponding class marks. To con-
struct a frequency polygon, we first locate the class marks for each class interval, then
plot the number of observations within a particular class interval (frequency) on the
vertical axis against the class marks on the horizontal axis. Finally, we connect adjoin-
ing points with straight lines. It can also be constructed by connecting the midpoints
of the top of the bars in a histogram.
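
As a minimal sketch of the first step, each class mark can be computed as the midpoint of the class's lower and upper limits. The class limits and frequencies below are hypothetical and serve only to show the points that would be joined by straight lines.

```python
# Hypothetical class limits and frequencies (illustration only).
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [4, 9, 6, 1]

# Class mark = midpoint of the lower and upper class limits.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Points (class mark, frequency) to plot and join with straight lines.
polygon_points = list(zip(class_marks, frequencies))
print(polygon_points)
```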

Figure 3.6: A typical Frequency Polygon

3.5.6 Ogive

This is also called the cumulative frequency curve. This is a curve obtained by plotting
the cumulative frequency less than any upper class boundary against the upper class
boundaries. It is a very important graph because from the Ogive we can obtain the
following: median, quartiles, percentiles and other quantiles.
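
The points of an Ogive can be prepared with a running total of the class frequencies. The sketch below assumes hypothetical upper class boundaries and frequencies; it only builds the points and leaves the plotting to any graphing tool.

```python
from itertools import accumulate

# Hypothetical upper class boundaries and class frequencies (illustration only).
upper_boundaries = [19.5, 29.5, 39.5, 49.5]
frequencies = [4, 9, 6, 1]

# "Less than" cumulative frequency for each upper class boundary.
cumulative = list(accumulate(frequencies))

# Points (upper boundary, cumulative frequency) that trace the Ogive.
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative frequency should equal the total frequency, which gives a simple check on the table.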

Figure 3.7: A typical Ogive



3.6 Exercises

Exercise 2.1:

Table 3.5: Number of students enrolled in different departments of a University of Education

Given the information in Table 3.5, represent it on a:

(i) Pie chart

(ii) Bar chart

Exercise 2.2: Using the frequency distribution below, construct a/an:

(i) Bar chart



(ii) Histogram

(iii) Frequency polygon

(iv) Ogive

Exercise 3: The scores in Educational Statistics of 64 students at Modibbo Adama
University, Yola are recorded below

(i) Construct a frequency distribution of the above data using a suitable class size.

(ii) Using the frequency table in (i) construct a bar chart, a histogram, a frequency
polygon, and an Ogive.
Chapter 4

NUMERICAL DESCRIPTION OF DATA

4.1 Measures of Central Tendency (Location)

These are descriptive statistics that summarize central tendency or location of a distri-
bution of measurements.

4.1.1 The Mean

One of the most common and useful measures of location is the average, also called
the arithmetic mean or simply the mean. The mean of a set of measurements
is equal to the sum of the measurements divided by the number of measurements (n).
Given a set of measurements $X_1, X_2, \ldots, X_n$, the mean is defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{4.1}$$

Example 3.1: Suppose that some Geography students went on a practical and obtained
the following as records of relative humidity in a city: 82, 74, 52, 68, 67, 67, 81, 71,
79, 72. Obtain the average humidity of the city.
Solution:

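$$
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
&= \frac{1}{10}(82 + 74 + 52 + 68 + 67 + 67 + 81 + 71 + 79 + 72) \\
&= \frac{1}{10}(713) \\
&= \frac{713}{10} \\
&= 71.3
\end{aligned}
\tag{4.2}
$$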


Hence, the average (mean) humidity of the city during the period is 71.3.
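
The same calculation can be checked with a short Python sketch, using the ten humidity readings from Example 3.1.

```python
# Relative humidity readings from Example 3.1.
humidity = [82, 74, 52, 68, 67, 67, 81, 71, 79, 72]

# Mean = sum of the measurements divided by the number of measurements.
mean = sum(humidity) / len(humidity)
print(mean)  # 71.3
```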
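For grouped data (a frequency distribution) with class marks $X_i$ and corresponding
frequencies $f_i$, the mean is given by

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \tag{4.3}$$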

Example 3.2: Given the following frequency table, calculate the Mean.

Solution:

Therefore,

$$
\begin{aligned}
\bar{X} &= \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \\
&= \frac{1102.5}{39} \\
&= 28.269
\end{aligned}
\tag{4.4}
$$

Remarks
The mean has the following characteristics:

(i) The mean is unique: there can only be one mean for a set of measurements.

(ii) It is reliable compared to other measures of central tendency.

(iii) It takes into account the size of each individual measurement, i.e., a change in
one value changes the mean value.

(iv) It is affected by extreme values.

4.1.2 The Median

The median of any set of data is the score which divides the data into two equal parts.
It is the score which occupies the center point of the data when arranged in descending
or ascending order of magnitude. The median of a set of observations arranged in order
of magnitude is the middle value, or the average of the two middle values if two fall at
the center. That is,

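$$\text{Median} = X_{\left(\frac{n+1}{2}\right)} \quad \text{if } n \text{ is odd} \tag{4.5}$$

$$\text{Median} = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} \quad \text{if } n \text{ is even} \tag{4.6}$$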

Example 3.3: What is the median of the following set of data? 8, 3, 11, 2, 6.
Solution: Rearrange the data in order of magnitude, then pick the middle value as the
median since the number of items is odd: 2, 3, 6, 8, 11, with n = 5, odd.

$$
\begin{aligned}
\text{Median} &= X_{\left(\frac{n+1}{2}\right)} \\
&= X_{(3)} \\
&= 6
\end{aligned}
\tag{4.7}
$$

Example 3.4: Suppose that some Geography students went on a practical and obtained

the following as records of relative humidity in a city: 82, 74, 52, 68, 67, 67, 81, 71,
79, 72. Obtain the Median humidity of the city.
Solution: We rearrange the data in order of magnitude: 52, 67, 67, 68, 71, 72, 74, 79,
81, 82, with n = 10, even.

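$$
\begin{aligned}
\text{Median} &= \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} \\
&= \frac{X_{\left(\frac{10}{2}\right)} + X_{\left(\frac{12}{2}\right)}}{2} \\
&= \frac{X_{(5)} + X_{(6)}}{2} \\
&= \frac{71 + 72}{2} \\
&= \frac{143}{2} \\
&= 71.5
\end{aligned}
\tag{4.8}
$$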
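
The odd and even cases of the median rule can be sketched as one small Python function. The function name `median` is chosen here for illustration; the two calls reuse the data of Examples 3.3 and 3.4.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th value (index mid when counting from 0).
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([8, 3, 11, 2, 6]))                            # 6
print(median([82, 74, 52, 68, 67, 67, 81, 71, 79, 72]))    # 71.5
```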

For grouped data, the median is obtained using the formula

$$\text{Median}\,(M_d) = L_l + \left(\frac{\frac{N}{2} - cf_b}{f_{M_d}}\right) C \tag{4.9}$$

Where
$L_l$ = Lower class boundary of the median class.
$N = \sum_{i=1}^{n} f_i$ = Total frequency.
$cf_b$ = Cumulative frequency of the class just before the median class.
$f_{M_d}$ = Frequency of the median class.
$C$ = Class size.
Example 3.5: Calculate the median for the following set of data.

Solution: Using the formula:


$$\text{Median}\,(M_d) = L_l + \left(\frac{\frac{N}{2} - cf_b}{f_{M_d}}\right) C \tag{4.10}$$

$\implies \frac{N}{2} = \frac{80}{2} = 40$; the median class is 66 − 73.
$L_l = 65.5$, $f_{M_d} = 26$, $cf_b = 22$, $C = 73.5 - 65.5 = 8$

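$$
\begin{aligned}
\text{Median}\,(M_d) &= 65.5 + \left(\frac{\frac{80}{2} - 22}{26}\right) 8 \\
&= 65.5 + \left(\frac{40 - 22}{26}\right) 8 \\
&= 65.5 + \left(\frac{18}{26}\right) 8 \\
&= 65.5 + 5.538 \\
&= 71.038 \\
&\approx 71
\end{aligned}
\tag{4.11}
$$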
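
The grouped-median calculation can be sketched as a small Python function that takes the quantities read off the frequency table. It uses N/2 as in the worked solution; the function and parameter names are illustrative, and the call plugs in the values quoted in Example 3.5.

```python
def grouped_median(lower_boundary, total_freq, cum_freq_before, median_freq, class_size):
    """Median of grouped data: L + ((N/2 - cf_b) / f_Md) * C."""
    return lower_boundary + ((total_freq / 2 - cum_freq_before) / median_freq) * class_size


# Values quoted in Example 3.5: L = 65.5, N = 80, cf_b = 22, f_Md = 26, C = 8.
md = grouped_median(65.5, 80, 22, 26, 8)
print(round(md, 3))  # 71.038
```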

4.1.3 The Mode

The mode for a set of ungrouped measurements is the measurement with the highest
frequency.
Example 3.6:

(i) The modal measurement in the set 5, 8, 3, 2, 5, 7, 6, 7, 3, 7 is 7 (Mode = 7). This
is because 7 occurs the highest number of times.

(ii) 8, 3, 11, 2, 6. No mode in this case

(iii) 2, 8, 5, 5, 3, 4, 5, 8, 9, 8. Here there are two modes; 5 and 8 since these mea-
surements occur the same number of times and are the most frequently occurring
measurements. This kind of data set is said to be bimodal.
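
The three cases in Example 3.6 can be reproduced with a short Python sketch. It assumes the convention used in case (ii), namely that a data set in which every value occurs exactly once has no mode; the helper name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; empty when every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 8, 3, 2, 5, 7, 6, 7, 3, 7]))   # [7]
print(modes([8, 3, 11, 2, 6]))                 # []
print(modes([2, 8, 5, 5, 3, 4, 5, 8, 9, 8]))   # [5, 8]
```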

Remarks
The Mode has the following characteristics:

(i) It is easy to compute.

(ii) It is not unique.

(iii) It is unreliable and it does not take into account the size of individual measure-
ments.

For Grouped Data, the mode is obtained using the formula


$$\text{Mode}\,(M_o) = L_l + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) C \tag{4.12}$$

Where
Ll = Lower class boundary of the modal class.
∆1 = Difference between the modal class frequency and the frequency of the class just
before the modal class.
∆2 = Difference between the modal class frequency and the frequency of the class just
after the modal class.
C = Class size.
The modal class is the class with the highest frequency.
Example 3.7: Use the information in Example 3.5 to calculate the mode.
Solution:
The modal class is the class with the highest frequency, which is 66 − 73.
Ll = 65.5, ∆1 = 26 − 14 = 12, ∆2 = 26 − 18 = 8, C = 73.5 − 65.5 = 8.
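So that,

$$
\begin{aligned}
M_o &= L_l + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) C \\
&= 65.5 + \left(\frac{12}{12 + 8}\right) 8 \\
&= 65.5 + \left(\frac{12}{20}\right) 8 \\
&= 65.5 + 4.8 \\
&= 70.3 \\
&\approx 70
\end{aligned}
\tag{4.13}
$$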
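
A minimal Python sketch of the grouped-mode formula follows. The function and parameter names are illustrative; the call uses the frequencies quoted in Example 3.7 (modal class 26, class before 14, class after 18).

```python
def grouped_mode(lower_boundary, modal_freq, freq_before, freq_after, class_size):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * C."""
    d1 = modal_freq - freq_before   # difference with the class just before
    d2 = modal_freq - freq_after    # difference with the class just after
    return lower_boundary + (d1 / (d1 + d2)) * class_size


# Values quoted in Example 3.7: L = 65.5, frequencies 26, 14 and 18, C = 8.
mo = grouped_mode(65.5, 26, 14, 18, 8)
print(round(mo, 1))  # 70.3
```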

4.1.3.1 Obtaining the Mode from a Histogram

Given any histogram, the modal class is the class with the highest rectangle (bar or
bin). To obtain the mode from a histogram, proceed as follows. From the upper left
vertex of the highest rectangle, draw a line to the upper left vertex of the next rectangle
on its right. From the upper right vertex of the highest rectangle, draw a line to the
upper right vertex of the next rectangle on its left. From the intersection of these two
lines, trace a vertical line down to the horizontal axis to read off the modal value.

Figure 4.1: Histogram showing how to obtain mode

4.2 Measures of Dispersion

These are numerical measures used to measure the spread of a set of data about a fixed
point, usually about the mean. Dispersion is a very important characteristic of a set of
measurements; a measure of this characteristic is necessary for constructing a mental
picture of the frequency distribution and the spread of the data. Numerous measures
of dispersion exist, and we shall discuss a few of them.

4.2.1 The Range

The simplest measure of dispersion is the range. It is defined by the difference between
the minimum and maximum measurements in any set of data. Since it depends on the
extreme values, it does not provide a good summary of the dispersion among the bulk
of the measurements. For example, the range for the set of data 5, 8, 3, 2, 5,
7, 6, 7, 3, 7 is

Range = Maximum (Highest) − Minimum (Lowest) = 8 − 2 = 6.
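
The same range can be checked in Python with the built-in `max` and `min` functions.

```python
data = [5, 8, 3, 2, 5, 7, 6, 7, 3, 7]

# Range = maximum value - minimum value.
data_range = max(data) - min(data)
print(data_range)  # 6
```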

4.2.2 Mean Deviation

This is the mean of the absolute deviations of the measurements from the mean and is
defined as follows

$$MD = \frac{\sum_{i=1}^{n} \left|X_i - \bar{X}\right|}{n} \tag{4.14}$$

where $X_i$ are the measurements and $\bar{X}$ is the mean of the $X_i$.


If the measurements $X_1, X_2, \ldots, X_n$ occur with frequencies $f_1, f_2, \ldots, f_n$ respectively, then

$$MD = \frac{\sum_{i=1}^{n} f_i \left|X_i - \bar{X}\right|}{\sum_{i=1}^{n} f_i} \tag{4.15}$$

For grouped data, $X_i$ are the class marks.

4.2.3 Variance

The variance of a set of measurements is defined as the average of the squares
of the deviations of the measurements about their mean, taken with divisor $n - 1$.
Given a set of measurements $X_1, X_2, \ldots, X_n$, the variance is expressed as follows

$$\operatorname{Var}(X) = S^2 = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}{n - 1} \tag{4.16}$$

For grouped data with class marks $X_1, X_2, \ldots, X_n$ and corresponding frequencies
$f_1, f_2, \ldots, f_n$, the variance is defined as follows

$$\operatorname{Var}(X) = S^2 = \frac{\sum_{i=1}^{n} f_i \left(X_i - \bar{X}\right)^2}{\sum_{i=1}^{n} f_i - 1} \tag{4.17}$$

4.2.4 Standard Deviation

This is the positive square root of the variance.



$$\text{Std} = S = \sqrt{S^2} \tag{4.18}$$

4.2.5 Coefficient of Variation (CV)

The coefficient of variation is one of the measures of relative dispersion. It measures
the dispersion of a set of measurements relative to their mean and is defined
as 100 times the ratio of the standard deviation to the mean. Hence,

$$CV = \frac{100 \times S}{\bar{X}} \tag{4.19}$$

where $S$ is the standard deviation and $\bar{X}$ is the mean of the set of measurements.


The coefficient of variation is usually expressed in percentages. The coefficient of
variation is particularly useful in two circumstances, comparing dispersion in:

(i) Data sets that have remarkably different means.

(ii) Sets of measurements that are expressed in different units.

Example 3.8: Calculate

(i) Mean,

(ii) Mean Deviation,

(iii) Variance,

(iv) Standard Deviation, and

(v) Coefficient of Variation.

for the ungrouped data set: 2.9, 6.1, 6.9, 6.9, 0.8, 3.5, 2.9, measured in $10^7$ watts. The
observations represent the daily consumption of electrical energy over a one-week
period in a certain educational institution in Jimeta, Adamawa State.
Solution:

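(i) Mean:

$$
\begin{aligned}
\bar{X} &= \frac{\sum_{i=1}^{n} X_i}{n} \\
&= \frac{30}{7} \\
&= 4.28571428571 \\
&= 4.3 \ (1\ \text{d.p.}) \\
&= 4.3 \times 10^{7}\ \text{watts}
\end{aligned}
\tag{4.20}
$$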

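(ii) Mean Deviation:

$$
\begin{aligned}
MD &= \frac{\sum_{i=1}^{n} \left|X_i - \bar{X}\right|}{n} \\
&= \frac{14.1}{7} \\
&= 2.014285714285714 \\
&= 2.0 \ (1\ \text{d.p.}) \\
&= 2.0 \times 10^{7}\ \text{watts}
\end{aligned}
\tag{4.21}
$$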

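(iii) Variance:

$$
\begin{aligned}
\operatorname{Var}(X) = S^2 &= \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}{n - 1} \\
&= \frac{33.57}{7 - 1} \\
&= \frac{33.57}{6} \\
&= 5.595 \\
&= 5.6 \ (1\ \text{d.p.}) \\
&= 5.6 \times 10^{14}\ \text{watts}^2
\end{aligned}
\tag{4.22}
$$

(iv) Standard Deviation:

$$
\begin{aligned}
\text{Std} = S &= \sqrt{S^2} \\
&= \sqrt{5.595} \\
&\approx 2.3654 \\
&= 2.4 \ (1\ \text{d.p.}) \\
&= 2.4 \times 10^{7}\ \text{watts}
\end{aligned}
\tag{4.23}
$$

(v) Coefficient of Variation:

$$
\begin{aligned}
CV &= \frac{100 \times S}{\bar{X}} \\
&= \frac{100 \times 2.3654}{4.2857} \\
&= \frac{236.54}{4.2857} \\
&\approx 55.19 \\
&= 55.2\% \ (1\ \text{d.p.})
\end{aligned}
\tag{4.24}
$$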
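
The five measures can be cross-checked with a short Python sketch. It assumes the seven daily readings are 2.9, 6.1, 6.9, 6.9, 0.8, 3.5 and 2.9 (a one-week period with a total of 30, as used in the solution). The code carries full precision, so its output can differ in the later decimal places from figures worked by hand with rounded intermediate sums.

```python
from math import sqrt

# Assumed seven daily readings, in units of 10^7 watts.
readings = [2.9, 6.1, 6.9, 6.9, 0.8, 3.5, 2.9]
n = len(readings)

mean = sum(readings) / n
# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in readings) / n
# Sample variance: squared deviations divided by n - 1.
variance = sum((x - mean) ** 2 for x in readings) / (n - 1)
std_dev = sqrt(variance)
# Coefficient of variation, as a percentage.
cv = 100 * std_dev / mean

print(f"mean           = {mean:.4f}")
print(f"mean deviation = {mean_deviation:.4f}")
print(f"variance       = {variance:.4f}")
print(f"std deviation  = {std_dev:.4f}")
print(f"CV (%)         = {cv:.2f}")
```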

Example 3.9: Use the frequency distribution table below to obtain the following:

(i) Mean,

(ii) Mean Deviation,

(iii) Variance,

(iv) Standard Deviation, and

(v) Coefficient of Variation.

Solution:

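(i) Mean:

$$
\begin{aligned}
\bar{X} &= \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \\
&= \frac{12182.5}{135} \\
&= 90.2407 \\
&= 90.24 \ (2\ \text{d.p.})
\end{aligned}
\tag{4.25}
$$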

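(ii) Mean Deviation:

$$
\begin{aligned}
MD &= \frac{\sum_{i=1}^{n} f_i \left|X_i - \bar{X}\right|}{\sum_{i=1}^{n} f_i} \\
&= \frac{1876.46}{135} \\
&= 13.8997037 \\
&= 13.90 \ (2\ \text{d.p.})
\end{aligned}
\tag{4.26}
$$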

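(iii) Variance:

$$
\begin{aligned}
\operatorname{Var}(X) = S^2 &= \frac{\sum_{i=1}^{n} f_i \left(X_i - \bar{X}\right)^2}{\sum_{i=1}^{n} f_i - 1} \\
&= \frac{35685.926}{135 - 1} \\
&= \frac{35685.926}{134} \\
&= 266.312880597 \\
&= 266.31 \ (2\ \text{d.p.})
\end{aligned}
\tag{4.27}
$$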

(iv) Standard Deviation:

$$
\begin{aligned}
\text{Std} = S &= \sqrt{S^2} \\
&= \sqrt{266.312880597} \\
&= 16.31909558147 \\
&= 16.32 \ (2\ \text{d.p.})
\end{aligned}
\tag{4.28}
$$

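(v) Coefficient of Variation:

$$
\begin{aligned}
CV &= \frac{100 \times S}{\bar{X}} \\
&= \frac{100 \times 16.31909558147}{90.2407} \\
&= \frac{1631.909558147}{90.2407} \\
&= 18.083963867158 \\
&= 18.08\% \ (2\ \text{d.p.})
\end{aligned}
\tag{4.29}
$$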
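
For grouped data the same five measures follow from the class marks and frequencies. The sketch below is a minimal illustration under stated assumptions: the helper name `grouped_summary` is hypothetical, and the class marks and frequencies in the call are made-up values, not the table of Example 3.9.

```python
from math import sqrt


def grouped_summary(class_marks, frequencies):
    """Mean, mean deviation, variance, std and CV for a frequency distribution."""
    total = sum(frequencies)
    pairs = list(zip(class_marks, frequencies))
    mean = sum(f * x for x, f in pairs) / total
    mean_deviation = sum(f * abs(x - mean) for x, f in pairs) / total
    # Divisor is (total frequency - 1), as in the grouped variance formula.
    variance = sum(f * (x - mean) ** 2 for x, f in pairs) / (total - 1)
    std_dev = sqrt(variance)
    return {
        "mean": mean,
        "mean_deviation": mean_deviation,
        "variance": variance,
        "std_dev": std_dev,
        "cv": 100 * std_dev / mean,
    }


# Hypothetical class marks and frequencies (illustration only).
summary = grouped_summary([10, 20, 30], [2, 5, 3])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```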
