CHAPTER 1
1. Introduction
The word statistics is derived from the Latin word “status”, which means a political state or
government. Accordingly, the scope of statistics in ancient times was limited mainly to the collection
of demographic, property, and wealth data by governments for framing military and
fiscal policies. Nowadays, statistics is used in almost every field of study, such as natural science,
social science, engineering, medicine, and agriculture. Improvements in computer technology
make it easier than ever to use statistical methods and to manipulate massive amounts of data.
Most people think of statistics as tables of figures giving births, deaths, marriages, divorces,
accidents, etc. Some think of statistics as information about an activity (such as production,
population, or national income) expressed in numbers. Still others think of statistics
as a subject or a body of knowledge, like other sciences.
Examples:
a) Heights of five students in a class: 1.5 m, 1.6 m, 1.7 m, 1.3 m, 1.4 m
Mean = (1.5 + 1.6 + 1.7 + 1.3 + 1.4)/5 = 7.5/5 = 1.5 m
The average height of these five students is 1.5 m. Under descriptive statistics, however, we cannot
say that the average height of all statistics students is 1.5 m, because that would be a
generalization about the whole group of students.
b) The median household income for people aged 25-34 is $35,888.
c) Expenditures for the cable industry were $5.66 billion in the year 1996.
d) Nine out of ten on-the-job fatalities are men.
e) The national average annual medicine expenditure per person is $1052
f) The starting salaries for Mathematics and Statistics students in different organizations. …etc
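The mean in example (a) can be reproduced with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and are not part of the original notes.

```python
from statistics import mean

# Heights (in metres) of the five students from example (a).
heights_m = [1.5, 1.6, 1.7, 1.3, 1.4]

# Descriptive statistics: summarise only the data at hand,
# without generalising to students who were not measured.
average_height = mean(heights_m)
print(f"Mean height of the five students: {average_height:.1f} m")
```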
B) Inferential Statistics
Inferential statistics is the branch of statistics that deals with the systematic collection, organization,
presentation, analysis, and interpretation of data for the purpose of making decisions about a population
based on sample data, using methods such as the chi-square test of association, regression, and odds ratios. It
consists of performing hypothesis tests, determining relationships among variables, and making
predictions. For example, the average income of all families (the population) in Ethiopia can be
estimated from figures obtained from a few hundred families (the sample).
Examples:
a) The average height of the students in the class is 1.5 m.
b) Drinking decaffeinated coffee can raise cholesterol levels by 7%.
c) Experts say that mortgage rates may soon hit bottom.
d) From past figures, it has been predicted that 90% of registered voters will vote in the November
election.
e) The average age of students in Bule Hora University is 19.1 years.
f) There is a relationship between smoking tobacco and an increased risk of developing cancer, etc
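The idea of estimating a population mean from a sample of a few hundred units can be sketched as follows. The population here is simulated, so every figure is hypothetical and does not describe real incomes; the sample size of 300, the log-normal shape, and the seed are assumptions made only for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated annual incomes of 50,000 families.
population = [rng.lognormvariate(10, 0.5) for _ in range(50_000)]

# Draw a simple random sample of a few hundred families.
sample = rng.sample(population, 300)

# Inferential step: use the sample to estimate the population mean,
# with an approximate 95% interval based on the standard error.
sample_mean = mean(sample)
std_error = stdev(sample) / len(sample) ** 0.5
low = sample_mean - 1.96 * std_error
high = sample_mean + 1.96 * std_error

print(f"Estimated mean income: {sample_mean:.0f}")
print(f"Approximate 95% interval: {low:.0f} to {high:.0f}")
print(f"Population mean (normally unknown): {mean(population):.0f}")
```

In practice the last line cannot be computed, because the whole population is not observed; it is printed here only to show how close the sample estimate comes.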
Surveys can be conducted in different ways; the three most common methods are:
Telephone survey
Mailed questionnaire
Personal interview.
Activity: Discuss the advantages and disadvantages of the above three methods relative to one
another.
2) Data Organization: Here the collected data are organized. They should first be edited carefully
so that inconsistent or irrelevant answers and wrong computations in the survey returns are
corrected, adjusted, or omitted. The next step is Classification: after the data are edited, they
are classified according to their resemblance (similarity). The last step is Tabulation, which
is the arrangement of data in rows and columns.
NB: Classification and Tabulation are the two methods used to condense (reduce) the data.
3) Data Presentation: The process of re-organizing, classifying, compiling, and
summarizing data in a meaningful form using tables, graphs, and diagrams. The main
purpose of data presentation is to facilitate analysis.
4) Data Analysis: The process of extracting relevant information from the summarized data,
mainly through the use of highly complex and sophisticated mathematical techniques. The main
purpose of data analysis is to dig out useful information for decision making/interpretation.
5) Interpretation of data: Interpretation means drawing conclusions from the data which form the
basis of decision making. Correct interpretation requires a high degree of skill and experience
and is necessary in order to draw valid conclusions.
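The editing, classification, and tabulation steps described above can be sketched on a small, made-up data set. The raw answers, the rule for rejecting inconsistent values, and the class labels below are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical raw survey answers: number of children per family.
raw_answers = [2, 3, -1, 0, 2, 5, 2, 41, 1, 3, 0, 2]

# Editing: drop answers that are clearly inconsistent
# (negative, or implausibly large for this question).
edited = [x for x in raw_answers if 0 <= x <= 20]

# Classification: group similar values into classes.
def classify(children):
    if children == 0:
        return "none"
    return "1-2" if children <= 2 else "3 or more"

# Tabulation: arrange the classes and their frequencies in rows and columns.
table = Counter(classify(x) for x in edited)
print(f"{'Class':<12}{'Frequency':>9}")
for label in ("none", "1-2", "3 or more"):
    print(f"{label:<12}{table[label]:>9}")
```

The resulting three-row table condenses the ten usable answers, which is the purpose of classification and tabulation noted above.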
A population may be finite or infinite. A population is said to be finite if it consists of a finite
number of units. The number of workers in a factory, the total number of students in a school or
college, the total number of books in a library, and the total number of houses in a village or town are
examples of finite populations. A population is said to be infinite if it has an infinite number of units,
or so many that they cannot be counted in practice. Examples are the number of stars in the sky and the
number of people watching television programs at this moment. The total number of units in a
population is called the population size (N), and the total number of units in the sample is called the
sample size (n).
Limitations
Some limitations of statistics are:
a) It does not deal with individual values: as discussed earlier, statistics deals with aggregates of
values. For example, the population size of a country for a given year is of little use for
comparative studies unless it can be compared with that of other countries or with a set standard.
b) It does not deal with qualitative data: statistics is not applicable to qualitative characteristics
such as beauty, honesty, poverty, and standard of living, since these cannot be expressed
directly in quantitative terms. These characteristics can, however, be dealt with statistically if
quantitative values can be assigned to them by a logical criterion.
c) Statistical conclusions are not universally true: since statistical conclusions are true only
under certain assumptions, statistics is not an exact science. The field also relies heavily
on the laws of probability, which describe what is likely rather than what is certain. For example,
if we toss a coin 10 times where the chances of a head or a tail are 1:1, we cannot say with
certainty that there will be 5 heads and 5 tails. Thus statistical laws hold only approximately
(on average); they are not exact.
d) It is liable to misuse: statistics can easily be misused and therefore should be used by
experts. For example, suppose the number of car accidents in a city in a particular year is 10 for
women drivers and 40 for men drivers. Concluding from this that women are safer drivers is a
misuse, because the total numbers of women and men drivers are not taken into account.
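The coin-tossing example in limitation (c) can be made concrete. Assuming a fair coin and independent tosses, the sketch below computes the probability of exactly 5 heads in 10 tosses from the binomial formula; the function name is illustrative.

```python
from math import comb

n_tosses = 10
p_head = 0.5  # assumption: fair coin, chances of head and tail are 1:1

def prob_heads(k, n=n_tosses, p=p_head):
    """Binomial probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_five = prob_heads(5)
print(f"P(exactly 5 heads in 10 tosses) = {p_five:.4f}")  # 252/1024, about 0.2461
```

Even the single most likely outcome, 5 heads and 5 tails, occurs only about one time in four, which is why the result of 10 tosses cannot be stated with certainty.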
b) Quantitative variables: variables that can be quantified or can take numerical values.
Examples:
Height, area, income, temperature.
Number of rooms in a hotel, number of children in a family.
Balance in a checking account, weights of fish caught in Lake George, times taken to cut a lawn.
Number of bicycles sold in 1 year by a large sporting goods store, etc.
In mathematical terms, measurement is a functional mapping from the set of objects {Oi} to the set of
real numbers {M(Oi)}.
The goal of measurement systems is to structure the rule for assigning numbers to objects in such a
way that the relationship between the objects is preserved in the numbers assigned to the objects.
The different kinds of relationships preserved are called properties of the measurement system.
a) Order
The property of order exists when an object that has more of the attribute than another object is
given a bigger number by the rule system. This relationship must hold for all objects in the "real
world". The property of ORDER exists:
When, for all i, j, if Oi > Oj, then M(Oi) > M(Oj).
b) Distance
The property of distance is concerned with the relationship of differences between objects.
More precisely, an equal difference between the numbers assigned to two objects reflects an equal
difference between the two objects in the "real world". In order to define the property of distance in
mathematical notation, four objects are required: Oi, Oj, Ok, and Ol. The difference between
objects is represented by the "-" sign; Oi - Oj refers to the actual "real world" difference between object i
and object j, while M(Oi) - M(Oj) refers to the difference between the numbers assigned to them.
The property of DISTANCE exists:
When, for all i, j, k, l, if Oi - Oj ≥ Ok - Ol, then M(Oi) - M(Oj) ≥ M(Ok) - M(Ol).
c) Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of the
attribute in question is assigned the number zero by the system of rules. The object does not need to
really exist in the "real world", as it is somewhat difficult to visualize a "man with no height".
Defining O0 as the object with none of the attribute in question, the definition of a rational zero
becomes:
The property of FIXED ZERO exists if M(O0) = 0.
The property of fixed zero is necessary for ratios between numbers to be meaningful.
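The three properties can be checked mechanically for any candidate rule M. The sketch below is an illustration under stated assumptions: the four "real world" amounts and the three rules are hypothetical, and the distance property is taken to mean that whenever Oi - Oj ≥ Ok - Ol, then M(Oi) - M(Oj) ≥ M(Ok) - M(Ol).

```python
from itertools import product

# Hypothetical "real world" amounts of an attribute for four objects.
objects = [0, 2, 3, 7]

def has_order(M):
    # A larger amount must always receive a larger number.
    return all(M(a) > M(b) for a, b in product(objects, repeat=2) if a > b)

def has_distance(M):
    # Larger-or-equal real differences must give larger-or-equal number differences.
    return all(
        M(a) - M(b) >= M(c) - M(d)
        for a, b, c, d in product(objects, repeat=4)
        if a - b >= c - d
    )

def has_fixed_zero(M):
    # The object with none of the attribute must be assigned zero.
    return M(0) == 0

ranks = {0: 1, 2: 2, 3: 3, 7: 4}
rules = {
    "rank order": lambda x: ranks[x],        # keeps order only
    "shifted units": lambda x: 10 * x + 5,   # keeps order and distance
    "rescaled units": lambda x: 100 * x,     # keeps all three properties
}

results = {
    name: (has_order(M), has_distance(M), has_fixed_zero(M))
    for name, M in rules.items()
}
for name, (order, distance, zero) in results.items():
    print(f"{name:<15} order={order} distance={distance} fixed zero={zero}")
```

Only the rescaled rule keeps ratios meaningful: an object with amount 6 would receive exactly twice the number given to an object with amount 3.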
Scale Types
Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval, and
ratio. Each possesses different properties of measurement systems.
1) Nominal scale: The word nominal comes from the Latin word for “name”. Nominal scales are measurement
systems that possess none of the three properties stated above.
A nominal scale classifies data into mutually exclusive and exhaustive (all-inclusive)
categories in which ordering or ranking the data is impossible.
Arithmetic and relational operations are not applicable.
Examples:
o Political party preference (Republican, Democrat, or Other)
o Gender (Male or Female)
o Marital status (married, single, widowed, divorced)
o Country code (Ethiopia = +251, Somalia = +252, Kenya = +254, Eritrea = +291, etc.)
o Regional differentiation of Ethiopia (Tigray, Afar, Amhara, Oromia, etc).
o Car colors for a certain model (red, silver, blue and black).
2) Ordinal scale: The word ordinal comes from the Latin word for “order”. Ordinal scales are measurement
systems that possess the property of order, but not the property of distance. The property of fixed
zero is not important if the property of distance is not satisfied.
An ordinal scale classifies data into categories that can be ranked.
Differences between the ranks are not meaningful.
Arithmetic operations are not applicable, but relational operations are.
Ordering is the sole property of the ordinal scale.
Examples:
o Letter grades (A, B, C, D, F)
o Rating scales (Excellent, Very good, Good, Fair, poor)
o Military rank (Private, Corporal, Sergeant, Lieutenant, Captain)
o Man A weighs more than man B.
o Ethiopian athletes took 1st and 2nd place in the 10,000m women’s final in Beijing.
o Of 17 fishing reels rated: 6 were rated good quality, 4 were rated better quality, and 7 were
rated best quality.
o Out of a high school class of 319, Walter ranked 4th, June ranked 12th, and Jim ranked 20th.
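Because only order is meaningful on an ordinal scale, comparisons and rank-based summaries such as the median can be used, while sums and means cannot. A minimal sketch using the fishing reel ratings from the examples above; encoding the categories as a list ordered from lowest to highest is an illustrative choice, not part of the original notes.

```python
# Rating categories listed from lowest to highest.
levels = ["good", "better", "best"]

# The 17 fishing reels from the example: 6 good, 4 better, 7 best.
ratings = ["good"] * 6 + ["better"] * 4 + ["best"] * 7

# Relational operations are valid: compare categories through their position.
def outranks(a, b):
    return levels.index(a) > levels.index(b)

# The median needs only ordering, so it is a valid summary for ordinal data.
ordered = sorted(ratings, key=levels.index)
median_rating = ordered[len(ordered) // 2]

print(outranks("best", "good"))  # True
print(median_rating)             # better
```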
3) Interval scale: Interval scales are measurement systems that possess the properties of order and
distance, but not the property of fixed zero.
Data can be ranked and differences are meaningful. However, there is no meaningful or
intrinsic zero, so ratios are meaningless. Note: Celsius and Fahrenheit temperature readings
have no meaningful zero, so their ratios are meaningless.
Addition and subtraction are applicable, but multiplication and division are not.
For example, 37°C - 35°C = 2°C and 45°C - 43°C = 2°C, but 40°C ≠ 2 × (20°C), because an object
at 40°C is not twice as hot as an object at 20°C.
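The temperature example can be checked numerically. The sketch below converts the same two temperatures to Fahrenheit and to kelvin (a scale that does have a true zero) using the standard conversion formulas; the function and variable names are illustrative.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15

hot_c, cool_c = 40.0, 20.0

ratio_c = hot_c / cool_c                   # 2.0 on the Celsius scale
ratio_f = c_to_f(hot_c) / c_to_f(cool_c)   # 104 / 68, about 1.53
ratio_k = c_to_k(hot_c) / c_to_k(cool_c)   # about 1.07

print(f"Celsius ratio:    {ratio_c:.2f}")
print(f"Fahrenheit ratio: {ratio_f:.2f}")
print(f"Kelvin ratio:     {ratio_k:.2f}")
```

The same pair of temperatures gives a different ratio on each scale, so the statement "twice as hot" depends on where the zero was placed and has no physical meaning for Celsius or Fahrenheit readings.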
3) The Ethiopian Television agency wants to know the proportion of TV owners in Addis Ababa who
watch the agency's new program at least once a week. The station asked a group of 1000 TV
owners in Addis Ababa whether they watch the program at least once a week.
a) Identify the individuals in the study.
b) Identify the variable.
c) Do the data comprise a sample? If so, what is the underlying population?
d) Is the variable quantitative or qualitative?
4) State the level of measurement for each of the following:
a) Individuals may be classified according to socio-economic status as low, medium, or high.
b) Days of the week: Monday, Tuesday, ….., Sunday
c) The senator’s name is Sam Wilson.
d) Of 1100 voters in the senator’s district: 400 strongly favor his bill, 300 favor it, 200 are neutral,
150 do not favor it, and 50 strongly do not favor it.
e) Patients may be characterized as unimproved, improved & much improved.
f) The height of the men in Bule Hora University.
g) Your score on an individual intelligence test as a measure of your intelligence.
h) Your checking account number as a name for your account.
i) Your checking account balance as a measure of the amount of money you have in that
account.
j) Your score on the first statistics test as a measure of your knowledge of statistics.
k) Your score on an individual intelligence test as a measure of your intelligence.
l) The distance around your forehead measured with a tape measure as a measure of your
intelligence.
m) A response to the statement "Abortion is a woman's right" where "Strongly Disagree" =
1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a
measure of attitude toward abortion.
n) Months of the year: Meskerem, Tikimt, …
o) Blood type of individuals, A, B, AB and O.
p) Pollen counts provided as numbers between 1 and 10, where 1 implies there is almost no
pollen and 10 that it is rampant, but for which the values do not represent actual
counts of grains of pollen.
q) Region numbers of Ethiopia (1, 2, 3, etc.)