
Chapter 1 (1)

Uploaded by

alafiteshoma63
Stat2011

CHAPTER 1
1. Introduction

The word statistics is derived from the Latin word “status”, which means a political state or
government. Accordingly, the scope of statistics in ancient times was largely limited to the collection
of demographic, property and wealth data by governments for framing military and fiscal policies.
Nowadays, statistics is used in almost every field of study, such as natural science, social science,
engineering, medicine and agriculture. Improvements in computer technology have made it easier
than ever to apply statistical methods and to handle massive amounts of data.

1.1. Definition and Classification of Statistics


1.1.1. What is Statistics?

Most people think of statistics as the tables of figures giving births, deaths, marriages, divorces,
accidents etc. Some people think of statistics as information about an activity (like production,
population, national income etc.) expressed in numbers. Still some others think of the term statistics
as a subject or as a body of knowledge like other sciences.

The word “Statistics” can be defined in two senses:


a) Plural sense (layman's definition): Statistics refers to numerical facts or figures collected
for a certain purpose, for example the number of people living in a particular area or the number
of bedrooms in a hotel. Not every numerical fact or figure, however, qualifies as statistics.

b) Singular sense (formal definition): Statistics is the science of the systematic collection,
organization, presentation, analysis and interpretation of numerical data in order to make
more effective decisions about a population based on sample data.

1.1.2. Classification of Statistics:


Statistics is broadly divided into two categories based on how the collected data are used.
A) Descriptive Statistics:
Descriptive statistics deals with the collection, organization and summarization of data by using
graphs, charts and tables. It describes a given set of data without drawing conclusions or
generalizations about a larger set of data or a population.

Prepared by Daba K. Page 1



Examples:
a) Heights of five students in a class: 1.5 m, 1.6 m, 1.7 m, 1.3 m, 1.4 m
Mean = (1.5 + 1.6 + 1.7 + 1.3 + 1.4) / 5 = 7.5 / 5 = 1.5 m
The average height of these five students is 1.5 m. Under descriptive statistics, however, we cannot
say that the average height of all statistics students is 1.5 m, because that would be a generalization
about the whole group of students.
b) The median household income for people aged 25-34 is $35,888.
c) Expenditures for the cable industry were $5.66 billion in the year 1996.
d) Nine out of ten on-the-job fatalities are men.
e) The national average annual medicine expenditure per person is $1052.
f) The starting salaries for Mathematics and Statistics students in different organizations, etc.
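
The mean in example a) can be checked with a few lines of Python. This is only a minimal sketch; the function name mean is our own choice and is not part of the notes.

```python
# Heights (in metres) of the five students from example a).
heights = [1.5, 1.6, 1.7, 1.3, 1.4]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

average_height = mean(heights)
print(f"Mean height = {average_height:.1f} m")  # shown to one decimal place
```

The result describes only these five students, which is exactly what makes it a descriptive rather than an inferential statement.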

B) Inferential Statistics
Inferential statistics deals with the systematic collection, organization, presentation, analysis and
interpretation of data for the purpose of making decisions about a population based on sample data,
using methods such as the chi-square test of association, regression and odds ratios. It
consists of performing hypothesis tests, determining relationships among variables and making
predictions. For example, the average income of all families (the population) in Ethiopia can be
estimated from figures obtained from a few hundred families (the sample).
Examples:
a) The average height of the students in the class is 1.5 m (a conclusion about the whole class drawn from a sample).
b) Drinking decaffeinated coffee can raise cholesterol levels by 7%.
c) Experts say that mortgage rates may soon hit bottom.
d) From past figures, it has been predicted that 90% of registered voters will vote in the November
election.
e) The average age of students in Bule Hora University is 19.1 years.
f) There is a relationship between smoking tobacco and an increased risk of developing cancer, etc.
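
The idea of estimating a population average from a few hundred sampled families can be sketched as follows. The incomes are made-up numbers generated by the computer, not real data, and the "plus or minus two standard errors" range is a common rule of thumb that we assume here for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: made-up incomes for 10,000 families (not real data).
population = [random.gauss(5000, 1200) for _ in range(10_000)]

# Draw a simple random sample of a few hundred families.
sample = random.sample(population, 300)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually compute

# Standard error of the sample mean, estimated from the sample itself.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"Sample estimate : {sample_mean:.0f} +/- {2 * standard_error:.0f}")
print(f"Population mean : {population_mean:.0f}")
```

In a real study only the sample line could be computed; the population mean is printed here just to show how close the estimate is.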

1.2. Stages of Statistical Investigation


A statistical investigation involves the following five stages: collection, organization,
presentation, analysis and interpretation of data.
1) Data Collection: Since statistical analyses and decisions are based upon the raw data collected,
it is important that such data be carefully and accurately collected and recorded.
Faulty data or faulty data-collection techniques would result in
wrong conclusions. Data can be collected in a variety of ways; one of the most common methods
is the survey.

Surveys can be conducted in different ways. The three most common methods are:
Telephone survey
Mailed questionnaire
Personal interview
Activity: Discuss the advantages and disadvantages of the above three methods with respect to one
another.
2) Data Organization: Here the collected data are organized in three steps. The first step is editing:
the data should be edited carefully so that inconsistent or irrelevant answers and wrong
computations in the survey returns are corrected, adjusted or omitted. The next step is
classification: after the data are edited, they are classified according to their resemblance
(similarity). The last step is tabulation, the arrangement of data in rows and columns.
NB: Classification and tabulation are the two methods used to condense (reduce) the data.
3) Data Presentation: the process of re-organizing, classifying, compiling and
summarizing data in a meaningful form by using tables, graphs and diagrams. The main
purpose of data presentation is to facilitate analysis.
4) Data Analysis: the process of extracting relevant information from the summarized data,
often through the use of sophisticated mathematical techniques. The main
purpose of data analysis is to dig out useful information for decision making and interpretation.
5) Interpretation of Data: Interpretation means drawing conclusions from the data, which form the
basis of decision making. Correct interpretation requires a high degree of skill and experience
and is necessary in order to draw valid conclusions.
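
The classification and tabulation steps of stage 2 can be illustrated with a small frequency table. The answers below are hypothetical values invented for the sketch; Counter from Python's standard library does the counting.

```python
from collections import Counter

# Hypothetical edited survey answers (illustrative values, not real data).
responses = ["telephone", "mail", "interview", "mail", "telephone",
             "interview", "interview", "mail", "interview", "telephone"]

# Classification: group identical answers; tabulation: one row per class.
frequency_table = Counter(responses)

print(f"{'Method':<12}{'Frequency':>10}")
for method, count in sorted(frequency_table.items()):
    print(f"{method:<12}{count:>10}")
```

Ten raw answers are condensed into three rows, which is the sense in which classification and tabulation reduce the data.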

1.3. Definition of some basic terms


a) Population: the totality of subjects or objects possessing some common characteristics at a
specified time and place. The population is the target of an investigation, and the
objective of the investigation is to draw conclusions about the population; hence it is sometimes
called the target population.
Examples:
All university students in Ethiopia, all staff members of Bule Hora University, the population of trees
under specified climatic conditions, the population of animals fed a certain type of diet, the population of
farms having a certain type of natural fertility, the population of households, etc.

A population can be finite or infinite. A population is said to be finite if it consists of a finite
number of units. The number of workers in a factory, the total number of students studying in a school or
college, the total number of books in a library and the total number of houses in a village or town are
examples of finite populations. A population is said to be infinite if it has an infinite number of units or,
in practice, so many that they cannot be counted, for example the stars in the sky or the people
watching television programs at this moment. The total number of units in a population is called the
population size (N), and the total number of units in a sample is called the sample size (n).

There are two ways of investigation: census and sample survey.

Census: a complete enumeration of the population. In most real problems a census is not feasible,
hence we take a sample survey.
b) Sample: a subset of the population, selected using some sampling technique in such a way
that it represents the population.
For example, if we want to study the income pattern of lecturers at Bule Hora University and there
are 300 lecturers, we may take a random sample of only 25 lecturers for the purpose of the study.
These 25 lecturers constitute a sample. In practice, we rarely conduct a census; instead
we conduct a sample survey.
c) Parameter: a descriptive measure of a population, that is, a summary value calculated from a
population. Examples: the population average, range, proportion and variance.
d) Statistic: a descriptive measure of a sample, that is, a summary value calculated from a sample.
In the previous example, the average income of the 25 sampled lecturers is a statistic, while the
average income of all 300 lecturers (the population) is a parameter.
In general, we use Greek or capital letters to denote population parameters and lower-case Roman
letters to denote sample statistics. [N, µ and σ are the standard symbols for the size, mean and
standard deviation of a population, and n, x̄ and s are the standard symbols for the size, mean and
standard deviation of a sample, respectively.]

e) Sampling: the process or method of selecting a sample from the population.
f) Variable: an item of interest that can take on different values.
g) Data: the values that the variables actually assume.
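
The lecturer example (N = 300, n = 25) can be used to show the difference between a parameter and a statistic. The following is a sketch only: the incomes are invented numbers, and the variable names mu, sigma, x_bar and s simply mirror the symbols listed above.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

N = 300  # population size: all lecturers
n = 25   # sample size

# Hypothetical monthly incomes for the whole population (made-up numbers).
population = [random.uniform(8_000, 20_000) for _ in range(N)]

# Sampling: select n units at random, without replacement.
sample = random.sample(population, n)

mu = statistics.mean(population)       # parameter: population mean
sigma = statistics.pstdev(population)  # parameter: population standard deviation
x_bar = statistics.mean(sample)        # statistic: sample mean
s = statistics.stdev(sample)           # statistic: sample standard deviation

print(f"N = {N}, mu = {mu:.0f}, sigma = {sigma:.0f}")
print(f"n = {n}, x_bar = {x_bar:.0f}, s = {s:.0f}")
```

Running the sketch with a different seed changes x_bar and s, because they depend on which lecturers are sampled, while mu and sigma stay fixed for a given population.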

1.4. Applications, Uses and Limitation of Statistics


➢ Applications
Statistics can be applied in any field of study that seeks quantitative evidence.
a) Engineering: Statistics has wide application in engineering, for example:
To compare the breaking strength of two types of materials.
To determine the reliability of a product.
To control the quality of products in a given production process.
b) Agriculture:
To compare the improvement in yield due to certain additives (fertilizers, herbicides, etc.).
c) Economics: Statistics is widely used in economic study and research.
To measure and forecast Gross National Product (GNP).
Statistical analyses of population growth, unemployment figures, rural or urban population
shifts and so on influence much of economic policy making.
Financial statistics are necessary in the fields of money and banking, including consumer
savings and credit availability.
d) Research: there is hardly any advanced research going on without the use of
statistics in one form or another. Statistics is used extensively in medical and pharmaceutical
research, among other fields.
➢ Uses
Statistics serves many functions in everyday activities. The following are some of its uses:
a) Statistics condenses and summarizes complex data: the original set of data (raw data) is
normally voluminous and disorganized until it is summarized and expressed in a few numerical
values.
b) Statistics facilitates comparison of data: measures obtained from different sets of data can be
compared, using averages, percentages, ratios, etc., to draw conclusions about those sets.
c) Statistics helps in predicting/forecasting future trends: statistics is extremely useful for
analyzing past and present data and for predicting future trends.
d) Statistics influences the policies of government: the results of statistical studies in areas such as
taxation, unemployment and the performance of military equipment may
convince a government to review its policies and plans with a view to meeting national needs and
aspirations.
e) Formulating and testing hypotheses and developing new theories.

f) Measuring the magnitude of variation in data.
g) Estimating unknown population characteristics.
h) Studying the relationship between two or more variables.

➢ Limitations
Some of the limitations of statistics are:
a) It doesn’t deal with individual values: statistics deals with aggregates of
values. For example, the population size of a country for a given year is of little use for
comparative studies unless it can be compared with that of other countries or with a set standard.
b) It doesn’t deal with qualitative data directly: statistics is not applicable to qualitative characteristics
such as beauty, honesty, poverty, standard of living and so on, since these cannot be expressed
in quantitative terms. These characteristics can, however, be dealt with statistically if
quantitative values can be assigned to them by some logical criterion.
c) Statistical conclusions are not universally true: since statistical conclusions are true only
under certain assumptions, statistics is not an exact science. The field also relies extensively
on the laws of probability, which describe what is likely rather than what is certain. For example,
if we toss a fair coin 10 times, where the chances of a head or a tail are 1:1, we cannot say with
certainty that there will be 5 heads and 5 tails. Thus statistical laws hold only on average, not in
every individual case.
d) It is open to misuse: statistics can easily be misused and should therefore be used by
experts. For example, suppose the number of car accidents caused in a city in a particular year by
women drivers is 10, while the number caused by men drivers is 40. Concluding that women are
therefore safer drivers would be a misuse of statistics, because the comparison ignores how many
women and men drive and how much they drive.
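
The coin example in limitation c) can be made concrete by computing the probability of getting exactly 5 heads in 10 tosses of a fair coin. The sketch assumes independent tosses and uses the binomial count of favourable outcomes; the variable names are our own.

```python
from math import comb

tosses = 10
heads = 5

# Number of ways to choose which 5 of the 10 tosses are heads,
# divided by the 2**10 equally likely outcomes of a fair coin.
p_exactly_five = comb(tosses, heads) / 2 ** tosses

print(f"P(exactly 5 heads in 10 tosses) = {p_exactly_five:.3f}")
```

The value is 252/1024, or about 0.246, so even the single most likely outcome occurs only about one time in four.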

1.5. Types of Variables/Data


A variable is a characteristic of an object that can take different possible values. Age, height, IQ and
so on are all variables since their values can change from one person to another. There are two
types of variable:
a) Qualitative variables: variables that cannot be quantified directly (i.e. non-numeric).
Qualitative variables are also called categorical variables.
Examples:
• Color, beauty, gender, location, political affiliation, religious affiliation and state of birth.
• Classification of children in a day care center (infant, toddler and preschool), etc.

b) Quantitative variables: variables that can be quantified, that is, that take numerical values.
Examples:
• Height, area, income, temperature.
• Number of rooms in a hotel, number of children in a family.
• Balance in a checking account, weights of fish caught in Lake George, times taken to cut a lawn.
• Number of bicycles sold in one year by a large sporting goods store, etc.

Quantitative variables are classified into two types: discrete variables and continuous variables.
1) Discrete variables: variables whose values are obtained by counting. They can assume
only certain (specific) values, and there are usually "gaps" between the values.
Examples:
• Number of students attending a conference, number of people in a household (family size), number of
pages in a book, number of eggs in a refrigerator, etc.
2) Continuous variables: variables that can take any value between two given values, that is, any value
in an interval. Continuous data result from a measuring process.
Examples:
• Water temperatures of six swimming pools, weights of cats in a pet shelter, capacities (in gallons)
of six reservoirs in Jefferson County, air pressure in a tire, heights of models in a beauty
contest, lengths of steel bars in a given production run, etc.
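
As a rough illustration of the classification above, the following sketch guesses the type of a variable from the way its values are recorded. The function variable_type and the sample values are hypothetical, and the rule is only a heuristic: the true type depends on how the values arise (counting or measuring), not on how they are stored.

```python
def variable_type(values):
    """Rough guess at the type of a variable from its recorded values.

    Heuristic only: text -> qualitative, whole numbers -> discrete,
    any other numbers -> continuous.
    """
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    return "unknown"

# Illustrative values (made up) for three kinds of variable named above.
car_colors = ["red", "silver", "blue", "black"]
children_per_family = [0, 2, 3, 1, 4]
cat_weights_kg = [3.8, 4.25, 5.1, 2.95]

for name, values in [("Color", car_colors),
                     ("Children per family", children_per_family),
                     ("Cat weight (kg)", cat_weights_kg)]:
    print(f"{name}: {variable_type(values)}")
```

A height recorded to the nearest whole centimetre would be labelled discrete by this rule even though height is a continuous variable, which is why the rule should be treated as a first guess only.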

1.6. Scales of Measurement


A measurement scale describes the kind of values assigned to data, based on the properties of
order, distance and fixed zero.

In mathematical terms, measurement is a functional mapping from the set of objects {O_i} to the set of
real numbers {M(O_i)}.

The goal of a measurement system is to structure the rule for assigning numbers to objects in such a
way that the relationships between the objects are preserved in the numbers assigned to them.
The different kinds of relationships preserved are called properties of the measurement system.
a) Order
The property of order exists when an object that has more of the attribute than another object is
given a bigger number by the rule system. This relationship must hold for all objects in the "real
world". The property of ORDER exists when, for all i, j:
If O_i > O_j, then M(O_i) > M(O_j).
b) Distance
The property of distance is concerned with the relationship of differences between objects.

More precisely, an equal difference between the numbers assigned to two objects reflects an equal
difference between the two objects in the "real world". Defining the property of distance in
mathematical notation requires four objects: O_i, O_j, O_k and O_l. The difference between
objects is represented by the "−" sign; O_i − O_j refers to the actual "real world" difference between
object i and object j, while M(O_i) − M(O_j) refers to the difference between the numbers assigned to them.

The property of DISTANCE exists when, for all i, j, k, l:
If O_i − O_j ≥ O_k − O_l, then M(O_i) − M(O_j) ≥ M(O_k) − M(O_l).

c) Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of the
attribute in question is assigned the number zero by the system of rules. The object does not need to
exist in the "real world"; it is, for instance, difficult to visualize a "man with no height".
Defining O_0 as the object with none of the attribute in question, the definition of a rational zero
becomes:
The property of FIXED ZERO exists if M(O_0) = 0.
The property of fixed zero is necessary for ratios between numbers to be meaningful.

Scale Types

Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval and
ratio. Each possesses a different combination of the properties of measurement systems.
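
The combination of properties held by each scale can be summarized in a small lookup table. This is a sketch based on the descriptions of the four scales in this section; the names SCALE_PROPERTIES, ratios_meaningful and differences_meaningful are our own.

```python
# Which of the three properties each scale type possesses,
# as stated in the descriptions of the four scales in this section.
SCALE_PROPERTIES = {
    "nominal":  {"order": False, "distance": False, "fixed_zero": False},
    "ordinal":  {"order": True,  "distance": False, "fixed_zero": False},
    "interval": {"order": True,  "distance": True,  "fixed_zero": False},
    "ratio":    {"order": True,  "distance": True,  "fixed_zero": True},
}

def ratios_meaningful(scale):
    """Ratios of values can be interpreted only when a fixed zero exists."""
    return SCALE_PROPERTIES[scale]["fixed_zero"]

def differences_meaningful(scale):
    """Differences of values can be interpreted only with the distance property."""
    return SCALE_PROPERTIES[scale]["distance"]

for scale, props in SCALE_PROPERTIES.items():
    held = [name for name, present in props.items() if present] or ["none"]
    print(f"{scale:<9}: {', '.join(held)}")
```

Each scale adds one property to the one before it, which is why the four levels are usually listed in this order.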

1) Nominal scale: The word nominal comes from the Latin nomen, meaning “name”. Nominal scales are
measurement systems that possess none of the three properties stated above.
This level of measurement classifies data into mutually exclusive and all-inclusive
categories in which ordering or ranking the data is impossible.
Arithmetic and relational operations are not applicable.
Examples:
o Political party preference (Republican, Democrat or Other)
o Gender (Male or Female)
o Marital status (married, single, widowed, divorced)
o Country code (Ethiopia = +251, Somalia = +252, Kenya = +254, Eritrea = +291, etc.)
o Regions of Ethiopia (Tigray, Afar, Amhara, Oromia, etc.)
o Car colors for a certain model (red, silver, blue and black)
2) Ordinal scale: The word ordinal comes from the Latin ordo, meaning “order”. Ordinal scales are
measurement systems that possess the property of order, but not the property of distance. The
property of fixed zero is not relevant if the property of distance is not satisfied.
This level of measurement classifies data into categories that can be ranked.
Differences between the ranks are not meaningful.
Arithmetic operations are not applicable, but relational operations are applicable.
Ordering is the sole property of the ordinal scale.
Examples:
o Letter grades (A, B, C, D, F)
o Rating scales (Excellent, Very good, Good, Fair, Poor)
o Military rank (for example Private, Sergeant, Lieutenant, Captain)
o Man A weighs more than man B.
o Ethiopian athletes got 1st and 2nd ranks in the 10,000m women’s final in Beijing.
o Of 17 fishing reels rated: 6 were rated good quality, 4 were rated better quality, and 7 were
rated best quality.
o Out of a high school class of 319, Walter ranked 4th, June ranked 12th, and Jim ranked 20th.
3) Interval scale: Interval scales possess the properties of order and distance, but not the property
of fixed zero.
The data can be ranked and differences are meaningful. However, there is no meaningful or
intrinsic zero, so ratios are meaningless. Note: Celsius and Fahrenheit temperature readings
have no meaningful zero, so their ratios are meaningless.
Arithmetic operations except division and multiplication are applicable.
For example, 37 °C − 35 °C = 2 °C and 45 °C − 43 °C = 2 °C, so both pairs differ by the same amount.
However, although 40 = 2 × 20, this does not imply that an object at 40 °C is twice as hot as an
object at 20 °C.

Relational operations are also possible.
Interval scale data convey more information than nominal and ordinal scale data.
Examples:
o IQ, SAT score, credit score
o Temperature in °F or °C
o Calendar dates
o The years in which Democrats won presidential elections
o Building A was built in 1284, Building B in 1492 and Building C in 1695.
4) Ratio scale: Ratio scales are measurement systems that possess all three properties: order,
distance and fixed zero. The added power of a fixed zero allows ratios of numbers to be
meaningfully interpreted; for example, the ratio of Bekele's height to Martha's height is 1.32. Such a
statement is not possible with interval scales.
This level of measurement classifies data that can be ranked, for which differences are meaningful
and for which there is a true zero. True ratios exist between the different units of measure.
All arithmetic and relational operations are applicable.
This measurement scale provides more information than the interval scale.
Examples:
o Weight, height, time, length, number of students, age, salary
o Temperature in kelvin, monetary quantities, counts, mass, electrical current
NB:
• Temperature in Celsius or Fahrenheit is on an interval scale because zero is not the lowest
possible temperature. On the Kelvin scale, a ratio scale, zero represents a total lack of thermal
energy. Interval-scale values can fall below zero, whereas on a ratio scale such as kelvin, zero is
the starting point.
• Interval and ratio scales are quantitative, while nominal and ordinal scales are qualitative.
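
The difference between the Celsius (interval) and Kelvin (ratio) scales can be checked numerically with the 40 °C and 20 °C readings used earlier. This is a minimal sketch; the helper celsius_to_kelvin is our own and uses the standard offset of 273.15.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin (0 K = -273.15 degrees C)."""
    return celsius + 273.15

hot_c, cool_c = 40.0, 20.0

# Interval scale: differences are meaningful, ratios are not.
difference_c = hot_c - cool_c
naive_ratio = hot_c / cool_c  # 2.0, but this depends on where zero was placed

# Ratio scale: the same two temperatures expressed in kelvin.
hot_k, cool_k = celsius_to_kelvin(hot_c), celsius_to_kelvin(cool_c)
difference_k = hot_k - cool_k
true_ratio = hot_k / cool_k

print(f"Difference: {difference_c:.2f} C = {difference_k:.2f} K")
print(f"Ratio of Celsius readings: {naive_ratio:.2f}")
print(f"Ratio of kelvin readings : {true_ratio:.3f}")
```

The difference is 20 degrees on both scales, but the ratio drops from 2 to about 1.068 once the readings are measured from a true zero.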
Exercise
1) Define the following terms.
a) Statistics, descriptive statistics, qualitative variable, nominal scale and sample survey.
2) To assess the opinion of students at Bule Hora University about cafeteria safety, an Ethiopian
Television (ETV) reporter interviews 20 students whom he meets walking on campus late at night
and who are willing to give their opinion.
a) What is the sample here?
b) What is the population? Why?
c) Should you trust the results of this survey? Why?
d) As a statistician, comment on the ETV reporter's approach.

3) The Ethiopian Television agency wants to know the proportion of TV owners in Addis Ababa who
watch the agency's new program at least once a week. The station asked a group of 1000 TV
owners in Addis Ababa whether they watch the program at least once a week.
a) Identify the individuals in the study.
b) Identify the variable.
c) Do the data comprise a sample? If so, what is the underlying population?
d) Is the variable quantitative or qualitative?
4) State the level of measurement for each of the following:
a) Individuals may be classified according to socio-economic status as low, medium and high.
b) Days of the week: Monday, Tuesday, ..., Sunday.
c) The senator’s name is Sam Wilson.
d) Of 1100 voters in the senator’s district: 400 strongly favor his bill; 300 favor; 200 are neutral;
150 do not favor, and 50 strongly do not favor his bill.
e) Patients may be characterized as unimproved, improved and much improved.
f) The heights of the men at Bule Hora University.
g) Your score on an individual intelligence test as a measure of your intelligence.
h) Your checking account number as a name for your account.
i) Your checking account balance as a measure of the amount of money you have in that
account.
j) Your score on the first statistics test as a measure of your knowledge of statistics.
k) The distance around your forehead measured with a tape measure as a measure of your
intelligence.
l) A response to the statement "Abortion is a woman's right" where "Strongly Disagree" =
1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a
measure of attitude toward abortion.
m) Months of the year: Meskerem, Tikimt, ...
n) Blood type of individuals: A, B, AB and O.
o) Pollen counts provided as numbers between 1 and 10, where 1 implies there is almost no
pollen and 10 that it is rampant, but for which the values do not represent actual
counts of grains of pollen.
p) Region numbers of Ethiopia (1, 2, 3, etc.)
