
Chpt1 4

Let’s go

Uploaded by

nton240
Copyright
© All Rights Reserved

The Nature of Statistics

[Agresti & Finlay (1997), Johnson & Bhattacharyya (1992), Weiss (1999), Anderson & Sclove (1974) and Freund (2001)]

1.1 What is statistics?


Statistics is a very broad subject, with applications in a vast number of different fields.

In general, one can say that statistics is the methodology for

a) collecting and analysing information;
b) interpreting and drawing conclusions from information.

Statistics is the methodology which scientists and mathematicians have developed for interpreting and drawing conclusions from collected data.

Everything that deals even remotely with the collection, processing, interpretation and presentation of data belongs to the domain of statistics.

Statistics also includes the detailed planning, design and implementation of statistical activities.

Definition (Statistics).
Statistics consists of a body of methods for collecting and analyzing data.
(Agresti & Finlay, 1997)

In any statistical enquiry, statistical methods are designed to address the following questions:

• What kind of data, and how much, need to be collected?

• How should we organize and summarize the data?

• How can we analyse the data and draw conclusions from it?

• How can we assess the strength of the conclusions and evaluate their uncertainty?

Broadly, the scope of statistics can be thought of as involving:

1. Design: Planning and carrying out research studies.

2. Description: Summarizing and exploring data.

3. Inference: Making predictions and generalizing about phenomena represented by the data.

Applications of interest to a researcher might include studying

• the effectiveness of medical treatments,
• the reaction of consumers to television advertising,
• the attitudes of young people toward sex and marriage, and much more (e.g. attitudes towards Ipelegeng or COVID-19 protocols, or gender participation in political or managerial positions).

1.1.3 Applications of Statistics

Examples of areas or fields of science that use statistics:

– agricultural problem: Is a new grain seed or fertilizer more productive?

– medical problem: What is the right dosage of a drug for treatment?

– political science: How accurate are the opinion polls?

– economics: What will be the unemployment rate next year?

– technical problem: How can the quality of a product be improved?

Think of other areas or applications of statistical enquiry.
1.1.4 Use and misuse of Statistics
• Despite its strengths, statistics is one of the most misused subjects. Some examples of misuse are given in D. Huff's "How to Lie with Statistics". Generally, misuse can be classified as
- (1) using statistical methods, techniques, or models in ways that produce distorted or artificial results;
- (2) failing to disclose important information about the statistical methodology to researchers.

• In particular, the causes of misuse of statistics arise at all stages of the statistical process, such as:
- Inappropriate methods of collecting the information (sampling) and small sample sizes.
- Use of unrepresentative subgroups in a study.
- Inappropriate methods of summarizing data, e.g. misleading graphs, where the choice of scales on a graph may lead to a misleading interpretation.
- Wrong choice of data analysis and inferential technique, e.g. wrongly assuming independence or the wrong distribution for the data.
- Wrong conclusions drawn from the results.
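Here is a quick numerical illustration of how the choice of scale can mislead. Suppose the averages 55 and 75 from the School A and School B example below are drawn as bars on an axis that starts at 50 instead of 0. The bars are then 5 and 25 units tall, so School B's bar looks five times as tall even though its score is only about 1.36 times as large. The Python sketch below does this arithmetic. The function name `bar_height_ratio` is hypothetical and chosen only for illustration.

```python
def bar_height_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b when the
    vertical axis starts at axis_start instead of 0 (hypothetical helper)."""
    return (b - axis_start) / (a - axis_start)

# Average scores of School A (55) and School B (75)
true_ratio = bar_height_ratio(55, 75)             # axis from 0: 75 / 55
truncated_ratio = bar_height_ratio(55, 75, 50)    # axis from 50: 25 / 5

print(f"ratio of the values:              {true_ratio:.2f}")
print(f"apparent ratio with axis from 50: {truncated_ratio:.2f}")
```

A truncated axis is not always wrong. However, when one is used, the break in the axis should be clearly marked so that readers do not compare bar heights directly.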

SAMPLING BIAS
One standardized test used by colleges for assessment purposes is the Educational Testing Service's Proficiency Profile; it purports to measure the mathematics, reading, and writing skills of college students. However, as we shall see, this statistic can be misleading when presented alone. Consider a comparison of School A and School B, with scores on the learning assessment test as shown in Table 1.

Table 1
                                             School A   School B
Average score on learning assessment test       55         75

Basing an evaluation on the average scores alone, we might conclude that School A should be closed down, since its students are doing very poorly on this key subject.

However, suppose that we re-examine the data, looking not only at the average scores after attending the schools but also at those before entering the schools. Suppose the numbers are

                                                School A   School B
Average score on learning assessment test
  (before entering)                                25         85
  (after entering)                                 55         75

Discuss
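One way to start the discussion is to look at the change in average score, not just the final score. The short Python sketch below does this arithmetic with the numbers in the two tables above. The change in average score is only one possible measure of a school's effect, and it says nothing about why the scores moved.

```python
# Average scores on the learning assessment test, taken from the two tables above
scores = {
    "School A": {"before": 25, "after": 55},
    "School B": {"before": 85, "after": 75},
}

# Change in average score = after - before
changes = {school: s["after"] - s["before"] for school, s in scores.items()}

for school, s in scores.items():
    print(f"{school}: after = {s['after']}, change = {changes[school]:+d}")
```

Seen this way, School A's students improved on average while School B's declined, which reverses the impression given by the "after" scores alone.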
Bias: Loaded Questions
Poorly written survey questions create biased data.
A "bias obstacle" occurs because poorly worded questions produce biased responses.

e.g. "Do you favour elimination of waste in the defence budget?"

compare with asking respondents "which defence programs they considered wasteful".

e.g. "Do you support a balanced budget?"

compare with asking about cutting spending on any one of the individual programs, e.g. "Do you support cutting spending on Ipelegeng?" (or cutting spending on free education, or eliminating the medical subsidy).

You get far less agreement on such specific cuts than you would get on "Do you support a balanced budget?"

Try to avoid leading or loaded questions. Use neutral language.

People are more likely to give answers that make them sound loyal, law-abiding and supportive of "government payments to the poor" and "universal health coverage".

People are likely to provide an answer that agrees with an authority figure.
"Do you agree that smoking should be against the law?"

"Do you agree with the president that taxes should be raised to lower the national debt?"

In each of these, think of the likely bias in the responses from respondents.

Think of how the question might have to be phrased to get honest, unambiguous responses.
Two major types of statistics
1. Descriptive Statistics.
Descriptive statistics consist of methods for organizing and summarizing information (Weiss, 1999).
Descriptive statistics includes
- the construction of graphs, charts, and tables, and
- the calculation of various descriptive measures such as averages, measures of variation, and percentiles.
In fact, most of this course deals with descriptive statistics.

2. Inferential Statistics.
Inferential statistics consist of methods for drawing conclusions about a population, and measuring the reliability of those conclusions, based on information obtained from a sample of the population (Weiss, 1999).

Inferential statistics includes methods like
- point estimation and interval estimation
- hypothesis testing
These methods are all based on probability theory.

Example (Descriptive and Inferential Statistics).

Suppose the number of children per household is observed in a selection of 100 households and the results are as follows:

Number of children   Number of households
        1                    10
        2                    20
        3                    18
        4                    16
        5                    11
        6                    25
                    Total   100
Later we will discuss various methods of summarizing and displaying this
data.
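As a preview of those methods, the Python sketch below computes a few descriptive measures for the table above: the mean, median, mode, sample standard deviation and relative frequencies. It uses only the standard library `statistics` module. Expanding the frequency table into individual observations is simply a convenient approach for a small table like this one.

```python
import statistics

# Frequency table from the example: number of children -> number of households
freq = {1: 10, 2: 20, 3: 18, 4: 16, 5: 11, 6: 25}

# Expand the table into the 100 individual observations
data = [x for x, f in freq.items() for _ in range(f)]

n = len(data)
mean = statistics.mean(data)          # average number of children per household
median = statistics.median(data)      # middle value of the ordered observations
mode = statistics.mode(data)          # most frequent value
sd = statistics.stdev(data)           # sample standard deviation
rel_freq = {x: f / n for x, f in freq.items()}   # proportion of households per value

print(f"n = {n}, mean = {mean:.2f}, median = {median}, mode = {mode}, sd = {sd:.2f}")
print("relative frequencies:", rel_freq)
```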

Inferential statistics can then be used to test whether the average number of children per household in the population is 3.
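Below is a minimal sketch of such a test. It assumes that the 100 households are a simple random sample from the population of interest, which the example does not state. The sketch uses a large-sample z test for the mean, with null hypothesis H0: population mean = 3. This is one common choice among several; with n = 100, a one-sample t test would give a very similar answer.

```python
import math
from statistics import NormalDist, mean, stdev

# Frequency table from the example: number of children -> number of households
freq = {1: 10, 2: 20, 3: 18, 4: 16, 5: 11, 6: 25}
data = [x for x, f in freq.items() for _ in range(f)]

mu0 = 3                                   # hypothesised population mean (H0: mu = 3)
n = len(data)
xbar = mean(data)                         # sample mean
se = stdev(data) / math.sqrt(n)           # estimated standard error of the sample mean
z = (xbar - mu0) / se                     # large-sample test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value from N(0, 1)

print(f"sample mean = {xbar:.2f}, z = {z:.2f}, two-sided p-value = {p_value:.2g}")
```

Under these assumptions, a small p-value would be evidence against H0. If the households were not randomly selected, the test would not justify any conclusion about the wider population.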

Variables and organization of the data
[Weiss (1999), Anderson & Sclove (1974) and Freund (2001)]

2.1 Variables
A characteristic that varies from one person or thing to another is called a variable, i.e. a variable is any characteristic that varies from one individual member of the population to another.

Examples of variables for humans are height, weight, number of siblings, sex, marital status, and eye color.

Quantitative (or numerical) variables. The first three of these variables yield numerical information (numerical measurements) and are examples of quantitative (or numerical) variables.

Qualitative (or categorical) variables. The last three yield non-numerical information (non-numerical measurements) and are examples of qualitative (or categorical) variables.

Quantitative variables can be classified as either discrete or continuous.

• Discrete variables.
Some variables, such as
- the number of children in a family,
- the number of car accidents on a certain road on different days, or
- the number of students taking a basic statistics course,
are the results of counting, and thus these are discrete variables.

Typically, a discrete variable is a variable whose possible values are some or all of the ordinary counting numbers like 0, 1, 2, 3, . . . .
As a definition, we can say that a variable is discrete if it has only a countable number of distinct possible values. That is, a variable is discrete if it can assume only a finite number of values or as many values as there are integers.

• Continuous variables.
Quantities such as length, weight, or temperature can in principle be measured arbitrarily accurately. There is no indivisible unit.

Weight may be measured to the nearest gram, but it could be measured more accurately, say to the tenth of a gram. Such a variable, called continuous, is intrinsically different from a discrete variable.

2.1.1 Scales of Measurement

Besides being classified as either qualitative or quantitative, variables can be described according to the scale on which they are defined. The scale of the variable gives certain structure to the variable and also defines the meaning of the variable.

• Nominal scale
Qualitative; no natural ordering.
Observations fall into one category or another.
E.g. Marital status: single, married, living together, etc.
E.g. Gender: M, F
E.g. Car colors for a certain model are: red, silver, blue and black.
• The categories are merely names.

Qualities with no ranking or ordering, and no numerical or quantitative value. The data consist of names, labels and categories.

Suppose the categories of marital status are given numerical codes (e.g. 1, 2, 3, 4). Since the codes are only labels,

we cannot write 3 > 1 or 2 < 4, and

we cannot write 2 − 1 = 4 − 3 or 1 + 3 = 4.

This illustrates how important it is always to check whether the mathematical treatment of statistical data is really legitimate.

The categories into which a qualitative variable falls may or may not have a natural ordering. For example, occupational categories have no natural ordering. If the categories of a qualitative variable are unordered, then the qualitative variable is said to be defined on a nominal scale, the word nominal referring to the fact that the categories are merely names.
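With nominal data, counting how often each category occurs (and hence finding the mode) is legitimate, while arithmetic on numeric codes is not. The short Python sketch below uses a small hypothetical sample of marital statuses. It also uses hypothetical codes (1 = Single, 2 = Married, 3 = Living together), chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample of marital statuses: the categories are merely names
statuses = ["Single", "Married", "Married", "Living together", "Single", "Married"]

# Counting categories is legitimate for nominal data
counts = Counter(statuses)
mode_category, mode_count = counts.most_common(1)[0]
print(counts)
print(f"mode: {mode_category} ({mode_count} of {len(statuses)})")

# Hypothetical numeric codes: they are labels only, not quantities
codes = {"Single": 1, "Married": 2, "Living together": 3}
coded = [codes[s] for s in statuses]
# Python will happily compute sum(coded) / len(coded) (about 1.83), but that
# "average marital status" has no meaning on a nominal scale.
```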

• Ordinal variable

If the categories can be put in order, the scale is called an ordinal scale. Depending on the scale on which a qualitative variable is defined, the variable can be called a nominal variable or an ordinal variable.

Examples of ordinal variables are

• education (classified e.g. as low or high),

• "strength of opinion" on some proposal (classified according to whether the individual favors the proposal, is indifferent towards it, or opposes it), and

• position at the end of a race (first, second, etc.).
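Unlike nominal categories, ordinal categories can be sorted, so order-based summaries such as the median make sense. Differences between categories are still undefined. The short Python sketch below uses the "strength of opinion" example with a hypothetical set of five responses.

```python
# Ordered categories for "strength of opinion": the order matters,
# but the distances between categories are not defined
LEVELS = ["opposes", "indifferent", "favors"]
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical responses to some proposal
responses = ["favors", "opposes", "favors", "indifferent", "favors"]

# Sorting by rank is legitimate on an ordinal scale, so a median category exists
ordered = sorted(responses, key=RANK.__getitem__)
median_response = ordered[len(ordered) // 2]   # middle response (odd number of responses)
print(ordered)
print("median response:", median_response)
```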

Scales for Quantitative Variables.

Quantitative variables, whether discrete or continuous, are defined either on an interval scale or on a ratio scale.

• Interval scale
If one can compare the differences between measurements of the variable meaningfully, but not the ratios of the measurements, then the quantitative variable is defined on an interval scale.

Calendar dates and Celsius and Fahrenheit temperature readings have no meaningful zero, so ratios of them are meaningless.

• Ratio level
Similar to interval, except there is a true zero, or starting point, and the ratios of data values have meaning. Examples:

a. Core temperature of stars measured in kelvins.
b. Time elapsed between the deposit of a check and the clearance of that check.
c. Length of trout in the North River.

If, on the other hand, one can compare both the differences between measurements of the variable and the ratios of the measurements meaningfully, then the quantitative variable is defined on a ratio scale.

For the ratio of the measurements to be meaningful, the variable must have a natural, meaningful absolute zero point, i.e. a ratio scale is an interval scale with a meaningful absolute zero point.

• For example, temperature measured on the Centigrade (Celsius) system is an interval variable. Note that a temperature of 30 degrees cannot be added to a temperature of 30 degrees to give a total of 60 degrees, and 60 degrees is not "twice as hot" as 30 degrees. In addition, 0 degrees does not mean that there is no heat energy in the air.

• By contrast, the height of a person is a ratio variable.

Examples of interval-level data:
a. The years in which Democrats won presidential elections.
b. Body temperature in degrees Celsius (or Fahrenheit) of trout swimming in the North River.
c. Building A was built in 1284, Building B in 1492 and Building C in 5 BCE.
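An interval scale has an arbitrary zero, so the ratio of two readings changes when the unit changes; on a ratio scale it does not. The Python sketch below checks this for the 30 and 60 degree readings discussed above and for two hypothetical heights of 90 cm and 180 cm. It uses the standard conversion formulas.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

def cm_to_inches(cm):
    return cm / 2.54

# Interval scale: the "ratio" of 60 °C to 30 °C depends on the unit chosen
ratio_c = 60 / 30                                                  # 2.0
ratio_f = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)    # 140 / 86, about 1.63
ratio_k = celsius_to_kelvin(60) / celsius_to_kelvin(30)            # 333.15 / 303.15, about 1.10

# Ratio scale: the ratio of two heights is the same in any unit
ratio_cm = 180 / 90                                                # 2.0
ratio_in = cm_to_inches(180) / cm_to_inches(90)                    # also 2.0

print(f"temperature: {ratio_c:.2f} (C), {ratio_f:.2f} (F), {ratio_k:.2f} (K)")
print(f"height:      {ratio_cm:.2f} (cm), {ratio_in:.2f} (in)")
```

Only the kelvin figure, which is measured from absolute zero, gives a physically meaningful temperature ratio.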

Guided Exercise 1
Television station QUE wants to know the proportion of TV owners in Virginia who watch the station's new program at least once a week. The station asked a group of 1000 TV owners in Virginia if they watch the program at least once a week.

a. Identify the individuals in the study.

b. Identify the variable.

c. Do the data comprise a sample? If so, what is the underlying population?

Yes. The implied population is the responses (watch/not watch) of all TV owners in Virginia.

d. Is the variable quantitative or qualitative?

e. Identify a quantitative variable that might be of interest.

1.1.7 Types of Data


(a) Time series data: A sequence of observations collected over time (or space), usually at regular intervals. Examples: monthly sales of a department store in 2017, the annual youth unemployment rate for Botswana during 2014-2018, and monthly or daily temperature and rainfall readings for Gaborone in 2018.

(b) Cross-sectional data: Observations taken on a group of individuals or animals at the same time or approximately the same point in time. Examples of this kind of data include sample survey data or census data (data that arise from a complete enumeration of individuals, objects, animals, etc.).

(c) Longitudinal data: These are data taken on the same group of people at regular intervals of time, usually aimed at finding out whether there has been any change in behaviour or some other characteristic. For example, a baby's weight and height can be measured every fortnight to determine whether these characteristics have changed. Similarly, measurements can be taken at regular intervals on women suffering from obesity after they have been placed on a diet.

1.1.8 Sources of Data


There are two kinds of data sources: primary and secondary sources.
Primary data: primary data consist of raw information collected first-hand in order to satisfy the purposes of a particular statistical enquiry. Such data are called primary data because they are collected from scratch, and the sources of such data are primary data sources. Examples of primary data include census data from censuses and sample survey data in their original form.
Advantages of using primary data
i) The data are relevant to the problem at hand.
ii) Greater control and accuracy are ensured.
iii) The data can easily be tailored in terms of definitions of basic terms and tabulations.
Disadvantages of using primary data
i) Data are expensive to collect and take a lot of time.
ii) Elaborate planning and training are essential for their collection.
iii) Collecting primary data can be a costly venture.

Secondary data: secondary data consist of figures that were originally collected for a particular investigation but are now used, at second hand, as the basis for a different investigation. This is a cost-effective way of using data, even though the data may be inadequate for the new purpose. Examples of secondary data include journal articles, magazines, reports, etc. Note that census data used by a second user who did not collect them are also secondary data.
Advantages of using secondary data
i) Data are easier and less time-consuming to obtain since they already exist.
ii) They are less expensive to obtain.
Disadvantages of using secondary data
i) Data may not be very relevant to the problem at hand.
ii) No control can be exercised over the accuracy of the collection procedure or over rounding errors in the data.
iii) The data are no longer amenable to changes.
Data are primary or secondary depending on who collects them and who uses them. To the original collector, the data are primary. If, after the data have been collected, someone else uses them for any purpose, the data are secondary to that user.

1.2 Methods of Data Collection

Data collection is the stage in research where the needed data are solicited from the respondents. Depending on the type of study, there are several methods that can be used to obtain responses from a subject, as follows:

1.2.1 Direct Observation: Observation implies the use of the eyes rather than listening or asking questions. The researcher usually immerses themselves in the community, observing what is going on around them and conversing with members. Even though the respondents may respond to the researcher's questions, a trusting environment has to be developed; otherwise the respondents will not act naturally. This method of collecting data is well suited to subject matter that involves sensitive issues, such as initiation schools in a traditional tribal setting.

Advantages: The respondents are usually unaware of what is happening and so behave naturally.

Disadvantages: It takes a long time and much effort to complete. A lot of training is required for the person(s) involved in the study.

1.2.2 Documentary Sources: Before going into the field to collect data, it is appropriate to use the information already available about the subject matter in books, journals, etc. A mass of information about the population studied by social surveys is available in historical documents, statistical reports, records of institutions and other sources. It must be borne in mind that this information was compiled for another purpose. On its own it therefore cannot address the objectives of a new survey and will act only as supplementary information.

1.2.3 Interviewing: A survey interview is a structured conversation between an interviewer and a respondent in which questions are asked and the answers are recorded on a standardized schedule sheet. This constitutes a formal interview, whereas a less formal interview is characterized by the absence of a structured questionnaire. There are different types of interviews.

Direct Interview: The interviewer and the respondent are at the same place at the same time. As the interviewer asks the questions from the questionnaire, the respondent answers them, and the interviewer records the answers.

Advantages

• Flexibility in probing
• High response rate
• Non-verbal behaviour can be observed
• Only the respondent answers the questions
• Time of interview can be controlled
• Complex questions can be explained
Disadvantages

• Costly
• Time-consuming
• Interviewer bias
• No time to consult factual documents
• Inconvenient to respondents

Postal Interview: the same questionnaire administered in a direct interview can be mailed to the respondent, in the expectation that on receiving it the respondent will answer the questions appropriately and mail it back to the researcher.

Advantages

• It is cheaper to conduct
• A larger area can be covered
• Interviewer bias is completely eliminated
• Sensitive questions that might seem to pry into an individual's privacy can be answered with ease
• Respondents can take their time to answer the questions
Disadvantages

• Low response rate
• Time-consuming: completed questionnaires may take a long time to be returned
• Ambiguous or complex questions cannot be clarified
• Someone other than the intended respondent may complete the questionnaire

Telephone Interview: Rather than the interviewer travelling to meet the respondent or mailing a questionnaire, if the respondents can be contacted by telephone, then a telephone interview can be conducted.

Advantages

• Takes a shorter time to execute
• A larger area can be covered
• Call-backs are easy to conduct
• Interviewer can probe further on the questions
• Difficult and ambiguous questions can be explained
• Many people are more willing to speak over the phone than face to face
Disadvantages

• Applicable only where there is a highly developed telephone infrastructure
• Only a few questions can be asked
• Respondent's emotions cannot be monitored
• Interview can be stopped abruptly by the respondent hanging up
• It requires highly trained personnel, otherwise high bills will be incurred
• Someone can respond in place of the right respondent
• People may not want to communicate sensitive matters over the phone
Electronic Interview: in this era of advanced information technology, a questionnaire can be sent to a respondent electronically by e-mail. Video conferencing and SMS via cell phones are also used.

2.2 Organization of the data
