Chpt1 4
Definition (Statistics).
Statistics consists of a body of methods for collecting and analyzing data.
(Agresti & Finlay, 1997)
How can we analyse the data and draw conclusions from it?
3. Inference: Making predictions and generalizing about phenomena
represented by the data.
In particular, misuse of statistics can arise at every stage of the
statistical process, for example:
- Inappropriate methods of collecting the information (sampling) and
small sample sizes.
- Use of unrepresentative subgroups in a study.
- Inappropriate methods of summarizing data, e.g. misleading graphs,
where the choice of scales on a graph may lead to a misleading
interpretation.
- Wrong choice of data analysis and inferential technique, e.g.
wrongly assuming independence or the wrong distribution for the data.
- Wrong conclusions drawn from the results.
                 School A   School B
Average Score       55         75
However, suppose that we re-examine the data, looking not only at the
average scores after attending the schools, but those before entering the
schools. Suppose the numbers are
                    School A   School B
(Before Entering)      25         85
(After Entering)       55         75
Discuss
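One way to start the discussion is to compute the change in average score for
each school instead of comparing the final averages alone. The sketch below
uses the figures from the tables above; the dictionary layout and the function
name score_change are illustrative choices, not part of the original notes.

```python
# Average scores before and after attending each school (figures from the tables above).
scores = {
    "School A": {"before": 25, "after": 55},
    "School B": {"before": 85, "after": 75},
}


def score_change(school):
    """Return the change in average score (after minus before) for one school."""
    record = scores[school]
    return record["after"] - record["before"]


for school in scores:
    after = scores[school]["after"]
    print(f"{school}: after = {after}, change = {score_change(school):+d}")
```

School B has the higher final average, but its average fell by 10 points while
School A's rose by 30, so the final averages alone say little about which
school improves its students more.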
Bias Loaded Questions
Poorly Written Survey Questions Creating Biased Data
So a “bias obstacle” occurs because poorly worded questions create
biased data.
compare with
you get far less agreement than you would get on “Do you support a
balanced budget?”
People are more likely to be taken to sound loyal, law abiding and
supportive of “government payments to the poor” and “universal health
coverage”
People are likely to provide an answer that agrees with the authority
figure.
“Do you agree that smoking should be against the law?”
“Do you agree with the president that taxes should be raised to lower the
national debt?”
For each of these, think about the likely bias in the respondents'
answers.
Two Major types of statistics.
1. Descriptive:
Descriptive statistics consist of methods for organizing and summarizing
information (Weiss, 1999).
Descriptive statistics includes
- the construction of graphs, charts, and tables, and
- the calculation of various descriptive measures such as averages,
measures of variation, and percentiles.
In fact, most of this course deals with descriptive statistics.
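As a minimal sketch of the descriptive measures listed above, the following
uses Python's standard statistics module. The list of exam scores is invented
purely for illustration and does not come from the notes.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [42, 55, 61, 61, 68, 70, 74, 79, 85, 93]

mean_score = statistics.mean(scores)            # average
median_score = statistics.median(scores)        # middle value
std_dev = statistics.stdev(scores)              # sample standard deviation (a measure of variation)
quartiles = statistics.quantiles(scores, n=4)   # 25th, 50th and 75th percentiles

print(f"mean = {mean_score}, median = {median_score}")
print(f"standard deviation = {std_dev:.2f}")
print(f"quartiles = {quartiles}")
```

Each of these numbers summarizes the ten scores themselves; none of them makes
a claim about students outside this list.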
2. Inferential Statistics.
Inferential statistics consist of methods for drawing and measuring the
reliability of conclusions about a population based on information obtained
from a sample of the population. (Weiss, 1999)
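To illustrate what "measuring the reliability of conclusions" can look like,
the sketch below estimates a population proportion from a sample and attaches
an approximate margin of error. The normal approximation with z = 1.96
(roughly 95% confidence), the function name estimate_proportion, and the
sample counts are all assumptions made for this example, not material from
the cited texts.

```python
import math


def estimate_proportion(successes, sample_size, z=1.96):
    """Estimate a population proportion from a sample.

    Returns the sample proportion and an approximate margin of error
    based on the normal approximation (z = 1.96 for roughly 95% confidence).
    """
    p_hat = successes / sample_size
    standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, z * standard_error


# Hypothetical sample: 400 of 1000 sampled people answer "yes".
p_hat, margin = estimate_proportion(400, 1000)
print(f"estimate = {p_hat:.3f}, margin of error = {margin:.3f}")
```

The sample proportion is a descriptive summary of the sample; the margin of
error is what turns it into a hedged statement about the whole population.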
Later we will discuss various methods of summarizing and displaying this
data.
Variables and organization of the data
[Weiss (1999), Anderson & Sclove (1974) and Freund (2001)]
2.1 Variables
A characteristic that varies from one person or thing to another is called a
variable, i.e., a variable is any characteristic that varies from one individual
member of the population to another.
Discrete variables.
Some variables, such as
- the number of children in a family,
- the number of car accidents on a certain road on different days, or
- the number of students taking a basics of statistics course,
are the results of counting, and thus these are discrete variables.
Typically, a discrete variable is a variable whose possible values are some
or all of the ordinary counting numbers like 0, 1, 2, 3, . . . .
As a definition, we can say that a variable is discrete if it has only a
countable number of distinct possible values. That is, a variable is
discrete if it can assume only a finite number of values or as many
values as there are integers.
Continuous variables.
Quantities such as length, weight, or temperature can in principle be
measured arbitrarily accurately. There is no indivisible unit.
Nominal scale
Mainly qualitative -- no natural ordering
Observations fall into one category or another
E.g. Marital status: single, married, living together, etc.
E.g. Gender: M, F
E.g. Car colors for a certain model are: red, silver, blue and black.
categories are merely names.
we cannot write 2 − 1 = 4 − 3 or 1 + 3 = 4.
The categories into which a qualitative variable falls may or may not have
a natural ordering. If they do not, the scale is called a nominal scale,
nominal referring to the fact that the categories are merely names.
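Because nominal categories are only names, the meaningful summaries are
counts and the most frequent category, not sums or differences. A minimal
sketch with collections.Counter follows; the list of ten car colors is
invented for illustration.

```python
from collections import Counter

# Hypothetical colors of ten cars of one model, invented for illustration.
colors = ["red", "silver", "blue", "silver", "black",
          "silver", "red", "blue", "silver", "black"]

counts = Counter(colors)                             # frequency of each category
mode_color, mode_count = counts.most_common(1)[0]    # most frequent category

print(counts)
print(f"most common color: {mode_color} ({mode_count} cars)")
```

Coding the colors as 1, 2, 3, 4 would not change this: the codes would still
be labels, and arithmetic on them would have no meaning.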
Ordinal variable
If the categories can be put in order, the scale is called an ordinal scale.
Interval Scale
If one can compare the differences between measurements of the variable
meaningfully, but not the ratio of the measurements, the scale is called
an interval scale.
Ratio Level:
Similar to interval, except there is a true zero, or
starting point, and the ratios of data values have
meaning.
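If, on the other hand, one can compare both the differences between
measurements of the variable and the ratio of the measurements
meaningfully, the scale is called a ratio scale,
i.e., a ratio scale is an interval scale with a meaningful absolute zero point.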
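The distinction between interval and ratio scales can be checked numerically
by changing the unit of measurement. The sketch below assumes temperature in
degrees Celsius as an example of an interval scale and length in metres as an
example of a ratio scale. Both examples and the helper function names are
illustrative choices, not taken from the notes.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def metres_to_feet(m):
    """Convert a length from metres to feet (1 ft = 0.3048 m)."""
    return m / 0.3048


# Interval scale: the zero of the Celsius scale is arbitrary,
# so the ratio of two temperatures depends on the unit used.
c_ratio = 20 / 10
f_ratio = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Differences remain comparable: equal Celsius gaps are equal Fahrenheit gaps.
gap_low = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
gap_high = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30)

# Ratio scale: length has a true zero, so the ratio survives a change of unit.
m_ratio = 20 / 10
ft_ratio = metres_to_feet(20) / metres_to_feet(10)

print(f"temperature ratio: {c_ratio} in Celsius, {f_ratio:.2f} in Fahrenheit")
print(f"temperature gaps in Fahrenheit: {gap_low} and {gap_high}")
print(f"length ratio: {m_ratio} in metres, {ft_ratio:.2f} in feet")
```

The temperature ratio changes from 2 to 1.36 when the unit changes, so "twice
as hot" is not a meaningful statement, whereas "twice as long" holds in any
unit of length.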
Guided Exercise 1
Television station QUE wants to know the proportion of TV owners in
Virginia who watch the station's new program at least once a week. The
station asked a group of 1000 TV owners in Virginia if they watch the
program at least once a week.
(c) Longitudinal data: These are data taken on a group of people at regular intervals of time,
usually to find out whether there has been any change in behaviour or some other
characteristic. For example, a baby’s weight and height can be measured every
fortnight to determine whether these characteristics have changed, or measurements can be taken at
regular intervals on women suffering from obesity after they have been placed on a diet.
Secondary data: consist of figures that were collected originally to satisfy a particular
investigation but are now used at second hand as the basis for a different
investigation. This is a cost-effective way of using data, even though the information may be
inadequate. Examples of secondary data include journal articles, magazines, reports,
etc. Note that census data used by a second user who didn’t collect the data are
also secondary data.
Advantages of using secondary data
i) Data are easier and less time consuming to collect since they already exist.
ii) They are less expensive to collect.
Disadvantages of using secondary data
i) Data may not be very relevant to the problems at hand.
ii) No control can be exercised over the accuracy of the collection procedure or over rounding
errors in the data.
iii) The data are no longer amenable to any changes.
Data are primary or secondary depending on who collects them and who uses them. To the original
collector, the data are primary but if after collecting the data, someone else uses the data for
any purpose, the data are secondary to that user.
Data collection is the stage in research where the needed data are solicited from the respondents.
Depending on the type of study, there are several methods that can be used to obtain responses from a
subject, and these are as follows:
1.2.1 Direct Observation: Observation implies the use of eyes rather than listening or asking
questions. The researcher usually settles among the community members,
observing what is going on around them and conversing with members. Even though the respondents may
respond to the researcher’s questions, a trusting environment has to be developed; otherwise the
respondents will not act in a natural way. This method of collecting data is well suited to subject matter
that involves sensitive issues, like initiation schools in a traditional tribal setup.
Advantages: The respondents are usually unaware of what is happening and so behave in their
natural ways.
Disadvantages: It takes a long time and effort to complete. A lot of training is required for the
person(s) who are involved in the study.
1.2.2 Documentary Sources: Before one can go into the field to collect data, it is more appropriate
that one use the already available information about the subject matter in books, journals, etc. A
mass of information about the population studied by social surveys is available in historical
documents, statistical reports, records of institutions and other sources. It must be borne in mind
that this information was compiled for its own purpose, so on its own it cannot address the objectives
of a new survey and will only act as supplementary information.
1.2.3 Interviewing: A survey interview is a structured conversation between an interviewer and the
respondents in which questions are asked and answers recorded on a standardized schedule sheet.
This constitutes a formal interview, whereas a less formal interview is characterized by the absence of a
structured questionnaire. There are different types of interviews.
Direct Interview: The interviewer and the respondent are at the same place at the same time.
As the interviewer asks questions from the questionnaire, the respondent answers them,
and the interviewer records the answers.
Advantages
Flexibility in probing
High response rate
Observe non-verbal behaviour
The respondent alone answers the questions
Time of interview can be controlled
Complex questions can be explained
Disadvantages
Costly
Time consuming
Interviewer bias
No time to consult factual documents
Inconvenient for respondents
Postal Interview: The same questionnaire administered in a direct interview can be mailed to the
respondent, in the expectation that, upon receiving it, the respondent will answer the
questions appropriately and then mail it back to the researcher.
Advantages
It is cheaper to conduct
More area can be covered
Interviewer bias is completely eliminated
Sensitive questions that might seem to pry into an individual’s privacy can be answered with ease
Respondents take their time to answer questions
Disadvantages
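Low response rate
No opportunity to probe or to explain complex questions
Non-verbal behaviour cannot be observed
No assurance that the intended respondent alone answers the questions
Replies may take a long time to arrive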
Telephone Interview: Rather than the interviewer travelling to meet the respondent or mailing
a questionnaire, if the respondents can be contacted by telephone, a telephone interview
would ensue.
Advantages
2.2 Organization of the data