Lecture 1
"Statistical thinking will one day be as necessary a
qualification for efficient citizenship as the ability to read
and write." Samuel S. Wilks, 1951, paraphrasing H. G.
Wells, Mankind in the Making
Definitions
Statistics is a tool for converting data into information.
Data -> Statistics -> Information
The word statistics has two meanings. In the more common usage, statistics refers to
numerical facts.
The second meaning of statistics refers to the field or discipline of study closely related to
mathematics.
Statistics consists of a body of methods for collecting and analysing data.
Statistics is the methodology for collecting, analysing, interpreting and drawing conclusions
from information.
Statistics is a methodology which scientists and mathematicians have developed for
interpreting and drawing conclusions from collected data.
Biostatistics is the branch of applied statistics directed toward applications in the health
sciences and biology.
Mathematical statistics is the study and development of statistical theory and methods in the
abstract.
Applied statistics is the application of statistical methods to solve real problems involving
randomly generated data, and the development of new statistical methodology motivated by
real problems.
Applied statistics can be divided into two areas: descriptive statistics and inferential statistics.
Descriptive statistics consists of methods for organizing, displaying, and describing data by
using tables, graphs, and summary measures.
Inferential statistics consists of methods that use sample results to help make decisions or
predictions about a population.
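The difference between the two areas can be sketched in a few lines of Python. The sample values below are hypothetical, and the interval uses a 95% t critical value taken from a standard t table; this is an illustration of the idea, not a recommended analysis.

```python
import math
import statistics

# Hypothetical sample: heights in cm of 8 people drawn from a larger population.
heights = [162.0, 170.5, 158.2, 175.1, 168.4, 181.0, 165.3, 172.5]

# Descriptive statistics: summary measures that describe the sample itself.
sample_mean = statistics.mean(heights)
sample_median = statistics.median(heights)
sample_sd = statistics.stdev(heights)  # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to estimate the population mean.
n = len(heights)
standard_error = sample_sd / math.sqrt(n)
t_critical = 2.365  # two-sided 95% t value for n - 1 = 7 degrees of freedom (t table)
lower = sample_mean - t_critical * standard_error
upper = sample_mean + t_critical * standard_error

print(f"mean={sample_mean:.2f} median={sample_median:.2f} sd={sample_sd:.2f}")
print(f"approximate 95% interval for the population mean: ({lower:.2f}, {upper:.2f})")
```

The first three quantities only describe the eight observed values; the interval is a statement about the unobserved population, which is what makes it inferential.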
DATA
Data are facts or figures from which conclusions may be drawn (dictionary definition).
Data are observations of random variables made on the elements of a population or sample.
Data are the quantities (numbers) or qualities (attributes) measured or observed that are to
be collected and/or analysed.
The word data is plural; datum is the singular.
A collection of data is often called a data set (singular).
Types of data
1) continuous data,
2) counts,
3) proportions,
4) binary data,
5) time to death,
6) time series, and
7) circular data.
Variable
A variable can be defined as a property, or characteristic, of a data item that
may vary from one item to another or over time, i.e., a variable is any characteristic
that varies from one individual member of the population to another.
A variable is a characteristic under study that assumes different values for different
elements. In contrast to a variable, the value of a constant is fixed.
The value of a variable for an element is called an observation or measurement.
Examples of variables for humans are height, weight, and number of siblings (numerical
measurements: quantitative or numerical variables), and
sex, marital status, and eye colour (non-numerical measurements: qualitative or
categorical variables).
A data set is a collection of observations on one or more variables.
Quantitative: a variable that can be measured numerically.
Qualitative (categorical): a variable that cannot be measured numerically but can be
divided into different categories.
Quantitative
Quantitative variables are classified as either discrete or
continuous.
A discrete variable is a variable whose values are countable. In
other words, a discrete variable can assume only certain values
with no intermediate values; it can take some or all of the
ordinary counting numbers 0, 1, 2, 3, ....
As a definition, we can say that a variable is discrete if it has
only a countable number of distinct possible values.
A continuous variable is a variable that can assume any numerical
value over a certain interval or intervals. Equivalently, its
observations can be measured on a continuum or scale.
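As a rough illustration, the classification can be applied to a small data set in Python. The records, the variable names, and the helper variable_type are all hypothetical, and the rule used (text is qualitative, whole numbers are discrete, decimal numbers are continuous) is only a heuristic: a continuous measurement recorded to the nearest whole unit would be misclassified.

```python
# Hypothetical data set: each dict is one element, each key is a variable.
people = [
    {"height_cm": 172.4, "siblings": 2, "eye_colour": "brown"},
    {"height_cm": 160.1, "siblings": 0, "eye_colour": "blue"},
    {"height_cm": 181.9, "siblings": 3, "eye_colour": "brown"},
]


def variable_type(values):
    """Heuristic guess: str -> qualitative, int -> discrete, float -> continuous."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    return "unknown"


# Collect the observations of each variable and classify them.
types = {
    name: variable_type([person[name] for person in people])
    for name in people[0]
}
for name, kind in types.items():
    print(f"{name}: {kind}")
```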
Scale
The scale of a variable gives a certain structure to the variable and also
defines the meaning of the variable.
Qualitative variable:
Nominal scale: a type of categorical data in which objects fall into
unordered categories, so the scale only gives names or labels to the various
categories.
The word nominal refers to the fact that the categories are merely names,
e.g. blood group, eye colour.
Nominal comes from the Latin root nomen, meaning name. Nomenclature,
nominative, and nominee are related words.
Ordinal scale: if the categories can be put in order, the scale is called an ordinal scale, e.g.
education level, but the interval between measurements is not meaningful.
Quantitative variables, whether discrete or continuous, are defined either on
an interval scale or on a ratio scale.
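The practical difference between a nominal and an ordinal scale is which summaries make sense. The sketch below uses hypothetical observations and an assumed ordering of education levels: for the nominal variable only counting (and hence the mode) is meaningful, while the ordinal variable can also be sorted and given a median category.

```python
from collections import Counter

# Nominal scale: categories are only labels, so we can count them but not order them.
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]  # hypothetical observations
group_counts = Counter(blood_groups)
most_common_group = group_counts.most_common(1)[0][0]

# Ordinal scale: categories have an order (assumed here), but the gaps between
# them are not meaningful, so we rank and sort rather than average.
education_order = ["primary", "secondary", "bachelor", "master"]
rank = {level: position for position, level in enumerate(education_order)}
education = ["secondary", "master", "primary", "bachelor", "secondary"]
sorted_education = sorted(education, key=lambda level: rank[level])
median_level = sorted_education[len(sorted_education) // 2]  # odd number of observations

print("most common blood group:", most_common_group)
print("education levels in order:", sorted_education)
print("median education level:", median_level)
```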
Continuous data
Continuous data is data in which the observations can
be measured on a continuum or scale; can have almost
any numeric value; and can be meaningfully subdivided
into finer and finer increments, depending upon the
precision of the measurement system.
Examples: temperature, mass, distance, etc.
Statistical methods: regression and analysis of variance.
Count data
Count data is a form of discrete data in which the
observations can take only the non-negative integer
values {0, 1, 2, ...}, and where these integers arise from
counting rather than ranking.
Count data is usually of one of two forms: 1) simple
counts, e.g., the number of plants infected by a disease
on a plot, the number of eggs in a nest, etc.,
and 2) categorical data, in which the counts are
recorded for one or more categorical explanatory
variables, e.g., the number of infected plants classified
by tree species and town.
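Both forms of count data can be produced from the same raw records. In this sketch the species and town names are made up for illustration; the simple count is the total number of infected plants, and the categorical counts break that total down by the two explanatory variables.

```python
from collections import Counter

# Hypothetical records: one (tree species, town) pair per infected plant.
infected_plants = [
    ("oak", "Northtown"),
    ("oak", "Northtown"),
    ("pine", "Northtown"),
    ("oak", "Southtown"),
    ("pine", "Southtown"),
    ("pine", "Southtown"),
    ("pine", "Southtown"),
]

# 1) Simple count: total number of infected plants.
simple_count = len(infected_plants)

# 2) Categorical counts: number of infected plants per (species, town) cell.
counts_by_category = Counter(infected_plants)

print("infected plants in total:", simple_count)
for (species, town), count in sorted(counts_by_category.items()):
    print(f"{species:5s} {town:10s} {count}")
```

Every value is a non-negative integer, and a combination that never occurs simply has a count of zero.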
Proportion data
Proportion data is another form of discrete data in which
we know how many of the observations are in one
category (i.e., an event occurred) and we also know how
many are in each other category (i.e., how many times
the event did not occur). This is an important
distinction, since it allows the data to be represented as
proportions instead of frequencies, as with count data.
total   infected   not infected
10      8          2
20      10         10
Examples of proportion data:
percent mortality, percent infected, sex ratio, etc.
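Because both the number of events and the number of non-events are known, each row can be turned into a proportion. The sketch below assumes the example figures above are read as total, infected, and not infected (10, 8, 2 and 20, 10, 10).

```python
# Each row: (total units observed, number infected), read from the example figures.
rows = [(10, 8), (20, 10)]

proportions = []
for total, infected in rows:
    not_infected = total - infected   # the other category is known as well
    proportion = infected / total     # proportion infected, between 0 and 1
    proportions.append(proportion)
    print(f"total={total} infected={infected} not infected={not_infected} "
          f"proportion infected={proportion:.2f} ({proportion:.0%})")
```

Plain count data would record only the 8 and the 10; without the totals, the two rows could not be compared on a common scale.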
Binary data
Binary data is data in which the observations can take
only one of two values, for example, alive or dead,
present or absent, male or female, etc. Binary data is
useful when you have unique values of one or more
explanatory variables for each and every observational
unit.
Binary data is the special case of proportion data
in which the trial size is fixed at one for all
observational/experimental units.
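The link between binary and proportion data can be made concrete: each binary observation is a trial of size one, and summing those trials gives back an ordinary proportion. The outcomes below are hypothetical.

```python
# Hypothetical binary observations, one per individual: 1 = dead, 0 = alive.
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

# View each observation as proportion data with trial size one: (events, trials).
as_trials = [(outcome, 1) for outcome in outcomes]

# Aggregating the unit-sized trials gives one row of proportion data.
deaths = sum(events for events, trials in as_trials)
total = sum(trials for events, trials in as_trials)
mortality = deaths / total

print(f"{deaths} deaths out of {total} individuals, proportion = {mortality:.1f}")
```

Keeping the data in binary form preserves one row per individual, which is what allows a separate explanatory value to be attached to each observational unit.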
Time to death/failure
Time to death/failure data is data that take the form of
measurements of the time to death, or the time to
failure of a component; each individual is followed until
it dies (or fails), then the time of death (or failure) is
recorded. Time to death/failure data is not limited to
plant and animal longevity studies; it applies
to any situation in which the time to completion of a
process is relevant.
Data Collection
The reliability and accuracy of data depend on the method
of collection.
Sources:
Published data
Observation data
Experimental data
Published data
Convenient
Low cost
Reliable
They can be either
Primary data: when they are published by the same agency
that collected the data
Secondary data: when the data are published by an organization
different from the one that collected them (they should be used
with care, since some errors may have occurred)
Example: meteorological data, census data
Survey
A survey is a technique for gathering information about a variable under
study from respondents in the population.
It can be considered a form of observational data collection.
It is used in descriptive research in social studies.
It may be a sample survey or a census survey. This method relies on
questioning the informants on a specific subject. A survey follows a structured form
of data collection, in which a formal questionnaire is prepared and the questions
are asked in a predefined order.
Surveys may be administered in a variety of ways, e.g.
Personal Interview,
Telephone Interview, and
Self-Administered Questionnaire.
Questionnaire Design
1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get
started comfortably.
4. Use dichotomous (yes/no) and multiple-choice questions.
5. Use open-ended questions cautiously.
6. Avoid using leading questions.
7. Pre-test a questionnaire on a small number of people.
8. Think about the way you intend to use the collected data
when preparing the questionnaire.