Statistik 1
INTRODUCTION TO STATISTICS
Statistics can be divided into two categories: descriptive statistics and inferential
statistics.
Descriptive statistics deals with developing and using techniques for the careful collection
and effective presentation of data.
The aim is to study the characteristics of the data. The study therefore mainly involves
the collection, organization, presentation and description of numerical information.
Collecting and organizing the data are the basic steps of statistics before any
presentation can be done. Data can be presented using graphs or charts. Interpreting
the graphs and charts helps us evaluate the information presented and describe the
characteristics of the data collected.
Inferential statistics deals with the statistical tools and techniques that are used to
analyze the data and to make predictions, estimates or decisions by drawing conclusions
from the data.
Inferential statistics is used to determine how far our decision about any information is true
and acceptable. It is also used to estimate or draw inferences about the attitudes or
characteristics of the whole population based on a sample.
i. Determining whether any apparent characteristics of a situation are
genuine or are merely the result of random chance.
ii. Assessing the probable magnitude of a numerical quantity and determining
the reliability of that assessment.
iii. Interpreting past patterns of variation to predict future events.
Like all professions, statisticians have their own keywords and phrases to allow
precise communication. However, the results of any analysis must be presented
in language that is easy for the decision-maker to understand. Otherwise, he or she will
not believe what you recommend, and will therefore not go on to the implementation
phase. This lack of communication between statisticians and managers is the major
barrier to using statistics.
Term          Meaning
Element       An element is an object on which a measurement is taken.
Population    A population is the collection of all elements of interest, or the
              measurements obtained from all individuals or objects of interest.
              Examples: people, animals, plants or things about which we may collect data.
1.3 TYPE OF VARIABLE
A variable is a measurable factor, characteristic, or attribute of an individual or a system;
in other words, something that might be expected to vary over time. For example, the variable
of interest may be absenteeism among students, the household income of Malaysian
citizens, or the sales of cars.
Variable

Qualitative
- Data that cannot be expressed by a number; in other words, data on a non-numerical scale.
- This is categorical data that can be separated into different categories that are
  distinguished by some nonnumeric characteristic.
- Example: gender, type of car, color of car, etc.

Quantitative
- Data that can be expressed as a number, measured on a numerical scale.

  Discrete data
  - Numerical response which arises from a counting process.
  - It takes a finite or countable number of values.
  - Example: number of cars, number of children.

  Continuous data
  - Infinitely many possible values that correspond to some continuous scale.
  - Numerical response which arises from a measuring process, for example height.
Variables that are measured only nominally are also called categorical variables. In this
type of measurement, names are assigned to objects as labels. The data cannot be
arranged in an ordering scheme (from low to high). The nominal scale is the lowest
level of data measurement. Variables measured at a nominal level include gender,
marital status, race, religious affiliation, college major, and birthplace. Other examples
include: geographical location in a country, telephone access code, or the make or model
of a car.
Example:
Male
Female
Newspapers
Magazines
Television
Internet
Other
Please specify: _____________
The numbers are called ordinals when the numbers assigned to objects represent the rank
order (1st, 2nd, 3rd etc.) of the entities measured. Comparisons of greater and less can
be made, in addition to equality and inequality. However, operations such as conventional
addition and subtraction are still meaningless. The ordinal scale is a level higher than the
nominal scale.
Example:
2. Your highest completed level of education:
   1. SPM
   2. Diploma
   3. Bachelor
   4. Master
   5. PhD
The interval level is like the ordinal level, but with the additional property that the
difference between two data values is meaningful. Data at this level do not have a natural
zero starting point. An example of an interval scale is temperature: 0 °F does not mean
“no heat”, and 40 °F is not twice as hot as 20 °F.
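The temperature example can be checked numerically. The short Python sketch below is an illustration only: converting to kelvins (a scale with an absolute zero) is an assumption added here, and the function name is hypothetical. It shows that the difference between the two readings is meaningful, while the naive ratio of the Fahrenheit values is not.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvins, a scale with a true zero."""
    return (temp_f - 32) * 5 / 9 + 273.15

low_f, high_f = 20, 40

# Differences are meaningful on an interval scale: 20 degrees apart.
difference_f = high_f - low_f

# The naive ratio of the Fahrenheit readings suggests "twice as hot".
naive_ratio = high_f / low_f

# On the Kelvin scale the same two temperatures differ by only about 4%.
kelvin_ratio = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

print(difference_f, naive_ratio, round(kelvin_ratio, 2))
```

The naive ratio of 2 comes only from where the Fahrenheit scale places its zero, which is why ratios are reserved for ratio-scale data.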
The ratio scale is the strongest scale of measurement. It contains a meaningful zero
(absolute zero point), which represents the absence of the phenomenon being measured.
Examples of ratio measurements are the time spent studying per day, the monthly amount spent
on prepaid top-ups and the number of cigarettes smoked per day.
Example:
1.5 SAMPLING TECHNIQUE
There are two types of sampling technique: the probability sampling technique and the
non-probability sampling technique.
A simple random sample of n subjects is selected in such a way that every possible
sample of the same size n has the same chance of being chosen.
[Figure: simple random sampling. Units A to F in the population are randomly selected to form the sample.]
Condition/Method:
A simple random sample is selected from the population in such a way that each item has the
same chance of being selected for the sample.
Condition: the population is homogeneous in nature (units have similar characteristics), and a
complete and up-to-date sampling frame is compulsory.
Methods applied are the lottery method and random numbers generated by computer.
Process involved in selecting the sample:
i. Get a complete sampling frame and sort it.
ii. Label each element with a unit number.
iii. Choose units one by one (without replacement) until the sample size n is reached.

Example:
An administrator from the Academic Affairs Department of Universiti Teknologi Mara Kelantan
wanted to estimate how much the students spent on textbooks for a semester. A random sample
of 500 students was selected to answer the survey. This can be done by generating 500 random
numbers between 1 and 500 using the lottery method or a random number generator.
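The selection process above can be sketched in Python using the standard library's random module. This is a minimal illustration, not the notes' own procedure: the frame of 5000 student numbers, the seed and the function name are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)       # a seed makes the draw repeatable
    return rng.sample(frame, n)     # every unit has the same chance

# Hypothetical sampling frame: students labelled 1..5000 (size is assumed).
sampling_frame = list(range(1, 5001))

selected = simple_random_sample(sampling_frame, n=500, seed=42)

print(len(selected))          # number of students chosen
print(sorted(selected)[:5])   # a few of the selected unit numbers
```

Because `sample` draws without replacement, no student can be selected twice, which matches step iii of the process.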
A random starting point is selected, and then every kth member of the population is
selected. Condition: the population has to be homogeneous, and the sampling frame
has to be randomly ordered and current, but not necessarily complete. The process:
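Let us say the population is numbered 1 to 15, the interval is k = 5 and the random start is r = 3:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Selected positions: r = 3, (r + k) = 3 + 5 = 8, (r + 2k) = 3 + 2(5) = 13
Sample: 3, 8, 13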
Example:
20 students from part one CS 110, which consists of 100 students, are to be selected using
systematic sampling.

Describe method:
Reason: The sampling frame exists and all the units in the population are randomly arranged.
- List the students by number from 1 to 100.
- Use the ‘equal interval method’, where k = N/n = 100/20 = 5.
- For every 5 students, only one will be selected.
- The first student is selected at random between 1 and 5, for example number 2.
- The next students will be 7, 12, 17, 22, …, 97.
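The equal interval method in this example can be sketched in Python as follows. The function name and its parameters are hypothetical, and the sketch assumes that N is an exact multiple of n, as it is here (100/20 = 5).

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit number, beginning at a random start in 1..k."""
    k = population_size // sample_size          # sampling interval k = N/n
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start r
    return [start + i * k for i in range(sample_size)]

# Reproduce the worked example: N = 100, n = 20, random start r = 2.
chosen = systematic_sample(100, 20, start=2)
print(chosen)
```

With the start fixed at 2, the list runs 2, 7, 12, ... up to 97, matching the students listed above. Leaving `start` unset lets the function pick the random start itself.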
We divide the population into several mutually exclusive strata and then randomly
sample from each stratum. This is appropriate when large variations occur within the population.
The population is divided into strata such that the units within each stratum are homogeneous
and the strata differ from one another (heterogeneous between groups).
Example:
A researcher intends to determine the monthly income of households in Section 7, Shah Alam.
The breakdown of the households according to ethnicity is: Malay 120 households, Chinese 100
households, Indian 60 households and Others 20 households. Only 50 households will be
selected as the sample.

Describe method:
The number of households from each ethnic group can be selected using proportional
stratified sampling:
A = (number in A / total population) x number to be selected
Malay = (120/300) x 50 = 20
Chinese = (100/300) x 50 = 16.67 ≈ 17
Indian = (60/300) x 50 = 10
Others = (20/300) x 50 = 3.33 ≈ 3
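The proportional allocation above can be reproduced with a few lines of Python. This is a sketch only: the function name is hypothetical, and simple rounding to the nearest whole household is an assumption.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample to each stratum in proportion to its size."""
    total = sum(strata_sizes.values())
    return {
        name: round(size / total * sample_size)   # nearest whole unit
        for name, size in strata_sizes.items()
    }

# Stratum sizes from the Section 7 Shah Alam example.
households = {"Malay": 120, "Chinese": 100, "Indian": 60, "Others": 20}

allocation = proportional_allocation(households, sample_size=50)
print(allocation)
print(sum(allocation.values()))
```

In this example the rounded counts add up to 50. That is not guaranteed in general, so the total should always be checked and one stratum adjusted if necessary.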
The population is divided into subgroups (clusters) such that the units within each group
are as heterogeneous as possible and the groups are similar to one another. Specify an
appropriate cluster, e.g. a street. A sampling frame for all clusters must be obtained. A
random sample of clusters is selected. Select all or a portion of the units (at random) from
each selected cluster.
Population = group A + B + C + D + E
Number of groups to be selected for the sample = 3
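The selection of 3 groups out of A to E can be sketched in Python as below. The unit labels inside each group, the seed and the function name are made up for illustration. The sketch takes all units from every selected cluster, which is one of the two options described above.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical population: five groups, each holding a few unit labels.
population = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2", "C3", "C4"],
    "D": ["D1", "D2"],
    "E": ["E1", "E2", "E3"],
}

sample = cluster_sample(population, n_clusters=3, seed=1)
units = [unit for members in sample.values() for unit in members]

print(sorted(sample))   # the three selected groups
print(units)            # every unit from those groups
```

Only a frame of clusters is needed for the random draw. The units are listed afterwards, and only for the clusters that were selected.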
1.6.5 MULTISTAGE SAMPLING
Finally, all apartment units from each selected level are included in the final sample.
1.7 NON-PROBABILITY SAMPLING TECHNIQUE
Non-probability sampling is any procedure in which elements do not have an equal
chance of being included in a sample. Non-probability sampling is used
when sampling frames are difficult to obtain. Performing non-probability sampling is
considerably less expensive than probability sampling, but the results are of
limited value.
Judgmental sampling is used when a sample is taken based on certain judgments
about the overall population. The assumption is that the investigator will select units
that are characteristic of the population. Judgmental sampling is subject to the
researcher's biases and is perhaps even more biased than convenience sampling. An
example is a laboratory setting where the choice of experimental subjects (e.g., animal or
human) reflects the investigator's pre-existing beliefs about the population.
1.8 DATA COLLECTION METHODS
Data collection is important because the analysis and conclusions rely on it. The validity
of the analysis depends on the content of the data and on how the data are collected.
Three important aspects in choosing the right method for collecting data or doing research
are:
There are a few methods of getting information from a sample. The best method for a
given situation depends on:
i. Budget available for the research; especially the amount allocated for field work.
ii. Time allocation for the research. This is important so that the research can be
finished on time.
iii. Accuracy of the result needed.
iv. The distribution of the sample. A sample that is widely distributed needs a
method that keeps expenses down.
i. The interviewer meets the respondent and asks questions, e.g. the
population census.
ii. The respondent goes to an interview center, where a few
interviewers ask questions, e.g. job interviews.
Advantages:
Disadvantages:
i. Each interview is very costly. The cost includes traveling
and living allowances for the interviewers.
ii. The interviewer might introduce bias through the way he or she asks the
questions or records the answers.
iii. Some people might be too embarrassed to answer a personal question.
iv. Interviewers should be trained and closely supervised to avoid any
wrongdoing while collecting the data. All of this needs smooth administration, so
it involves a high cost.
v. The number of interviews that can be done each day is limited, because
each interview takes quite a long time and contacting
the respondents is also time consuming.
Disadvantages:
i. Telephone subscribers do not represent the general public, so not all
kinds of research can use this method, because the results would be biased.
ii. Only short questions are suitable to be asked.
iii. The nature of the respondents cannot be seen.
iv. Time for interviewing via telephone is very limited. This means the
interviewer must find a suitable time to fit the respondent’s schedule.
v. A second attempt has to be made if the respondent does not answer the
telephone the first time.
iii. Accurate measurements or counting are available.
Disadvantages:
i. If the respondents realize that they are being observed, they might not act
as they normally do.
ii. The observer might misinterpret or misunderstand the
outcome of an event, and so make the wrong choice and record the
observations wrongly.
iii. The observer has to take a long time to collect the information. As a
result, he or she might get bored or get carried away by the events, which can
lead to recording the data wrongly.
Disadvantages:
TUTORIAL
QUESTION 1
A finance executive from FDR company is interested in conducting a study to estimate the monthly
expenditure on transportation spent by employees at FDR company. This company has
approximately 2400 employees. Some of them work in the manufacturing department, some in
the sales department and others in management and admin department. Each employee in the
company was given a number and a sample of 400 employees is selected as respondents.
QUESTION 2
A marketing analyst is conducting a satisfaction survey on car purchase in town A. The population
list includes 3000 Proton buyers, 2500 Honda buyers, 2500 Toyota buyers and 2000 Mazda
buyers. The analyst selects a sample of 400 car buyers.
a) State the population and the sampling frame for the study
b) Identify the variable of interest
c) State the appropriate sampling technique and give one reason for using this technique
d) Calculate the number of selected samples for each car manufacturer
e) Determine a suitable method for data collection and give one advantage of this method
QUESTION 3
A hotel has 300 rooms which are classified as Deluxe, Premier, Suite and Standard. The manager
wants to obtain information regarding the satisfaction level of the room usage from the hotel
guests by taking 20% sample from the hotel rooms.
QUESTION 4
The Tenaga Nasional Berhad branch in Klang is conducting a survey to find out the average
monthly consumption of electricity in a housing estate. The area consists of 20 blocks of
double-storey houses. Each block consists of ten houses. A random sample of five blocks is
selected. All the houses in each selected block are included as units in the sample.
QUESTION 5
A finance executive from DELIRA Company is interested in conducting a study to estimate the
monthly expenditure on transportation spent by the employees at DELIRA Company. This
company has approximately 2000 employees. Some of them work in manufacturing department,
some in the sales department and some in management and administration department. Each
employee in the firm was given a number and a random sample of 200 employees is selected.
QUESTION 6
Moore Travel Agency, a nationwide travel agency, offers special rates on Caribbean cruises to
Malaysian citizens. A researcher of Moore Travel wants to do a research on the ages of those
people taking the cruise. A sample of 100 customers who took a cruise last year was selected.
QUESTION 7
In the automobile industry, customer service is a crucial factor affecting car sales. The
management of a reputed automobile company is interested in determining the level of customer
satisfaction with the service provided by the company’s service centers. The company has
altogether 60 service centers in the Klang Valley. A sample of 6 centers was selected at random.
Then all customers, who service their cars at these 6 service centers, were selected for the study.
A questionnaire was posted to these customers.
QUESTION 8
A bachelor of Corporates student carried out a survey on sleeping habits among students in
tertiary education from the Private Higher Learning Institute. In the survey, 15% of the 1000
students interviewed said they must take a nap after lunch before the evening class.
QUESTION 9
Excessive sugar intake has been pointed to as the main cause of diabetes among Malaysians.
However, recent research findings also include stress and unhealthy diet as possible causal
factors. A health science researcher would like to confirm these recent findings by conducting a
research. He took a random sample of 50 patients from a total of 400 patients who received
treatment for diabetes at one of the hospitals in Johor. The researcher developed his own
questionnaire and interviewed every patient in the sample.
i. State the population and the sampling frame for the study.
ii. List the variable of interest and state the type of variable.
iii. It is advisable to conduct a pilot study when creating a questionnaire. Give TWO
advantages of a pilot study.
iv. If systematic random sampling was employed to select the 50 patients, explain how it
would be conducted.