
Statistik 1

Uploaded by

MAISARAH
Copyright
© © All Rights Reserved

CHAPTER 1

INTRODUCTION TO STATISTICS

1.0 INTRODUCTION TO STATISTICS

1.1 WHAT IS STATISTICS


Statistics is the science of collecting, organizing, presenting, analyzing and interpreting
data. Based on the analyzed data, conclusions can be drawn about the characteristics of the
population and decisions can be made for future action. The steps of statistical analysis
involve collecting information, evaluating it, and drawing conclusions.
Statisticians provide crucial guidance in determining what information is reliable and
which predictions can be trusted. They often help search for clues to the solution of a
scientific mystery, and sometimes keep investigators from being misled by false
impressions. Statisticians work in a variety of fields, including medicine, government,
education, agriculture, business, law, and finance.

1.2 TYPES OF STATISTICS

Statistics can be divided into two categories: descriptive statistics and inferential
statistics.

1.2.1 DESCRIPTIVE STATISTICS

This kind of statistics deals with developing and using techniques for the careful collection
and effective presentation of data.
The aim is to study the characteristics of the data. The study therefore mainly involves
the collection, organization, presentation and description of numerical information.

Collecting and organizing the data are the basic steps of statistics before any
presentation can be done. Data can be presented by using graphs or charts. Interpretation
of the graphs and charts will help us evaluate the information presented to describe the
characteristics of the data collected.

1.2.2 INFERENTIAL STATISTICS

This kind of statistics deals with the tools and techniques of statistics that are used to
analyze the data and to make predictions, estimates or decisions by drawing conclusions
from the data.

Inferential statistics is used to determine how far our decision about any information is true
and acceptable. It is also used to estimate or draw inferences about the attitudes or
characteristics of the whole population based on a sample.

It encompasses all types of decision. In principle it enables an optimum decision to be
made for any problem, especially if it relates to the following:

i. Determining whether any apparent characteristics of a situation are
genuine or are merely the result of random happenings.
ii. Assessing the probable magnitude of a numerical quantity and determining
the reliability of such an assessment.
iii. Interpreting past patterns of variation to predict future happenings.

1.2.3 BASIC TERMS

Like all professions, statisticians have their own keywords and phrases to make
precise communication easier. However, one must interpret the results of any decision-making
process in a language that is easy for the decision-maker to understand. Otherwise, he/she
will not believe in what you recommend, and therefore will not go into the implementation
phase. This lack of communication between statisticians and managers is the major
barrier to using statistics.

Term           Meaning

Element        An element is an object on which a measurement is taken.

Population     A population is a collection of elements of interest, or the
               measurements obtained from all individuals or objects of interest.
               Examples: the people, animals, plants or things on which we may
               collect data.

Sample         A sample is a portion or subset of the total group or population of
               interest.

Census         A census is a study of the entire population.

Sample survey  A sample survey is a study of some selected segment of the population.

Parameter      A parameter is a numerical descriptive measure of the population.
               Parameters are used to represent a certain population
               characteristic. For example, the population mean is a parameter
               that is often used to indicate the average value of a quantity.

Statistic      A statistic is a numerical descriptive measure taken from a sample. It
               is used to give information about unknown values in the
               corresponding population. For example, the average of the data in
               a sample is used to give information about the overall average in
               the population from which that sample was drawn.

Variable       A variable is a measured characteristic of the population under
               study which may take different values, such as weight or gender, since
               these differ from individual to individual.

Data           Data are observations or information that have been recorded or
               collected.

Random         A choice of a single item from a group is random if every item in
               the group has the same chance of being selected as any other item.

Pilot study    A pilot, or feasibility, study is a small experiment designed to test
               logistics and gather information prior to a larger study, in order to
               improve the quality and efficiency of the larger study. The pilot study
               provides vital information on the severity of proposed procedures.
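The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population of 1,000 weights below is simulated and purely hypothetical; it is not data from this chapter. The population mean plays the role of the parameter and the mean of a random sample of 50 plays the role of the statistic.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weights (kg) of 1,000 individuals.
population = [round(random.gauss(60, 8), 1) for _ in range(1000)]

# Parameter: a numerical measure computed from the whole population (a census).
population_mean = statistics.mean(population)

# Statistic: the same measure computed from a sample of 50 individuals.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values are usually close but not identical, which is why a statistic is treated as an estimate of the unknown parameter.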
1.3 TYPE OF VARIABLE
A variable is a measurable factor, characteristic, or attribute of an individual or a system;
in other words, something that might be expected to vary over time. For example, variables
of interest may be absenteeism among students, the household income of Malaysian
citizens and the sales of cars.

Variable

Qualitative
- Data that cannot be expressed by numbers; in other words, data measured on a
  non-numerical scale.
- This is categorical data that can be separated into different categories that are
  distinguished by some nonnumeric characteristic.
- Example: gender, type of car, color of car, etc.

Quantitative
- Data that can be expressed as a number, measured on a numerical scale.

  Discrete data
  - Numerical response which arises from a counting process.
  - It is a finite number or a countable number.
  - Example: number of cars, number of children.

  Continuous data
  - Infinitely many possible values that correspond to some continuous scale.
  - Numerical response which arises from a measuring process, for example height.

1.4 LEVEL/ SCALE OF MEASUREMENT

The level of measurement of a variable in mathematics and statistics is a classification
that is used to describe the nature of information contained within numbers assigned to
objects and, therefore, within the variable. The level of measurement of the data is an
important factor in determining which procedure to use. There are four levels of measurement:
nominal, ordinal, interval and ratio.

1.4.1 NOMINAL SCALE

Variables that are measured only nominally are also called categorical variables. In this
type of measurement, names are assigned to objects as labels. The data cannot be
arranged in an ordering scheme (from low to high). The nominal scale is the lowest
level of data measurement. Variables measured at a nominal level include gender,
marital status, race, religious affiliation, college major, and birthplace. Other examples
include geographical location in a country, telephone access code, and the make or model
of a car.

Example:

1. What is your gender?

Male

Female

2. Where did you get information about 3G?

Newspapers

Magazines

Television

Internet

Other
Please specify: _____________

1.4.2 ORDINAL SCALE

The numbers are called ordinals when the numbers assigned to objects represent the rank
order (1st, 2nd, 3rd etc.) of the entities measured. Comparisons of greater and less can
be made, in addition to equality and inequality. However, operations such as conventional
addition and subtraction are still meaningless. The ordinal scale is a level higher than the
nominal scale.
Example:

1. How satisfied are you with this book?

Very satisfied    Satisfied    Neutral    Not satisfied    Very Satisfied

2. Your highest completed level of education.

1. SPM

2. Diploma

3. Bachelor

4. Master.

5. PhD

1.4.3 INTERVAL SCALE

The interval level is like the ordinal level but with the additional property that the
difference between two data values is meaningful. Data at this level do not have a natural
zero starting point. An example of an interval scale is temperature: 0°F doesn't mean
"no heat" and 40°F is not twice as hot as 20°F.

Other examples: test score, shoe size.
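The temperature example can be checked with a small Python sketch. It converts the two Fahrenheit readings to kelvin, a scale that does have an absolute zero; the helper name fahrenheit_to_kelvin is introduced here for illustration only.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvin, a scale with an absolute zero."""
    return (temp_f - 32) * 5 / 9 + 273.15

# Differences are meaningful on an interval scale.
difference_f = 40 - 20

# Ratios are not: the apparent factor of 2 depends on where zero is placed.
ratio_f = 40 / 20
ratio_k = fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20)

print(difference_f)
print(ratio_f)
print(round(ratio_k, 3))
```

On the kelvin scale the ratio is only about 1.04, so the apparent "twice as hot" comes from the arbitrary zero of the Fahrenheit scale and says nothing about the temperatures themselves.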

1.4.4 RATIO SCALE

The ratio scale is the strongest scale of measurement. It contains a meaningful zero
(absolute zero point) which represents the absence of the phenomenon being measured.
Examples of ratio measurements are the time taken to study per day, the monthly amount spent
on prepaid top-ups and the number of cigarettes smoked per day.

Example:

1. The weight of Tenggiri fish at a market.


2. The distance jumped by athletes in the long-jump event.
3. Number of sales calls made.

1.5 SAMPLING TECHNIQUE

TYPE OF SAMPLING TECHNIQUES

There are two types of sampling technique: probability sampling techniques and
non-probability sampling techniques.

Non-probability sampling techniques
- Convenience sampling
- Judgmental sampling
- Quota sampling
- Snowball sampling

Probability sampling techniques
- Simple random sampling
- Systematic sampling
- Cluster sampling
- Stratified sampling
- Multistage sampling

1.6 PROBABILITY SAMPLING TECHNIQUE

1.6.1 A SIMPLE RANDOM TECHNIQUE

A simple random sample of n subjects is selected in such a way that every possible
sample of the same size n has the same chance of being chosen.

[Figure: units labelled A to F in a population, from which a sample is randomly selected]
Condition/Method

- A simple random sample is selected from the population in such a way that each item has
  the same chance of being selected for the sample.
- Condition: the population is homogeneous in nature (units have similar characteristics),
  and a complete and updated sampling frame is compulsory.
- Methods applied are the lottery method and random numbers generated by computer.
- Process involved in selecting the sample:
  i. Get a complete sampling frame and sort it.
  ii. Label each element with a unit number.
  iii. Choose units one by one (without replacement) until the sample size n is achieved.

Example

An administrator from the Academic Affairs Department of Universiti Teknologi Mara Kelantan
wanted to estimate how much the students spent on textbooks for a semester. A random sample
of 500 students was selected to answer the survey. This can be done by drawing 500 distinct
random numbers from the unit numbers in the sampling frame, using the lottery method or a
random number generator.
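The selection process can be sketched in Python with the standard random module. The sampling frame of 6,000 student unit numbers is an assumption made for illustration; the text does not state the population size.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical frame: students labelled with unit numbers 1..6000.
frame = list(range(1, 6001))
selected = simple_random_sample(frame, 500, seed=42)
print(sorted(selected)[:10])
```

Because random.sample draws without replacement, no student can be selected twice, which matches the "one by one (without replacement)" step.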

1.6.2 SYSTEMATIC SAMPLING

A random starting point is selected and then every kth member of the population is
selected. Condition: the population has to be homogeneous, and the sampling frame
has to be randomly arranged and current, but not necessarily complete. The process:

i. The sampling frame list has to be sorted.
ii. Find the interval k = population size / sample size.
iii. Choose r at random (between 1 and k), i.e. the rth unit is selected first.
iv. The remaining selections are the (r + k)th, (r + 2k)th, (r + 3k)th, ... units.

Let us say:

Total in the population N = 15
Number to be selected n = 3
Therefore k = N/n = 15/3 = 5
Let the randomly selected start be r = 3

Population units: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
rth unit: 3
(r + k)th unit: 3 + 5 = 8
(r + 2k)th unit: 3 + 2(5) = 13

Sample: units 3, 8 and 13
Example

20 students from part one CS 110, which consists of 100 students, are to be selected using
systematic sampling.

Describe Method

Reason: The sampling frame exists and all the units in the population are randomly arranged.

- List the students by number from 1 to 100.
- Using the 'equal interval method', k = N/n = 100/20 = 5.
- For every 5 students, only one will be selected.
- The first student will be selected at random between 1 and 5, for example number 2.
- The next students will be 7, 12, 17, 22, ..., 97.
1.6.3 STRATIFIED SAMPLING

We divide the population into several mutually exclusive strata and then randomly
sample from each stratum. This is appropriate when large variations occur within the
population. The population is divided into strata such that the units within each stratum
are homogeneous and the strata are heterogeneous with respect to one another.

Example: A researcher wishes to conduct a study of the Berita Harian readership
in Kota Kemuning Town, which consists of 100 000 people. The breakdown of the
residents is as follows: 50% Malay, 30% Chinese, 18% Indian and 2% others. A
random sample of 500 people will be selected for this study.

Resident    Population size            Sample size
Malay       50% x 100 000 = 50 000     (50 000 / 100 000) x 500 = 250
Chinese     30% x 100 000 = 30 000     (30 000 / 100 000) x 500 = 150
Indian      18% x 100 000 = 18 000     (18 000 / 100 000) x 500 = 90
Others      2% x 100 000 = 2 000       (2 000 / 100 000) x 500 = 10
Total       100 000                    500
Example

A researcher intends to determine the monthly income of households in Section 7, Shah Alam.
The breakdown of the households according to ethnicity is: Malay 120 households, Chinese 100
households, Indian 60 households and Others 20 households. Only 50 households will be
selected as the sample.

Describe Method

The number of households from each ethnic group can be selected using proportional
stratified sampling:

A = (number in A / total population) x number to be selected

Malay   = (120/300) x 50 = 20
Chinese = (100/300) x 50 = 16.67, rounded to 17
Indian  = (60/300) x 50 = 10
Others  = (20/300) x 50 = 3.33, rounded to 3
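The proportional allocation used in both stratified examples can be sketched in Python as follows. The function name proportional_allocation is introduced here for illustration, and rounding each stratum to the nearest whole unit is an assumption consistent with the figures above.

```python
def proportional_allocation(strata_sizes, n):
    """Sample size per stratum: (stratum size / population size) * n, rounded."""
    total = sum(strata_sizes.values())
    return {name: round(size / total * n) for name, size in strata_sizes.items()}

readers = {"Malay": 50_000, "Chinese": 30_000, "Indian": 18_000, "Others": 2_000}
households = {"Malay": 120, "Chinese": 100, "Indian": 60, "Others": 20}

print(proportional_allocation(readers, 500))
print(proportional_allocation(households, 50))
```

In these two examples the rounded allocations add up to the required sample size. With other figures, simple rounding can leave the total one unit over or under, in which case one stratum has to be adjusted by hand.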

1.6.4 CLUSTER SAMPLING

The population is divided into subgroups (clusters) such that the units within each cluster
are as heterogeneous as possible and the clusters are similar to one another. Specify an
appropriate cluster, e.g. a street. A sampling frame for all clusters must be obtained. A
random sample of clusters is selected. Select all or a portion of the units (at random)
from each selected cluster.

Population = group A + B + C + D + E
Number of group to be selected for sample = 3

Example

Hotlink is conducting a survey to find out the average monthly prepaid spending in a
certain college in UiTM Kelantan. The area consists of 15 college blocks. Each block
consists of 5 levels. A random sample of five blocks is selected. All the levels in
each block are chosen as the sample.

Describe Method

- Identify the college blocks as clusters.
- A sample of these clusters (blocks) is then chosen randomly.
- All students in the selected blocks are included in the sample.
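A minimal Python sketch of the cluster sampling idea: five of 15 blocks are drawn at random and every unit in the chosen blocks is kept. The student IDs and the figure of 40 students per block are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 15 blocks, each listing 40 made-up student IDs.
blocks = {f"Block {b}": [f"B{b}-S{s}" for s in range(1, 41)] for b in range(1, 16)}
chosen_blocks, students = cluster_sample(blocks, 5, seed=7)
print(chosen_blocks, len(students))
```

Only the list of blocks needs to be sampled at random; no sampling of individual students takes place inside a selected block.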
1.6.5 MULTISTAGE SAMPLING

Multistage sampling is an extension of cluster sampling that involves several stages of
sampling. The purpose of this method is to reduce time and cost when working with
samples from very large populations.

E.g. Population: Shah Alam apartment residents.

i. Randomly select 5 sections out of 13 sections.
ii. Randomly select several blocks from the 5 chosen sections.
iii. Then a random sample of levels is selected from the chosen blocks.

Finally, all apartment units from each selected level are included in the final sample.
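The three stages can be sketched in Python as nested random draws. Everything about the frame below (4 blocks per section, 5 levels per block, 6 units per level, and 2 blocks and 2 levels drawn at each stage) is a hypothetical assumption; only the 5-of-13 sections figure comes from the example.

```python
import random

def multistage_sample(sections, n_sections, n_blocks, n_levels, seed=None):
    """Sample sections, then blocks, then levels; keep all units on chosen levels."""
    rng = random.Random(seed)
    units = []
    for section in rng.sample(sorted(sections), n_sections):
        blocks = sections[section]
        for block in rng.sample(sorted(blocks), n_blocks):
            levels = blocks[block]
            for level in rng.sample(sorted(levels), n_levels):
                units.extend(levels[level])
    return units

# Hypothetical frame: 13 sections x 4 blocks x 5 levels x 6 apartment units.
frame = {
    f"S{s}": {
        f"B{b}": {
            f"L{l}": [f"S{s}-B{b}-L{l}-U{u}" for u in range(1, 7)]
            for l in range(1, 6)
        }
        for b in range(1, 5)
    }
    for s in range(1, 14)
}
sample = multistage_sample(frame, n_sections=5, n_blocks=2, n_levels=2, seed=3)
print(len(sample))
```

A full list of apartment units is needed only for the levels that are finally selected, which is where the saving in time and cost comes from.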
1.7 NON-PROBABILITY SAMPLING TECHNIQUE

Non-probability sampling is any procedure in which elements do not have an equal
chance of being included in a sample. Non-probability sampling techniques are used
when sampling frames are difficult to obtain. Performing non-probability sampling is
considerably less expensive than doing probability sampling, but the results are of
limited value.

1.7.1 CONVENIENCE SAMPLING

Convenience sampling is also referred to as accidental sampling. It is not normally
representative of the target population because sample units are only selected if they
can be accessed easily and conveniently. Basically, respondents are selected because
they happen to be in the right place at the right time. Examples are the first 200
customers to enter a shopping mall and the first five callers in a television contest.

1.7.2 JUDGMENTAL SAMPLING

This sampling technique is used when a sample is taken based on certain judgments
about the overall population. The assumption is that the investigator will select units
that are characteristic of the population. Judgmental sampling is subject to the
researcher's biases and is perhaps even more biased than convenience sampling. An
example is a laboratory setting where the choice of experimental subjects (e.g. animal or
human) reflects the investigator's pre-existing beliefs about the population.

1.7.3 SNOWBALL SAMPLING


Snowball sampling is a method in which a researcher identifies one member of some
population of interest, speaks to him or her, and then asks that person to identify others in
the population that the researcher might speak to. Each of these people is then asked to refer
the researcher to yet another person, and so on. An example is a study of the behavior
of drug addicts.

1.7.4 QUOTA SAMPLING


The sample is divided up into quotas, the quotas indicating the number of people to
be interviewed, but leaving the choice of the actual respondents to the interviewers.
This, of course, introduces bias. The quotas are chosen so that the sample is
representative of the population in a number of respects, according to the controls
chosen. It may be completely unrepresentative in other respects. For example, a
survey was carried out to study the opinion of UiTM Kelantan students regarding
the service in UiTM Kelantan. The proportions of the respondents are 60% male and
40% female.
1.8 DATA COLLECTION METHODS
Data collection is important because analysis and conclusions rely on it. The validity of
the analysis therefore depends on the content of the data and on how the data are collected.
Three important aspects in choosing the right method of collecting data or doing research
are:

i. How to choose a respondent.


ii. How to contact the selected respondent.
iii. What is the information needed from the respondents.

There are a few methods of getting information from a sample. The best method for a
given situation depends on:

i. Budget available for the research, especially the amount allocated for field work.
ii. Time allocated for the research. This is important so that the research can be
finished on time.
iii. Accuracy of the result needed.
iv. The distribution of the sample. A suitable method is needed for handling
a sample which is widely distributed, in order to save expenses.

1.8.1 FACE-TO-FACE INTERVIEW


The face-to-face interview is also known as the personal interview. In this process
the researcher interviews the respondents in person. Usually there are two
ways in which it is done:

i. The interviewer meets the respondent and asks questions, e.g. the
population census.
ii. The respondent goes to an interview center where a few
interviewers ask some questions, e.g. job interviews.

Advantages:

i. A high response rate compared to other methods. A skilled interviewer
can persuade respondents to answer the questions.
ii. The interviewer can explain any question which the interviewee does not
understand. Moreover, if some questions need elaboration the interviewer
can provide it.
iii. By observing the surroundings, the interviewer can make some visual checks
of aspects of lifestyle which are visibly obvious.
iv. Respondents will give spontaneous answers.
v. The interviewer can carefully ask personal questions if necessary.
vi. More questions can be asked compared to other methods.

Disadvantages:

i. The cost of each interview is very high. The cost includes traveling
and living allowances for the interviewers.
ii. The interviewer might introduce bias by the way he asks questions or records
the answers.
iii. Some people might be too embarrassed to answer a personal question.
iv. Interviewers should be trained and closely supervised to avoid any
wrongdoing while collecting the data. All this needs smooth administration, so
it involves a high cost.
v. The number of interviews that can be done every day is limited because
each interview takes quite long and contacting
the respondents is also time consuming.

1.8.2 TELEPHONE INTERVIEWING


Telephone interviews are limited to research samples whose members have telephones.
The interviewer contacts a respondent by telephone. The questions to be asked
must be arranged properly. This method is cheaper than the personal
interview, but the response rate is lower because people can easily refuse the interview
by simply replacing the receiver.
Advantages:

i. This method allows a few interviews to be done simultaneously.
ii. Expenses are lower than for the personal interview method.
iii. Samples that are widely dispersed can be contacted easily as long as they
have a telephone.
iv. This method is only applicable for research in which the sampled units have a
telephone. Usually telephone subscribers are companies or individuals with
a high income.

Disadvantages:

i. Telephone subscribers do not represent the general public, so not all
kinds of research can use this method because the results will be biased.
ii. Only short questions are suitable to be asked.
iii. The nature of the respondents cannot be seen.
iv. Time for interviewing via telephone is very limited. This means the
interviewer must find a suitable time to fit the respondent's schedule.
v. A second attempt has to be made if the respondent did not answer the
telephone the first time.

1.8.3 DIRECT OBSERVATIONS


Generally this method is used to study habits or human behavior towards
something. All the information needed is recorded by the observer. This
information represents the data. To get accurate information, those who are being
observed should not be aware of it.
Advantages:

i. Fast to get information.
ii. It records what actually happened rather than what people say would have
happened or did happen.
iii. Accurate measurements or counts are available.

Disadvantages:

i. If the respondents realize that they are being observed, they might not act
as they normally do.
ii. The observer might misinterpret or misunderstand the
outcome of an event, and so make a wrong judgment and record the
observation wrongly.
iii. The observer will have to take a long time to collect the information. As a
result, he might get bored or get carried away by the events, which will
give rise to recording the data wrongly.

1.8.4 DIRECT QUESTIONNAIRES

A questionnaire is a written or printed list of questions to be answered by a number of
people. The main goal of a questionnaire is to obtain meaningful responses that
will be of aid in the decision-making process. Questionnaires can be distributed by
hand or by post. Frequently they are used in interviews, giving the interviewer a set list of
questions to ask.

1.8.5 MAIL (OR POSTAL) QUESTIONNAIRE


Mail questionnaires are sent to respondents via post with a stamped addressed
envelope attached. The respondent is required to fill in the questionnaire and return
it to the researcher within a specific time.
Advantages:

i. The research can cover a wider geographical area.


ii. The method is cheaper.
iii. The respondent has more time to think properly to answer the question.

Disadvantages:

i. Any doubts the respondents might have cannot be clarified.


ii. Response rate is quite low.
iii. Only simple questions can be asked.
iv. The respondents may fill in the questionnaire wrongly because nobody
is on hand to explain the questions.

1.8.6 OTHER METHOD


Other data collection techniques are email, internet surveys, short messaging service
(SMS), computer-assisted interviewing and computer-aided survey services.
TUTORIAL

QUESTION 1

A finance executive from FDR company is interested in conducting a study to estimate the monthly
expenditure on transportation spent by employees at FDR company. This company has
approximately 2400 employees. Some of them work in the manufacturing department, some in
the sales department and others in management and admin department. Each employee in the
company was given a number and a sample of 400 employees is selected as respondents.

a) State the population and sample of the study.


b) Identify the variable of interest for this study and state its type
c) What is the sampling method used? Describe how to carry out this sampling method
d) Name the most appropriate data collection method for the above study

QUESTION 2

A marketing analyst is conducting a satisfaction survey on car purchase in town A. The population
list includes 3000 Proton buyers, 2500 Honda buyers, 2500 Toyota buyers and 2000 Mazda
buyers. The analyst selects a sample of 400 car buyers.

a) State the population and the sampling frame for the study
b) Identify the variable of interest
c) State the appropriate sampling technique and give one reason for using this technique
d) Calculate the number of selected samples for each car manufacturer
e) Determine a suitable method for data collection and give one advantage of this method

QUESTION 3

A hotel has 300 rooms which are classified as Deluxe, Premier, Suite and Standard. The manager
wants to obtain information regarding the satisfaction level of the room usage from the hotel
guests by taking 20% sample from the hotel rooms.

Type of room    Number of rooms
Deluxe          120
Premier         100
Suite           60
Standard        20

a) State the population for the study


b) Identify the variables of interest
c) State the appropriate sampling technique used by the manager
d) Obtain the number of rooms for each group in the study

QUESTION 4

The Tenaga Nasional Berhad branch in Klang is conducting a survey to find out the average
monthly consumption of electricity in a housing estate. The area consists of 20 blocks of
double-storey houses. Each block consists of ten houses. A random sample of five blocks is
selected. All the houses in each block are included as units in the sample.

i. Determine the population in this survey.


ii. What is the variable of interest? Is it a discrete or a continuous variable?
iii. State the possible sampling frame for this study.
iv. Suggest an appropriate sampling technique used in this survey. Give a reason
v. What method of data collection was used in the survey?

QUESTION 5

A finance executive from DELIRA Company is interested in conducting a study to estimate the
monthly expenditure on transportation spent by the employees at DELIRA Company. This
company has approximately 2000 employees. Some of them work in manufacturing department,
some in the sales department and some in management and administration department. Each
employee in the firm was given a number and a random sample of 200 employees is selected.

i. State the population.


ii. Identify the variable of interest for this study and state its type.
iii. What is the sampling method used?
iv. What is the best sampling method to be used if the executive wanted to obtain
representative samples of employees?
v. Name the best data collection method that is appropriate for the above study.

QUESTION 6

Moore Travel Agency, a nationwide travel agency, offers special rates on Caribbean cruises to
Malaysian citizens. A researcher at Moore Travel wants to do research on the ages of those
people taking the cruise. 100 customers who took a cruise last year were selected as a sample.

a) Describe the population and sample.


b) Define the sampling frame.
c) State the variable of interest and identify its type.
d) State the most appropriate sampling technique and explain how it can be done.
e) Name the best data collection method to be used and give one advantage and one
disadvantage.

QUESTION 7

In the automobile industry, customer service is a crucial factor affecting car sales. The
management of a reputed automobile company is interested in determining the level of customer
satisfaction with the service provided by the company’s service centers. The company has
altogether 60 service centers in the Klang Valley. A sample of 6 centers was selected at random.
Then all customers, who service their cars at these 6 service centers, were selected for the study.
A questionnaire was posted to these customers.

i. State the population for this study.


ii. State the sampling technique used for this study. State one reason for using this
technique.

QUESTION 8

A bachelor of Corporates student carried out a survey on sleeping habits among students in
tertiary education from the Private Higher Learning Institute. In the survey, 15% of the 1000
students interviewed said they must take a nap after lunch before the evening class.

a) State the population and sampling frame for the above


b) What is the sample size of the study?
c) State the appropriate sampling method used for this study. Why?
d) 15% of the 1000 students interviewed said they must take a nap after lunch before evening
classes. From the statement, what types of statistic are given?
e) If one of the variables is the number of hours of sleep, state the type of variable and the
level of measurement used for the variable.
f) Is a pie chart an appropriate graphical presentation for the variable in (e)? Explain.

QUESTION 9

Excessive sugar intake has been pointed to as the main cause of diabetes among Malaysians.
However, recent research findings also include stress and unhealthy diet as possible causal
factors. A health science researcher would like to confirm these recent findings by conducting
a study. He took a random sample of 50 patients from a total of 400 patients who received
treatment for diabetes at one of the hospitals in Johor. The researcher developed his own
questionnaire and interviewed every patient in the sample.

i. State the population and the sampling frame for the study.
ii. List the variable of interest and state the type of variable.
iii. It is advisable to do a pilot study when creating a questionnaire. Give TWO advantages
of a pilot study.
iv. If systematic random sampling was employed to select the 50 patients, explain how it
would have been conducted.
