The document discusses sampling design and different types of sampling techniques. It defines sampling as selecting a subset of units from a target population to make inferences about the whole population. There are two main types of sampling - probability sampling and non-probability sampling. Probability sampling uses random selection so that each unit has an equal chance of being selected, while non-probability sampling involves deliberate or judgment-based selection. The key aspects of a good sampling design are that it results in a representative sample and minimizes errors and biases.
Sampling Design
Sampling design
• Scope and purpose.
• Sampling is a means of selecting a subset of units from a target population for the purpose of collecting information.
• This information is used to draw inferences about the population as a whole.
• The subset of units that is selected is called a sample.
Sampling design
• The process of obtaining information from a subset (sample) of a larger group (population).
• The results for the sample are then used to make estimates for the larger group.
• Sampling is faster and cheaper than asking the entire population.
• All items in any field of inquiry constitute a “Universe” or “Population”.
• A complete enumeration of all items in the population is known as a “census inquiry”.
• It can be presumed that in such an inquiry, when all items are covered, no element is left out, so the highest accuracy is achieved.
• In practice, this may not be true.
• Even a small amount of bias in such an inquiry grows larger and larger as the number of observations increases.
• There is no way of checking the element of bias or its extent except through a survey or the use of sample checks.
• ‘Bias’ in research is anything that produces systematic variation in the research findings.
• It is often unexpected (outside the researcher's control).
• In research, the goal is to understand the “true” relationship between a predictor and an outcome.
• Isolating these two is difficult.
• Bias is defined as any tendency which prevents unprejudiced consideration of a question.
• In research, bias occurs when “systematic error is introduced into sampling or testing by selecting or encouraging one outcome or answer over others”.
• Besides, a census inquiry involves a great deal of time, money and effort.
• Therefore, when the field of inquiry is large, this method becomes difficult to adopt because of the resources involved.
• At times, this method is practically beyond the reach of ordinary researchers or groups.
• Perhaps the Government is the only institution that can accomplish such a task.
• Example: the population census carried out once in a decade.
• However, when the Universe is small, it is of no use to resort to a sample survey.
• At the same time, when field studies are undertaken in practical life, time, cost and effort have to be taken into account.
• These considerations invariably lead to a selection of respondents, i.e., selection of only a few items.
• The respondents selected should be as representative of the total population as possible, so as to produce a miniature cross-section.
• The selected respondents constitute what is technically called a ‘sample’.
• The selection process is called the ‘sampling technique’.
• The survey so conducted is known as a ‘sample survey’.
• Let the population size be N.
• Suppose a part of size n (where n < N) of this population is selected according to some rule for studying some characteristic of the population.
• The group consisting of these n units is known as a ‘sample’.
• ‘Sample design’ specifies how a sample should be selected and of what size it should be.
• Example: to study student feedback on a teacher in a class of 60 students, you select 2 samples with the same number of students: one sample of 10 boys and another sample of 10 girls.
Steps in sample design
1. Type of Universe: Clearly define the set of objects to be studied, technically called the ‘Universe’ (or population).
• The Universe can be finite (population of a city, number of students in a college, etc.) or infinite (number of stars in the sky, number of insects in the world, etc.).
2. Sampling unit:
• It may be a geographical unit such as a state, district or village.
• It may be a social unit such as a family, school or college.
• It may be a unit of students (those who scored FCD, girls placed, etc.).
3. Source list: This is the ‘sampling frame’ from which the sample is to be drawn, e.g. an engineering college.
4. Size of sample: e.g. 10 boys from a class of 60; here 10 is the sample size and 60 is the population size.
• The size of the sample should be neither excessively large nor too small.
• It should be optimal.
5. Parameters of interest:
• Height, weight, marks scored by the students in the sample, etc.
6. Budgetary constraint:
• From a practical point of view, cost considerations have a major impact on decisions about not only the size of the sample but also the type of sample.
7. Sampling procedure:
• The researcher must decide on the technique to be used in selecting the items for the sample.
• This technique or procedure stands for the sample design itself.
Criteria for selecting a sampling procedure
• 2 costs are involved in a sampling analysis:
1. The cost of collecting the data
2. The cost of an incorrect inference resulting from the data
• There are 2 causes of incorrect inferences, namely systematic bias and sampling error.
• A systematic bias results from errors in the sampling procedure, and it cannot be reduced or eliminated by increasing the sample size.
• Usually a systematic bias is the result of one or more of the following factors:
1. Inappropriate sampling frame
2. Defective measuring device: If the physical measuring device is defective, there will be systematic bias in the data collected through it.
3. Non-respondents: The individuals selected in the sampling procedure should be as representative of the population as possible.
4. Indeterminacy principle: For example, some individuals act differently when they are under observation.
5. Natural bias in the reporting of data:
• People in general understate their incomes if asked about them for tax purposes, but overstate them if asked for social status.
• In psychological surveys, people generally tend to give what they think is the correct answer rather than revealing their true feelings.
• In summary, while selecting a sampling procedure, the researcher must ensure that the procedure causes a relatively small sampling error and helps to control the systematic bias.
Characteristics of a good sampling design
a. The sample design must result in a truly representative sample.
b. The sample design must result in a small sampling error.
c. It should be viable in the context of the budget allocated for the research study.
d. It should be able to control the systematic bias.
e. The results of the sample study should be applicable, in general, to the Universe with a reasonable level of confidence.
Different types of sample designs
• Sample designs are classified on 2 factors, namely the representation basis and the element selection technique.
• On the representation basis, the sampling may be probability sampling or non-probability sampling.
• Probability sampling is based on the concept of random selection.
• Non-probability sampling is ‘non-random’ selection.
• On the element selection basis, the sample may be either restricted or unrestricted.
Non-probability sampling (non-random)
• This is also known by different names such as deliberate sampling, purposive sampling and judgment sampling.
• The choice of the researcher concerning the items remains supreme.
• In such a design, the personal element has a great chance of entering into the selection of the sample.
• The investigator may select a sample which yields results favourable to his point of view.
• If this happens, the entire inquiry may get vitiated (spoilt).
• Thus there is always the danger of bias entering into this type of sampling technique.
• ‘Quota sampling’ is also an example of non-probability sampling.
• It is a sampling method of gathering representative data from a group.
• As opposed to random sampling, quota sampling requires that representative individuals are chosen out of a specific subgroup.
• For example, a researcher might ask for a sample of 100 females, or 100 individuals between the ages of 20 and 30.
• Under this sampling, the interviewers are simply given quotas to be filled from the different strata.
• The actual selection of items for the sample is left to the interviewer’s discretion.
• This type of sampling is very convenient and relatively inexpensive.
• But the samples so selected certainly do not possess the characteristics of random samples.
• Quota samples are essentially judgment samples.
• Inferences drawn on this basis are not amenable to formal statistical treatment.
Probability sampling
• This is also known as ‘random sampling’ or ‘chance sampling’.
• Every item of the universe has an equal chance of inclusion in the sample.
• It is, so to say, a lottery method.
• It is blind chance alone that determines whether one item or another is selected.
• The results obtained can be assured in terms of probability, that is, the error of estimation can be measured.
• Random sampling ensures the law of ‘Statistical Regularity’.
• This law states that if, on average, the sample chosen is a random one, the sample will have the same composition and characteristics as the universe.
• The implications of random sampling are:
a) It gives each element in the population an equal probability of getting into the sample.
b) All choices are independent of one another.
c) It gives each possible sample combination an equal probability of being chosen.
• Thus we can define a simple random sample from a finite population as a sample chosen such that each of the $^{N}C_{n}$ possible samples has the same probability, $1/{}^{N}C_{n}$, of being selected.
Example
• Take a finite population consisting of 6 elements (say a, b, c, d, e and f), so N = 6.
• Suppose that we want to take a sample of size n = 3 from it.
• Then there are $^{6}C_{3} = \frac{6!}{3!\,3!} = 20$ possible distinct samples of the required size.
• The samples are: abc, abd, abe, abf, …
• Each of these samples has probability 1/20 of being the one chosen.
How to select a random sample
• If the population is small, write each of the possible samples on a slip of paper.
• Mix the slips properly.
• Then draw one slip, as in a lottery, to pick a sample.
• This procedure is obviously impractical if the population is large.
• In actual practice the procedure may be simplified by the use of random number tables.
• Statisticians such as Tippett, Yates and Fisher have prepared tables of random numbers which can be used for selecting a random sample.
Tippett’s random number tables
• Tippett gave 10400 four-digit numbers.
• He selected 41600 digits from census reports and combined them into fours to give random numbers which may be used to obtain a random sample.
• The first 30 sets of Tippett’s numbers:
2952 6641 3992 9792 7979 5911
3170 5624 4167 9525 1545 1396
7203 5356 1300 2693 2370 7483
3408 2769 3563 6107 6913 7691
0560 5246 1112 9025 6008 8126
• Suppose we are interested in taking a sample of 10 units from a population of 5000 units, bearing numbers from 3001 to 8000.
• We need 10 numbers from the table which are not less than 3001 and not greater than 8000.
• Reading the table from left to right, starting from the first row and skipping numbers outside this range, we get the following 10 numbers:
• 6641, 3992, 7979, 5911, 3170, 5624, 4167, 7203, 5356 and 7483
• The units bearing these serial numbers constitute our required random sample.
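The two ideas above, counting the equally likely samples and reading a random number table, can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `pick_from_table` is hypothetical, and skipping repeated numbers is an assumption (the 30 numbers listed contain no repeats).

```python
from itertools import combinations
from math import comb

# Population of N = 6 elements and sample size n = 3, as in the example.
population = ["a", "b", "c", "d", "e", "f"]
n = 3

all_samples = ["".join(s) for s in combinations(population, n)]
num_samples = comb(len(population), n)   # NCn = 6C3
prob_each = 1 / num_samples              # each sample is equally likely

# First 30 of Tippett's numbers as listed in the text (0560 is stored as 560).
TIPPETT = [
    2952, 6641, 3992, 9792, 7979, 5911,
    3170, 5624, 4167, 9525, 1545, 1396,
    7203, 5356, 1300, 2693, 2370, 7483,
    3408, 2769, 3563, 6107, 6913, 7691,
    560, 5246, 1112, 9025, 6008, 8126,
]


def pick_from_table(table, low, high, size):
    """Read the table left to right, keeping numbers within [low, high].

    Assumption: a number that repeats is skipped, so no unit is drawn twice.
    """
    chosen = []
    for number in table:
        if low <= number <= high and number not in chosen:
            chosen.append(number)
        if len(chosen) == size:
            break
    return chosen


sample_ids = pick_from_table(TIPPETT, 3001, 8000, 10)
print(num_samples, all_samples[:4])
print(sample_ids)
```

In practice a pseudo-random generator such as `random.sample` plays the role of the printed table; the table is used here only to mirror the worked example.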
Complex random sampling designs
• Probability sampling under restricted sampling techniques may result in complex random sampling designs.
• There are 6 popular types of complex random sampling designs:
1. Systematic sampling
2. Stratified sampling
3. Cluster sampling
4. Area sampling
5. Multi-stage sampling
6. Sampling with probability proportional to size
Systematic sampling
• Selecting every i-th item on a list is systematic sampling.
• An element of randomness is introduced into this kind of sampling by using random numbers to pick the unit with which to start.
• The remaining units of the sample are selected at fixed intervals.
• As the interval is fixed, this type of sample is spread more evenly over the entire population.
• It is an easier and less expensive method, and it can be used even for a large population.
Stratified sampling
• If the population from which a sample is to be drawn does not constitute a homogeneous group, stratified sampling is generally applied in order to obtain a representative sample.
• Here the population is divided into several sub-populations (strata), and items are then selected from each stratum to constitute the sample.
• Since each stratum is more homogeneous than the total population, more precise estimates are obtained.
• Stratified sampling results in more reliable and detailed information.
• How to form strata?
• Strata should be formed on the basis of common characteristic(s) of the items to be put in each stratum.
• Elements should be as homogeneous as possible within each stratum and as heterogeneous as possible between different strata.
• Thus strata are purposely formed, and are usually based on the past experience and personal judgment of the researcher.
• At times, a pilot study may be conducted to determine a more appropriate and efficient stratification plan.
• How should items be selected from each stratum?
• By simple random sampling.
• Systematic sampling can be used if it is considered more appropriate in certain situations.
• How many items should be selected from each stratum, i.e. how should the sample size be allocated among the strata?
• Usually the method of proportional allocation is followed, under which the sizes of the samples from the different strata are kept proportional to the sizes of the strata.
• If $P_i$ represents the proportion of the population included in stratum i, and n represents the total sample size, then the number of elements selected from stratum i is $n_i = n P_i$.
Illustration
• Let n = 30 and N = 8000, with 3 strata of sizes N1 = 4000, N2 = 2400 and N3 = 1600.
• Adopting proportional allocation, the sample size for the stratum with N1 = 4000 is: $P_1 = 4000/8000 = 0.5$, so $n_1 = n P_1 = 30 \times 0.5 = 15$.
• Similarly, for the 2nd stratum: $n_2 = 30 \times 2400/8000 = 9$.
• And for the 3rd stratum: $n_3 = 30 \times 1600/8000 = 6$.
• The proportion of sample sizes is 15:9:6 = 5:3:2.
• The proportion of stratum sizes is 4000:2400:1600 = 5:3:2.
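The proportional allocation in the illustration above can be sketched as a small Python function. The function name is hypothetical, and the rule for handing out leftover units when the shares are not whole numbers (largest remainder first) is an assumption that the illustration does not need, since its shares are exact.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of n units in proportion to stratum sizes.

    Each stratum first gets the whole-number part of n * N_i / N.
    Assumption: any units left over go to the strata with the largest
    remainders, so the allocation always sums to n.
    """
    total = sum(stratum_sizes)
    alloc = [n * size // total for size in stratum_sizes]
    remainders = [n * size % total for size in stratum_sizes]
    leftover = n - sum(alloc)
    order = sorted(range(len(stratum_sizes)),
                   key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Illustration from the text: n = 30, strata of 4000, 2400 and 1600 units.
print(proportional_allocation(30, [4000, 2400, 1600]))  # [15, 9, 6]
```

Integer arithmetic is used throughout so that exact shares such as 15, 9 and 6 are not disturbed by floating-point rounding.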
Disproportionate sampling
• In cases where strata differ not only in size but also in variability, it is considered reasonable to take larger samples from the more variable strata and smaller samples from the less variable strata.
• A disproportionate sampling design accounts for both differences in stratum size and differences in stratum variability.
Illustration
• A population is divided into 3 strata so that N1 = 5000, N2 = 2000 and N3 = 3000. The respective standard deviations are σ1 = 15, σ2 = 18 and σ3 = 5.
• How should a sample of size n = 84 be allocated to the 3 strata if we want optimum allocation using a disproportionate sampling design?
• $n_i = \dfrac{n\,N_i\sigma_i}{\sum_{j=1}^{k} N_j\sigma_j}$, where $\sum_{j=1}^{k} N_j\sigma_j = N_1\sigma_1 + N_2\sigma_2 + \dots + N_k\sigma_k$
• The sample size for the stratum with N1 = 5000 is: $n_1 = \dfrac{84 \times 5000 \times 15}{5000 \times 15 + 2000 \times 18 + 3000 \times 5} = \dfrac{6300000}{126000} = 50$
• Along similar lines we get $n_2 = 24$ and $n_3 = 10$.
• In addition to differences in stratum size and stratum variability (σ), there may be differences in the sampling cost per unit ($C_i$) between strata.
• A cost-optimal disproportionate sampling design is then: $n_i = \dfrac{n\,N_i\sigma_i/\sqrt{C_i}}{\sum_{j=1}^{k} N_j\sigma_j/\sqrt{C_j}}$ for i = 1, 2, …, k
• For five strata, $\sum_{j} N_j\sigma_j = N_1\sigma_1 + N_2\sigma_2 + N_3\sigma_3 + N_4\sigma_4 + N_5\sigma_5$.
Problem 2
A certain population is divided into 5 strata so that N1 = 200, N2 = 2000, N3 = 1800, N4 = 1700 and N5 = 2500. The respective standard deviations are σ1 = 1.6, σ2 = 2.0, σ3 = 4.4, σ4 = 4.8 and σ5 = 6.0. Further, the expected sampling cost in the first 2 strata is Rs 4.0 per interview, and in the remaining 3 strata it is Rs 6.0 per interview.
How should a sample of size n = 226 be allocated to the 5 strata if we adopt
a) a proportionate sampling design,
b) a disproportionate sampling design considering
i) only the differences in stratum variability,
ii) the differences in stratum variability as well as the differences in stratum sampling costs?
Cluster sampling
• When the total area of interest happens to be a big one, a convenient way of taking a sample is to divide the area into a number of smaller non-overlapping areas.
• A number of these small areas (usually called clusters) are then selected at random, with the ultimate sample consisting of all units in the selected clusters.
• Assume that there are 20,000 machine parts in the inventory at a given point of time.
• These are stored in 400 cases (boxes), each containing 50 parts.
• Suppose we want to estimate the proportion of machine parts in this inventory which are defective.
• We can use cluster sampling, treating each case as a cluster.
• We randomly select n cases out of the 400 and examine all the machine parts in them.
• Cluster sampling, no doubt, reduces cost by concentrating surveys in selected clusters; this is its economic advantage.
• But it is less precise than random sampling.
• There is not as much information in n observations taken within clusters as in n randomly drawn observations.
Area sampling
• Area sampling is the same as cluster sampling when geographical subdivisions are treated as clusters.
• So the merits and demerits of cluster sampling also apply to area sampling.
Multi-stage sampling
• Multi-stage sampling is a further development of the principle of cluster sampling.
• Here one has clusters at different levels.
• Suppose we want to investigate the working efficiency of nationalized banks in India and want to take a sample of a few banks for this purpose.
• This can be done by sampling in stages: first states, then districts within a state, then towns within a district, and so on.
• Multi-stage sampling is applied in big inquiries extending over a considerably large geographical area or a big organization (say, a university).
• There are 2 advantages:
1. It is easier to administer than most single-stage designs, because the sampling frame is developed in partial units.
2. A large number of units can be sampled for a given cost because of sequential clustering.
Sampling with probability proportional to size
• If the clusters do not have the same (or approximately the same) number of elements, simple cluster sampling is not appropriate.
• In that case a random selection process is used in which the probability of each cluster being included in the sample is proportional to the size of the cluster.
• List the number of elements in each cluster.
• Then sample systematically the appropriate number of elements from the cumulative total of elements.
Illustration
• The following are the numbers of departmental stores in 15 cities: 35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28.
• It is required to select a sample of 10 stores, using cities as clusters and selecting within clusters proportional to size.
• How many stores should be chosen from each city? Use element 10 in the first city (cluster) as the starting point.
• The appropriate number of elements has to be sampled from the cumulative total of elements.
• The cumulative total is 500 departmental stores.
• From this we have to select a sample of 10 stores.
• Therefore the appropriate sampling interval is 500/10 = 50.
• The given starting point is element 10 (if it is not given, it can be selected randomly).
• The next sampling point is 10 + 50 = 60.
• The sequence goes on: 60 + 50 = 110, and so on.
City No.   No. of deptl stores   Cumulative total   Sample points   Stores selected
 1         35                     35                10              1
 2         17                     52                -               0
 3         10                     62                60              1
 4         32                     94                -               0
 5         70                    164                110, 160        2
 6         28                    192                -               0
 7         26                    218                210             1
 8         19                    237                -               0
 9         26                    263                260             1
10         66                    329                310             1
11         37                    366                360             1
12         44                    410                410             1
13         33                    443                -               0
14         29                    472                460             1
15         28                    500                -               0
Total                                                               10
Problem 2
• The following are the numbers of departmental stores in 10 cities: 35, 27, 24, 32, 42, 30, 34, 40, 29 and 38. If we want to select a sample of 15 stores using cities as clusters and selecting within clusters proportional to size, how many stores should be chosen from each city?
• Use a starting point of 4.
• The cumulative total is 331, so the sampling interval is 331/15 ≈ 22.
City No.   No. of deptl stores   Cumulative total   Sample points   Stores selected
 1         35                     35                4, 26           2
 2         27                     62                48              1
 3         24                     86                70              1
 4         32                    118                92, 114         2
 5         42                    160                136, 158        2
 6         30                    190                180             1
 7         34                    224                202, 224        2
 8         40                    264                246             1
 9         29                    293                268, 290        2
10         38                    331                312             1
Total                                                               15
Sequential sampling
• When a particular lot has to be accepted or rejected on the basis of a single sample, it is known as single sampling.
• Similarly, there are double sampling and multiple sampling.
• When the number of samples is more than 2 but is neither certain nor decided in advance, the sampling is often referred to as sequential sampling.
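Returning to sampling with probability proportional to size, the cumulative-total procedure used in the store illustrations can be sketched in Python as follows. The function name `pps_systematic` is hypothetical. Taking the interval as the whole-number part of total/sample size, and assigning a sampling point to the first city whose cumulative total reaches it, are assumptions read off the worked tables.

```python
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, sample_size, start):
    """Systematic selection over the cumulative totals of cluster sizes.

    Assumptions: the sampling interval is the whole-number part of
    total / sample_size, and a sampling point belongs to the first
    cluster whose cumulative total is at least that point.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] // sample_size
    points = [start + k * interval for k in range(sample_size)]
    if points[-1] > cumulative[-1]:
        raise ValueError("starting point is too large for this interval")
    counts = [0] * len(sizes)
    for point in points:
        counts[bisect_left(cumulative, point)] += 1
    return cumulative, points, counts


# First illustration: 15 cities, a sample of 10 stores, starting point 10.
stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]
cumulative, points, counts = pps_systematic(stores, 10, 10)
for city, (size, cum, k) in enumerate(zip(stores, cumulative, counts), start=1):
    print(city, size, cum, k)
```

The same function can be applied to the second store problem by passing its ten city sizes, a sample size of 15 and a starting point of 4.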