
International Journal of Hygiene and Environmental Health
Int. J. Hyg. Environ. Health 207 (2004); 555-562
http://www.elsevier.de/intjhyg

Spatial analysis of childhood leukemia in a case/control study

Steve Selvin, Kathleen E. Ragland, Ellen Yu-Lin Chien, Patricia A. Buffler

University of California, Berkeley, USA

Received November 5, 2003 · Revision received May 11, 2004 · Accepted May 20, 2004

Abstract
A simple and direct analysis of the spatial distribution of childhood leukemia was performed
using geographic data from a large case/control study. The data consist of cases of childhood
leukemia and their corresponding birth cohort controls located in seven San Francisco Bay
Area counties. Both parametric and randomization analyses show no evidence of a non-
random spatial pattern of childhood leukemia among six of these counties. The data from
San Francisco County, however, produce a moderately small significance probability (0.08)
arising from a distance analysis and a significant p-value (0.01) arising from a frequency
analysis of concordant case pairs. Although these p-values accurately reflect the probability
of the observed spatial pattern occurring by chance alone, these results are based on only four
cases of leukemia.

Key words: Case/control design - spatial analysis - childhood leukemia

Introduction

Geographic or spatial patterns of childhood leukemia have been occasionally reported
(Cartwright et al., 2001; Dickinson et al., 2002; Land et al., 1984; Chen et al., 1997;
Knox, 1994; Gustafsson and Carstensen, 2000; Pobel and Viel, 1997; Reynolds et al., 1996,
2002; Michelozzi et al., 2002; Grosche et al., 1999; Aickin et al., 1992; Alexander, 1992)
but their statistical interpretation has not produced a consensus on the simple question:
is there evidence of a non-random spatial pattern associated with the incidence of
childhood leukemia? This is a question that was first posed in the 1950s and has yet to be
resolved. It is recognized that acquiring some understanding of the temporal and spatial
patterns of leukemia may provide important etiologic insights. Thus, recent studies in the
United Kingdom have attempted to address this need. The data and analyses that follow
address this same question using recently collected cases of childhood leukemia and
controls in the United States.

Methods

Data
Case/control data collected from the San Francisco Bay Area (1995-1999) provide an
opportunity to explore the spatial distribution of childhood leukemia using relatively
large numbers of observations (Table 1).
These case/control data originate from the Northern California Childhood Leukemia Study
(NCCLS) and are a small part of this large and far-ranging study of childhood leukemia
(Ma et al., 2002). The specific observations consist of the addresses of 112 leukemia
cases and 221

Corresponding author: Steve Selvin, School of Public Health, University of California, Berkeley, Berkeley, CA 94720,
USA. e-mail: [email protected], phone: 001 510 642 4618

1438-4639/04/207/06-555 $ 30.00/0
birth controls from seven contiguous San Francisco Bay Area counties (Alameda, Contra
Costa, Marin, San Francisco, San Mateo, Santa Clara and Sonoma; Table 1 and Figure 1).
The cases are children (ages 0 to 14) with newly diagnosed leukemia during the years 1995
to 1999 ascertained from four major clinical centers in the greater Bay Area. Although
the case ascertainment is hospital-based, a comparison with cases ascertained by the
California State Cancer Registry shows that the NCCLS identified more than 90% of eligible
children in the San Francisco metropolitan area (the five counties in the registry area;
data were not available from two non-participating counties). Controls for the NCCLS were
matched to these cases by date of birth. Specifically, the controls were identified by
randomly selecting four to eight birth certificates of children born on the same day that
matched on sex, race, county of birth and Hispanic status of mother and father. Four to
eight controls per case were selected because it was anticipated that some families would
not agree to participate in the broader NCCLS. Essentially all selected cases and
controls, however, were located geographically using birth address, whether or not they
agreed to participate further in the study, and were used in this spatial analysis. The
residence of the mother at the time of the child's birth was geographically located using
a standard global positioning system (GPS) based on visiting the address of the study
subject, thus producing a latitude and longitude measurement for all cases and controls.
The final data set consists of 333 observations (112 cases and 221 matched controls).
Figure 2 displays the spatial distribution of cases (dots) and controls (circles) for a
single county (Alameda) to illustrate the data available to explore possible differences
in spatial patterns. In addition, the latitude and longitude coordinates were
mathematically transformed so that distance is measured in kilometers; that is, a new
Cartesian coordinate system was established in kilometers relative to the latitude and
longitude point (37.5, -122.5). Nearest neighbor distances were then calculated to compare
statistically the spatial patterns of cases and controls.

Table 1. Populations, areas, number of cases and number of controls from seven San Francisco Bay Area counties.

county          population*     area**   cases   controls   total
Alameda             310,916     1952.7      40         77     117
Contra Costa        219,550     1963.5      20         40      60
Marin                43,742     1407.5       5          7      12
San Francisco        97,852      122.3       4          9      13
San Mateo           141,032     1193.9      11         19      30
Santa Clara         362,874     3376.2      22         51      73
Sonoma               96,034     4114.2      10         18      28
combined          1,272,000   14,131.2     112        221     333
* = children ages 0-15 from 2000 US census counts.
** = area of county in square kilometers.

Fig. 1. The seven San Francisco Bay Area counties that make up the geographic study area.

Fig. 2. The spatial distribution of the 40 cases of childhood leukemia and 77 controls in the county of Alameda, CA (1995-1999).

Analysis
The nearest neighbor distance for a specific observation is simply the distance to the
closest observation among the other case and control study participants. Nearest neighbor
distances were classified into two categories: distances between two cases (case/case
pairs) and distances between case/control or control/control pairs (jointly referred to
as non-case/case pairs).
The mean nearest neighbor distance between case/case pairs and the frequency of case/case
pairs provide two statistical measures of non-randomness of the spatial data. The
distribution of case/control and control/control pairs of nearest neighbor distances
reflects the spatial pattern unaffected by case status. Thus, when leukemia influences the
spatial location of the cases, a reduced mean nearest neighbor distance between case/case
pairs should be observed as well as an elevated frequency of nearest neighbor case/case
pairs relative to the non-case/case pairs.
When no spatial pattern exists, the mean nearest neighbor distances calculated from
case/case and non-case/case pairs are expected to be equal and the frequency of the
case/case pairs is expected to be equal to a known value that depends only on the number
of cases and controls sampled.
The two kinds of nearest neighbor mean distances are compared using a typical two-sample
test-statistic

$$z = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\operatorname{variance}(\bar{x}_1 - \bar{x}_0)}}$$

where $\bar{x}_1$ and $\bar{x}_0$ represent the mean nearest neighbor distances between
case/case and non-case/case pairs, respectively.
This test-statistic z has an approximate standard normal distribution when no leukemia
related systematic differences exist among mean nearest neighbor distances.
The observed frequency of case/case pairs of nearest neighbors (denoted m) is compared to
the expected frequency calculated under the hypothesis that no leukemia related spatial
pattern exists. This approach is the topic of an extensive theoretical paper that explores
the properties and power of using the number of case/case pairs as a test-statistic
(Cuzick and Edwards, 1990).
That is, the value m is compared to its expected value (denoted E[m])

$$\text{expected value} = E[m] = np = n \, \frac{n_1(n_1 - 1)/2}{n(n - 1)/2}$$

where p represents the probability of the occurrence of a case/case pair, $n_1$ represents
the total number of cases and n represents the number of collected observations
(controls + cases = $n_0 + n_1 = n$). The test-statistic used is

$$z = \frac{m + 0.5 - E[m]}{\sqrt{np(1 - p)}}$$

and it also has an approximate standard normal distribution when no leukemia related
spatial pattern exists.
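As a numerical illustration of the frequency test-statistic, the following Python sketch evaluates E[m], the standard error based on np(1 - p) and a one-sided p-value from the counts m, n1 and n. The function and argument names are hypothetical, and taking the upper tail of the standard normal distribution as the one-sided p-value is an assumption rather than something stated explicitly in the text.

```python
import math


def case_pair_test(m, n_cases, n_total):
    """Normal-approximation test for the number m of case/case nearest neighbor pairs.

    Returns (expected value, standard error, z, one-sided p-value).
    """
    # Probability that an observation and its nearest neighbor are both cases.
    p = (n_cases * (n_cases - 1) / 2) / (n_total * (n_total - 1) / 2)
    expected = n_total * p
    # Standard error assuming independent observations: sqrt(np(1 - p)).
    se = math.sqrt(n_total * p * (1 - p))
    # Test-statistic with the 0.5 correction used in the text.
    z = (m + 0.5 - expected) / se
    # Upper-tail probability P(Z >= z) for a standard normal Z (assumed direction).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return expected, se, z, p_value


# Alameda County counts taken from Tables 1 and 3: m = 12, n1 = 40, n = 117.
expected, se, z, p_value = case_pair_test(m=12, n_cases=40, n_total=117)
print(round(expected, 2), round(se, 3), round(p_value, 2))
```

With the Alameda County counts the sketch gives approximately 13.45 for the expected value, 3.45 for the standard error and 0.61 for the p-value, in line with the entries reported for that county in Tables 3 and 4.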
These two parametric analyses are supplemented by parallel randomization procedures
applied to the two nearest neighbor measures that are entirely assumption-free and
accommodate nearest neighbor statistics calculated for any number of observations and any
shaped region (Besag and Diggle, 1977). Significance levels (p-values) are estimated from
the computer generated randomization distributions to evaluate the observed differences in
mean nearest neighbor distances and the frequency of case/case pairs. That is, case and
control status is assigned at random, producing randomized "data" that is used to estimate
the distribution of these two test-statistics when only random differences exist between
case and control spatial patterns. Significance probabilities (p-values) are then
calculated from these estimated "null" distributions.

Results

Using the 333 case/control observations, two mean values are calculated: the mean nearest
neighbor distance between case/case pairs (x̄1) and the mean nearest neighbor distance
between the non-case/case pairs (x̄0). These mean values along with their standard errors
are presented in Table 2 for the six Bay Area counties and all the counties combined.
Figure 3 displays the geographic locations of these 333 cases and controls for all
counties.

Fig. 3. The spatial distribution of the 112 cases of childhood leukemia and 221 controls in the seven Bay Area counties (1995-1999).

For five counties and all counties combined the comparisons show no evidence of a
non-random spatial pattern of childhood leukemia (no extreme p-values; Table 2). The
County of San Francisco shows some evidence of a non-random spatial distribution but
involves only three case/case pairs among the 13 observations.
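The classification of nearest neighbor distances into case/case and non-case/case pairs can be sketched in Python as follows. This is a minimal illustration and not the study's own code: the equirectangular projection, the constant of 111.32 km per degree, the function names and the four toy coordinates are all assumptions made for the example; only the reference point (37.5, -122.5) comes from the Methods.

```python
import math
import statistics

ORIGIN_LAT, ORIGIN_LON = 37.5, -122.5  # reference point given in the Methods
KM_PER_DEGREE = 111.32  # assumed average length of one degree of latitude


def to_km(lat, lon):
    """Project latitude/longitude to (x, y) in km (assumed equirectangular form)."""
    x = (lon - ORIGIN_LON) * KM_PER_DEGREE * math.cos(math.radians(ORIGIN_LAT))
    y = (lat - ORIGIN_LAT) * KM_PER_DEGREE
    return x, y


def classify_nearest_neighbors(points, is_case):
    """Split nearest neighbor distances into case/case and non-case/case lists."""
    case_case, non_case = [], []
    for i, (xi, yi) in enumerate(points):
        best_distance, best_index = math.inf, None
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            if d < best_distance:
                best_distance, best_index = d, j
        # A case/case pair: the observation and its nearest neighbor are both cases.
        if is_case[i] and is_case[best_index]:
            case_case.append(best_distance)
        else:
            non_case.append(best_distance)
    return case_case, non_case


# Hypothetical toy data: two neighboring cases and two neighboring controls.
latlon = [(37.50, -122.50), (37.51, -122.50), (37.60, -122.50), (37.62, -122.50)]
is_case = [True, True, False, False]
points = [to_km(lat, lon) for lat, lon in latlon]

case_case, non_case = classify_nearest_neighbors(points, is_case)
mean_cc = statistics.mean(case_case)     # plays the role of x̄1
mean_other = statistics.mean(non_case)   # plays the role of x̄0
print(len(case_case), round(mean_cc, 3), len(non_case), round(mean_other, 3))
```

Each observation contributes exactly one nearest neighbor distance, so the two lists together always contain n distances, which agrees with the pair counts in Table 2 (for example, 12 + 105 = 117 for Alameda County).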

Table 2. Nearest neighbor mean distances (in kilometers) for case/case and non-case/case pairs.

county          case/case pairs   x̄1 (s.e.)       non-case/case pairs   x̄0 (s.e.)       p-values*
Alameda                      12   1.360 (0.388)                   105   1.027 (0.083)   0.89 (0.88)
Contra Costa                  7   2.055 (0.444)                    53   1.508 (0.201)   0.83 (0.79)
San Francisco                 3   0.410 (0.038)                    10   1.597 (0.411)   0.08 (0.11)
San Mateo                     3   2.563 (0.234)                    27   2.390 (0.332)   0.57 (0.52)
Santa Clara                   7   1.046 (0.305)                    66   1.760 (0.247)   0.18 (0.15)
Sonoma                        3   1.880 (0.587)                    25   2.035 (0.437)   0.45 (0.33)
combined                     34   1.454 (0.198)                   299   1.529 (0.091)   0.39 (0.37)
* one-sided p-values for the parametric and randomization (in parentheses) procedures.
Note: Marin County has no case/case nearest neighbor pairs, and in San Francisco County three case/case pairs are so close in proximity that they are not distinguishable in Figure 3.

Table 3. Frequencies of case/case pairs, expected values and p-values.

county            n   case/case (m)   expected (E[m])   p-values*
Alameda         117              12             13.45   0.61 (0.61)
Contra Costa     60               7              6.44   0.33 (0.31)
Marin            12               0              1.82   0.86 (0.82)
San Francisco    13               3              1.00   0.01 (0.01)
San Mateo        30               3              3.79   0.56 (0.56)
Santa Clara      73               7              6.42   0.33 (0.33)
Sonoma           28               3              3.33   0.46 (0.46)
combined        333              34             37.45   0.70 (0.70)
* one-sided p-values for the parametric and randomization (in parentheses) procedures.
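The randomization p-values shown in parentheses in Table 3 come from repeatedly reassigning case/control status to the fixed locations. A minimal Python sketch of such a procedure is given below. The function names, the seed and the four toy locations are hypothetical, and counting replicates that are at least as large as the observed m (rather than strictly larger) is an assumed convention.

```python
import math
import random


def nearest_neighbor_index(points):
    """Index of the nearest other point for every point (locations stay fixed)."""
    nearest = []
    for i, (xi, yi) in enumerate(points):
        candidates = [
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        ]
        nearest.append(min(candidates)[1])
    return nearest


def count_case_pairs(nearest, is_case):
    """Number of cases whose nearest neighbor is also a case (the statistic m)."""
    return sum(1 for i, j in enumerate(nearest) if is_case[i] and is_case[j])


def randomization_p_value(points, is_case, replicates=1000, seed=0):
    """Estimate a one-sided p-value for m by shuffling case/control labels."""
    rng = random.Random(seed)
    nearest = nearest_neighbor_index(points)
    observed = count_case_pairs(nearest, is_case)
    labels = list(is_case)
    at_least_as_large = 0
    for _ in range(replicates):
        rng.shuffle(labels)  # random assignment of case/control status
        if count_case_pairs(nearest, labels) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / replicates


# Hypothetical toy locations in km: two adjacent cases and two adjacent controls.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 10.0), (0.0, 12.0)]
is_case = [True, True, False, False]
observed, p_value = randomization_p_value(points, is_case)
print(observed, p_value)
```

In this toy configuration only two of the six possible label assignments place both cases in the same neighboring pair, so the estimated p-value should be close to 1/3. With 1000 replicates, as in the analysis reported here, the Monte Carlo error of an estimated p-value is at most about 0.016.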

To supplement the parametric approach, a parallel randomization procedure is used to
evaluate the observed differences between the seven mean nearest neighbor distances.
Assigning case/control status at random to the observed spatial locations produced 1000
replicate "case/control" mean nearest neighbor distances for each county that differ only
because of sampling variation. The proportion of these null hypothesis generated
differences in nearest neighbor means that are less than the original difference in
nearest neighbor mean values (x̄1 - x̄0) is used to estimate the significance levels. This
assumption-free randomization approach essentially duplicates the two-sample parametric
analysis (p-values in parentheses in Table 2).
Two issues arise in evaluating these results: the assumption that the nearest neighbor
distances have at least an approximate normal distribution and the assumption that these
distances are independently sampled. Mean nearest neighbor distances based on more than
six observations show no evidence of a misleading non-normality (Donnelly, 1978) due
primarily to the fact that nearest neighbor distances have an approximately symmetric
(mean ≈ median) and normal-like distribution. However, the independence of the
observations remains a concern. It is likely that searches for nearest neighbors involve
overlapping areas causing a degree of non-independence. This theoretical lack of
independence has been shown to have little impact on evaluating nearest neighbor mean
distances (Diggle, 1978). Also, the similarity of the parametric and randomization
analyses indicates that any lack of normality or non-independence of the nearest neighbor
distances has no discernible systematic influence on the analysis of these data.
In addition to comparing the mean nearest neighbor distances, the frequency of case/case
pairs also provides an assessment of the non-randomness of spatial patterns. For example,
for the data from Alameda County, where m = 12 case/case pairs were recorded and
np = 117(0.115) = 13.45 case/case pairs are expected, a statistical test produces a
p-value of 0.61, again using a normal distribution based test-statistic. The results from
the same analysis applied to the six other counties and all seven counties combined are
given in Table 3.
These comparisons of expected and observed values also show no important indications of a
non-random spatial pattern of childhood leukemia with again the possible exception of San
Francisco County.
In addition to a normal distribution based approximate approach, a randomization procedure
was used to generate 1000 random values of the "case/case pairs" for each county when no
difference
exists in spatial patterns. The proportion of randomized "case/case" pairs that exceeded
the original value of m was computed (p-value). These estimated p-values produce results
that hardly differ from the parametric approach (p-values in parentheses; Table 3).
The estimated standard errors for the randomization procedure, the standard errors
calculated from entirely theoretical considerations (Michelozzi et al., 2002) and the
standard errors based on the null hypothesis generated expected values do not
substantially differ (Table 4).
When the estimates of variability do not differ, the three analytic approaches for
assessing the case/case pair frequencies will not substantially differ.

Table 4. Standard errors for evaluation of the frequency of case/case pairs (m versus E[m]).

county          randomization   theoretical   independence*
Alameda                 3.311         3.396           3.450
Contra Costa            2.394         2.390           2.398
Marin                   1.438         1.221           1.242
San Francisco           0.938         0.876           1.006
San Mateo               1.695         1.791           1.820
Santa Clara             2.419         2.466           2.419
Sonoma                  1.725         1.702           1.714
combined                5.685         5.673           5.759
* = standard errors based on assumed independence of observations and variance = np(1 - p), where p again represents the probability of the occurrence of a case/case pair.

Discussion

Both parametric and randomization analyses show no evidence of a non-random spatial
pattern of childhood leukemia cases in this approximate population-based series of 333
cases and controls.
That is, the combined analysis indicates no evidence of a spatial pattern. The same
analyses, in addition, applied to each county (at reduced power) also show no indication
of important non-random case/control differences. The data from San Francisco County
produce a moderately small significance probability (0.08) arising from the distance
analysis (Table 2) and a significantly small p-value (0.01) arising in the frequency
analysis (Table 3). Although these p-values accurately reflect the probability of the
observed spatial pattern occurring by chance alone, the results from San Francisco County
are based on observing only four cases of leukemia (three case/case pairs). For such a
small number of observations, classification errors or even slight biases in reporting
will likely have an extreme impact on the analysis of a spatial pattern. A small error in
the location or a single misclassification of a case, for example, would undoubtedly
produce a considerable change in the statistical results.
Most epidemiologic case/control studies of childhood leukemia face the problem of
differing participation rates between case and control groups, producing a bias that
potentially influences subsequent analytic results. A feature of the NCCLS spatial data is
the absence of "participation bias" due to the inclusion of essentially all selected
case/control subjects. Another potential source of bias that is not an issue is the choice
of the study population. Frequently the motivation to study a population comes from an
observed cluster of leukemia cases. For example, the studies in Seascale, England (Aickin
et al., 1992), Fallon, Nevada (Besag and Diggle, 1977) and the California Central Valley
(Diggle, 1978) all followed a reported "cluster" of leukemia cases. The NCCLS data were
collected as part of a study focused on a wide range of questions concerning childhood
leukemia in a large and diverse population with no history of any unusual spatial
patterns.
Most geographic studies use location of the residence at diagnosis and hence are
population-based for incidence. By using birth residence for the NCCLS data, cases born
during the same years as the controls but diagnosed prior to 1995 are excluded by study
criteria. However, only a few such cases were omitted and no trends in leukemia risk are
sufficiently strong to produce a meaningful bias from these excluded cases.
It should be noted that the NCCLS data were collected in a matched design which was not
taken into account in the spatial analyses. The bias incurred is small and negligible
because the matching variables (age, sex, county and race) are not strong confounding
factors. The gain in power from an unmatched analysis over a matched analysis is
considerable (333 case/control observations versus 150 pairs). In addition, less powerful
matched analyses (not given) produced results that were similar to those that ignored the
matched structure of the data collection.
One-sample nearest neighbor analyses are frequently biased by "edge-effects" (Reynolds
et al., 2002), which arise because the theoretical nearest neighbor mean distance and its
associated variance are generated based on regions without regard to boundaries whereas
spatial data are typically restricted to well defined geographic areas, causing a usually
slight bias.
Because both cases and controls in the present analysis are subject to the same bias, they
will be

similarly affected by the absence of data beyond the county boundaries. Thus, case/control
comparisons will be unbiased and correction is unnecessary.
In general, a statistical approach that reduces a two-dimensional distribution to a
one-dimensional summary incurs a loss of "information". Consequently, certain spatial
configurations are not easily detected with specific spatial summary statistics such as
nearest neighbor distances (low statistical power). It is, however, likely that many such
patterns would be noticed by inspection, which then serves as a guide to the selection of
a more powerful statistical measure. As always, the degree of statistical power associated
with a summary of a geographic distribution of disease data depends largely on the
postulated spatial pattern underlying the observed locations.
The controls for the spatial analysis of the NCCLS data were randomly selected from birth
certificates but, in general, a control group can be selected from available US Census
Bureau data and maps (e.g., TIGER files). It is relatively straightforward to select
controls randomly from a series of census tracts with probabilities proportional to age-,
race- and sex-specific population counts using readily available US census data.
Alternatively, census based block groups could be used to provide an even better
approximation to the control population distribution. It is equally straightforward to
select a random location within each of these geographic areas, producing a set of
spatially random control "observations". The implicit assumption in this process is that
non-diseased individuals have a stable and uniform distribution within each selected area.
The fact that people live in small clusters (single residences, apartment houses, hotels,
along roads, etc.) and are not at random geographic locations is ignored when a randomly
selected point represents the location of a control. This degree of approximation,
however, is likely sufficient to estimate accurately the spatial distribution of most US
populations.
Another notable issue that emerges from the analysis of the NCCLS spatial data is the
accuracy of simple normal distribution based test-statistics. Comparing nearest neighbor
distances and counts of case/case pairs using elementary parametric statistical tools
produces analytic results that differ little from the computer intensive nonparametric
approaches. Because of the availability of "control observations" and elementary
statistical methods to compare spatial patterns, data collected using a case/control
design should be useful for investigating the spatial patterns of a wide variety of human
diseases.

The study was supported by two research grants from the Environmental Health Sciences,
United States (R01 ES09137 and PS42 ES04705).

References

Aickin, M., Chapin, C. A., Flood, T. J., Englender, S. J. and Caldwell, G. C.: Assessment
of the spatial occurrence of childhood leukemia mortality using standardized rate ratios
with a simple linear Poisson model. Int J Epidemiol 21, 649-65 (1992).
Alexander, E. F.: Space-time clustering of childhood acute lymphoblastic leukemia:
indirect evidence for transmissible agent. Br J of Cancer 65, 589-592 (1992).
Besag, J. and Diggle, P. J.: Simple Monte Carlo test for spatial pattern. Appl Statist
26(3), 327-33 (1977).
Cartwright, R. A., Dovey, G. J., Kane, E. V. and Gilman, E. A.: The onset of the excess of
childhood cancer in Seascale, Cumbria. J Public Health Med 23(4), 314-22 (2001).
Chen, R., Iscovich, J. and Goldbourt, U.: Clustering of leukemia cases in a city in
Israel. Stat Med 16(16), 1873-87 (1997).
Cuzick, J. and Edwards, R.: Spatial clustering for inhomogeneous populations. J R Statist
Soc B 52(1), 73-104 (1990).
Dickinson, H. O. and Parker, L.: Leukemia and non-Hodgkins lymphoma in children of male
Sellafield radiation workers. Int J Cancer 99(3), 437-44 (2002).
Diggle, P. J.: Note on Clark and Evans test of spatial randomness. In Simulation methods
in archaeology, ed I. Hopper. Cambridge: Cambridge University Press 246-248 (1978).
Donnelly, K. P.: Simulations to determine the variance and edge effect of total nearest
neighbor distance. In Simulation methods in archaeology, ed I. Hopper. Cambridge:
Cambridge University Press 91-95 (1978).
Grosche, B., Lackland, D., Mohr, L., Dunbar, J., Nicholas, J., Burkart, W. and Hoel, D.:
Leukemia in the vicinity of two tritium-releasing nuclear facilities: a comparison of the
Kruemmel Site, Germany, and the Savannah River Site, South Carolina, USA. J Radiol Prot
19(3), 243-252 (1999).
Gustafsson, B. and Carstensen, J.: Space-time clustering of childhood lymphatic leukemia
and non-Hodgkins lymphomas in Sweden. Eur J Epidemiol 16(12), 1111-1116 (2000).
Knox, E. G.: Leukemia clusters in childhood: geographical analysis in Britain. J Epidemiol
Community Health 48(4), 369-376 (1994).
Land, C. E., McKay, F. W. and Machado, S. G.: Childhood leukemia and fallout from the
Nevada nuclear tests. Science 223(4632), 139-144 (1984).
Ma, X., Buffler, P. A., Selvin, S., Wiencke, J. L., Wiemels, J. L. and Reynolds, P.:
Day-care attendance and the risk of childhood acute lymphoblastic leukemia. Br J of Cancer
86, 1419-1424 (2002).
Michelozzi, P., Capon, A., Kirchmayer, U., Forastiere, F., Biggeri, A., Barca, A. and
Perucci, C. A.: Adult and childhood leukemia near a high-power radio station in Rome,
Italy. Am J Epidemiol 155(12), 1096-1103 (2002).
Pobel, D. and Viel, J. F.: Case-control study of leukemia among young people near La Hague
nuclear reprocessing plant: the environmental hypothesis revisited. BMJ 314(7074), 101-106
(1997).
Reynolds, P., Smith, D. F., Satariano, E., Nelson, D. O., Goldman, L. R. and Neutra,
R. R.: The four county study of childhood cancer: clusters in context. Stat Med 15(7-9),
683-697 (1996).
Reynolds, P., Von Behren, J., Gunier, R. B., Goldberg, D. E., Hertz, A. and Harnly, M. E.:
Childhood cancer and agricultural pesticide use: an ecologic study in California. Environ
Health Perspectives 110(3), 319-324 (2002).
