ENVIRONMETRICS
Environmetrics 2006; 17: 605–621
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/env.767
Landscape-based geostatistics: a case study of the distribution
of blue crab in Chesapeake Bay
Olaf P. Jensen
1,
*
,
y
, Mary C. Christman
2,
z
and Thomas J. Miller
1
1
University of Maryland Center for Environmental Science Chesapeake Biological Laboratory, P.O. Box 38, 1 Williams Street,
Solomons, MD 20688, USA
2
Deptartment of Animal and Avian Sciences, Animal Sciences Building Room no. 1117, University of Maryland, College Park,
MD 20742, USA
SUMMARY
Geostatistical techniques have gained widespread use in ecology and environmental science. Variograms are
commonly used to describe and examine spatial autocorrelation, and kriging has become the method of choice for
interpolating spatially-autocorrelated variables. To date, most applications of geostatistics have defined the
separation between sample points using simple Euclidean distance. In heterogeneous environments, however,
certain landscape features may act as absolute or semi-permeable barriers. This effective separation may be more
accurately described by a measure of distance that accounts for the presence of barriers. Here we present an
approach to geostatistics based on a lowest-cost path (LCP) function, in which the cost of a path is a function of
both the distance and the type of terrain crossed. The modified technique is applied to 13 years of survey data on
blue crab abundance in Chesapeake Bay. Use of this landscape-based distance metric significantly changed
estimates of all three variogram parameters. In this case study, although local differences in kriging predictions
were apparent, the use of the landscape-based distance metric did not result in consistent improvements in kriging
accuracy. Copyright # 2006 John Wiley & Sons, Ltd.
key words: barriers; blue crab; Chesapeake Bay; distance metric; kriging; variogram
1. INTRODUCTION
Traditionally, geostatistical approaches have specified spatial covariance based on the Euclidean
distance between sampled points. Implicit in the use of Euclidean distance is the assumption that the
process or feature of interest is continuously distributed between any two points. However, in many
instances, the space separating two sampled points may represent a partial or complete barrier owing
Received 29 October 2004
Copyright # 2006 John Wiley & Sons, Ltd. Accepted 14 July 2005
*Correspondence to: Olaf P. Jensen, University of Wisconsin, Center for Limnology, 680 N Park Street, 16, Madison, WI 53706,
USA.
y
z
Current address: Deptartment of Statistics, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL
32611-0339, USA.
Contract/grant sponsor: University of Maryland Sea Grant; contract/grant number: R/F-89.
to biological or physical characteristics of the intervening space. Presumably, the presence of such
barriers should impact the distribution of the process or featur e. However, the influence of barriers in
geostatistical analyses has been largely ignored.
Barriers are common in coastal or estuarine environments and in river networks. Ignoring such
landscape complexity can result in inaccurate interpolation across barriers and misspecification of the
spatial covariance structure (Rathbun, 1998). Previous approaches to variogram modeling and kriging
using alternative non-Euclidean distance metrics explored the impact of the alternative distance
metrics on model predictions (Little et al., 1997; Rathbun, 1998); however, they have either not
made use of efficient GIS algorithms and available habitat classification maps (Rathbun, 1998) or are
difficult to apply to systems which do not approximate a linear network (e.g., Little et al., 1997;
Gardner et al., 2003).
1.1. The Importance of barriers in environmental modeling
Spatial heterogeneity is a common feature of nearly all landscapes and can have important
consequences for the way organisms move and interact. One of the simplest but most important
impacts of spatial heterogeneity occurs when one habitat type serves as a barrier to movement and
dispersal. Barriers are important in determining biogeographic, ecological, and evolutionary patterns
(Grinnell, 1914; MacArthur and Wilson, 1967; Gilpin and Hanski, 1991; Brown, 1998). The
recognition of barriers, however, has been restricted generally to a few high-profile models that
explicitly describe their effects (e.g., island biogeography and metapopulation dynamics). As habitat
fragmentation and isolation continue to increase, barrie rs will be an increasingly important component
of many landscapes.
Barriers are a prominent feature of the landscape of stream and estuarine systems. It has long been
recognized by stream ecologists that Euclidean distance is an inappropriate metric, and distance
measured along the thalweg (i.e., the center of the stream channel) is commonly used. This metric
recognizes that most processes measured in a stream are continuous only within the aquatic habitat.
Many estuaries are characterized by highly invaginated shorelines where converging tributaries are
separated by narrow peninsulas of land. Conditions on opposite sides of a peninsula can show much
greater variation than their geographic proximity suggests. In some cases, because of differences in the
geology or land use of their watersheds , adjacent tributaries show remarkable differences in their
chemical and biological characteristics (Pringle and Triska, 1991). Not surprisingly then, the first
attempts to incorporat e the effects of barriers into geostatistical modeling occurred in estuaries (Little
et al., 1997; Rathbun, 1998).
1.2. Geostatistics and ecological landscapes
Heterogeneous landscapes can impose patterns that violate the assumpt ions of geostatistics (Cressie,
1993). For example, the strongest assumptions of the geostatistical model, those of second-order
stationarity (spatial constancy of the mean and variance) and isotropy (directional constancy of the
variogram), are likely to be violated in the presence of any ecologically important gradients in the
landscape. In a simple example, a resource gradient in a meadow may result in a trend in mean plant
density along the gradient (violation of the constant mean assumption). Spatial autocorrelation is
likely to be stronger and extend further when measured perpendicular to the resource gradient (i.e., at
similar resource levels), and consequently the variograms will exhibit anisotropy. This effect is often
seen in data from coastal syst ems in which autocorrelation extends further when measured parallel to
606
O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
the shoreline, that is, alon g rather than across depth contours. While such landscape characteristics can
lead to violation of the assumptions of geostatistical methods, they often represent useful information
about the underlying processes being studied. For example, in a study of snow thickness on various
types of sea ice, anisotropy in variograms of snow depth highlighted the important role of prevailing
wind direction in determining spatial patterns of snow distribution (Iacozza and Barber, 1999).
Checking for and correcting such landscape-induced violations of the assumptions have become
integral steps in geostatistical modeling through the introduction of easily applied corrections such as
detrending, variogram models that incorporate geometric anisotropy, and universal kriging.
However, efficient and easily implemented solutions to landscape barriers have not been available,
and so their impacts have been largely ignored. A commonly-used approach to interpolation in the
presence of barriers, which is implemented in many GIS programs, is to simply reject points that are
separated by a barrier. This approach effectively divides the prediction area into many convex regions
in which only points contained within a given region are used for prediction. In complex landscapes
with many barriers, this approach limits the number of points used for prediction in some areas, and
therefore greater sample sizes are needed to achieve the same degree of accuracy.
While a simple test for the presence of influential barriers is not available, we can define conditions
under which they may be important. Barriers are likely to have a substantial impact on geostatistical
interpolation only when the following two general conditions apply:
(1) The extent of the survey and the prediction areas are larger than the scale at which barriers
intervene. For example, peninsulas may be effective barriers to the dispersal of marine organisms
among adjacent bays, yet they would have little impact on predictions if the survey and the
prediction area were limited to a single bay.
(2) The range of spatial autocorrelation is greater than the scale at which barriers intervene. In an
estuary, we would expect little impact if the Euclidean distance between sample or prediction
points in adjacent tributaries was grea ter than the range parameter from the variogram. This is
because points separated by a distance greater than the range are essentially uncorrelated and
receive very little weight when predictions are made.
Visual inspection of the sample and prediction points on a map of the underlying landscape can
determine quickly whether the first condition applies. It is more difficult, however, to determine a
priori whether the range is greater than the scale at which barriers intervene since barriers may
influence the empirical variogram and consequently affect the estimate of the range.
Here we present an approach to incorpor ate the effects of barriers in geostatistical analyses. This
approach makes use of common GIS algorithms for calculating distances that are weighted based on
the ‘cost’ of the habitat type through which a given path passes. As an example, we apply the technique
to data on the spatial distribution of blue crab (Callinectes sapidus) in Chesapeake Bay.
2. METHOD
2.1. Landscape-based distance metrics
What are appropriate alternatives to Euclidean distance when barriers exist and the spatial scale of the
modeling effort and the range of spatial autocorrelation make them relevant? Sampson and Gutto rp
(1992) suggest an empirical non-paramet ric approach to determining the appropriate distance metric
in cases where a time series of observations for each sample site is available. Such a data-rich
environment, however, is likely to be the exception in environmental applications. In his work,
LANDSCAPE-BASED GEOSTATISTICS 607
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
Rathbun (1998) divided the study region into a series of adjacent convex polygons based on a digitized
shoreline of the estuary. This approach split the estuary into increasingly smaller polygons until the
shortest through water distance between all sample points was achieved. Little et al. (1997) recognized
the suitability of a GIS as an efficient environment for conducting this type of spatial calculation. They
defined a network of line segments connecting points in an estuary. Variations of this linear network
approach have been used to model water temperat ures (Gardner et al., 2003) and fish abundance
(Torgersen et al., 2004; Ganio et al., 2005) in stream networks. While computationally efficient for
narrow regions where movement is only possible along one dimension, this approach is difficult to
apply in the more open portions of an estuary where distance both along and across the principal axis
of the estuary must be considered.
Here we develop a distance metric that is equally applicable to both linear networks and open
areas and accounts for the presence of barriers in terrestrial or aquatic landscapes. The distances
are calculated using the cost-weighted distance function common to many GIS programs. This
raster function calculates the lowest-cost distance from a cell to any other cell in a digitized
map. Cost is defined by a function that represents the relative ease of movement through the
associated habitat type. Diagonal m ovements are allowed, and their cost is estimated from the
length of the diagonal rather than the cell size. The total cost of a given path is the sum of
the individual cost cells encountered along that path multiplied by the cell size. For each point in
the survey data set, a distance raster map is produced that represents the lowest-cost distance from
the cell to any sample point. This distance raster is sampled at each of the other s ample and
prediction locations and the corresponding values are stored in a table of distances. We note that
when the landscape is defined in terms of absolute barrie rs, the binary case, passable habitat is
given a cost of one while barrier habitat is given an infinite cost (e.g., a ‘no data’ value). However,
the approach need not assign costs in this binary manner and is generally expandable to any cost
function.
Krivoruchko and Gribov (2002) applied a techniq ue similar to the one developed here for
calculating a lowest-cost path (LCP) distance and used it to model air quality in California. They
used a digital elevation model (DEM) to define a cost map repr esenting the relative impeda nce of the
environment to the spread of air pollution. Regions with steep changes in elevation were given a higher
cost than flat land in order to account for the preferential spread of air masses along rather than across
elevation contou rs. Interpolation was conducted using the inverse distance weighted method. Visual
inspection of interpolated maps based on Euclidean distance and those produced using the landscape-
based distance support the use of the latter technique. Importantly, however, Krivorucko and Gribov
(2002) did not present any quantitative comparisons of the prediction accuracy of alternative distance
metrics or the effects of the distance metric on variograms.
2.2. Validity of the covariance matrix
A currently unresolved problem with using a landscape-based distance metric for kriging is assurin g
the validity of the covariance matrix (Rathbun, 1998). There is no guarantee that the covariance
function, C(x), for a given combination of variogram model and n on-Euclidean distance metric will be
non-negative definite. That is:
X
m
i¼1
X
m
j¼1
a
i
a
j
Cðs
i
s
j
Þ0
608
O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
where s
i
and a
i
represents all finite collections of spatial location fs
i
: i ¼ 1; ...; mg and real numbers
fa
i
: i ¼ 1; ...; mg (Cressie, 1993). While criteria for consistently valid combinations of variogram
model and distance metric are yet to be determined, candidate covariance functions can be tested and
rejected if they fail to meet the non-negative definiteness criterion. We note that although all of the
covariance matrices in this analysis met this criterion, there is no guarantee that this would hold true
for the set of all possible sample locations, or for other applications. Importantly, the variograms,
spatial autocorrelation statistics, and deterministic interpolation methods are not affected by this
problem.
Krivoruchko and Gribov (2002) suggest a moving average approach to estimating the covariance
model that is not subject to the same criterion of non-negative definiteness, and Løland and Høst
(2003) use multidimensional scaling to create a Euclidean appro ximation of the water distance. The
latter approach remaps sample locations into a new Euclidean space with the result that spatial
covariance models based on distances in the new Euclidean space are guara nteed to be valid for most
common variogram forms. While computationally efficient, the Løland and Høst (2003) approach
represents an approximation of the water distance and the effect of this approximation on variogram
model fitting and kriging prediction accuracy has not been examined.
3. APPLICATION
We tested our landscape-based approach using data from the winter dredge survey (WDS) of blue crab
in Che sapeake Bay. The survey is conducted yearly by the Maryland Department of Natural Resources
and the Virginia Institute of Marine Science. These data have been used to quantify crab abundance
(Zhang and Ault, 1995), shery exploitation (Sharov et al., 2003), and crab distribution (Jensen and
Miller, 2005) in Chesapeake Bay.
Like many estuaries, the Chesapeake Bay has several tributaries separated by long, narrow
peninsulas of land that present a barr ier to the distribution of many aquatic variables at a scale that
makes them potentially influential for baywide modeling efforts. The tributaries differ widely in the
land-use characterist ics of their watersheds with some, such as the Potomac River, draining large
urban areas, and others, such as the Susquehanna River and many eastern shore tributaries, draining
primarily agricultural land. Thus, sample points in adjacent Chesapeake Bay tributaries, although
quite close in Euclidean distance, can differ substantially in their chemical and biological character-
istics (Dauer et al., 2000).
Preliminary variogram analysis using Euclidean distance showed that blue crab catches exhibit
distinct spatial autocorrelation at a range (i.e., 24–55 km) greater than the Euclidean distance
separating some sample points in adjacent tributaries. This finding indicates that Euclidean dis-
tance-based kriging techniques may rely on samples from adjacent tributaries, and that a landscape-
based approach may increase prediction accuracy.
3.1. Data
WDS data from 1990 to 2002 were analyzed individually by year. Full details of the survey design and
application are provided in Vølstad et al. (2000) and Sharov et al. (2003). Briefly, the survey was
conducted during the winter dormant period (December–April) and consisted of a 1-min tow of a
1.83 m wide crab dredge at each station. Stations were chosen randomly each year within geographic
strata. From 1993present, 1255–1599 stations were sampled annually within three strata. During
LANDSCAPE-BASED GEOSTATISTICS 609
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
the period 1990–1992, there were more strata and generally fewer (867–1395) samples. Figure 1
shows a typical distribution of sample locations and illustrates the shoreline complexity of the
Chesapeake Bay and its tributaries.
Depletion experiments (Zhang et al., 1993; Vølstad et al., 2000) were conducted yearly to
determine catchability coefficients that could be used to transform survey catches into estimates of
absolute abundance based on the fraction of blue crabs caught in a single tow. The variable studied
Potomac
River
Tangier
Sound
Patuxent
River
Susquehanna River
Figure 1. Sample locations for the 1998 (i.e., winter 1997–1998) Winter Dredge survey of blue crab in Chesapeake Bay. The
rectangle represents the region used for the Tangier Sound subset
610 O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
here is the density of blue crabs per 1000 m
2
, calculated by dividing the absolute abundance estimate
by the dredge area swept.
Sample coordinates were based on the starting location of each tow, and the tow distance was
calculated from the start and end coordinates determined by Loran-C (early years) or a differential
global positioning system (DGPS). Tows shorter than 15 m and longer than 500 m (1.4 per cent of
the total data) were not used in this analysis. All coordinates were projected to Universal Transverse
Mercator (UTM) zone 18 before analysis. Annual density estimates were detrended to meet
the geostatistical assumption of stationarity. Variogram analysis, kriging, and cross-validation were
conducted on the residuals. For detrending, a second-order two-dimensional polynomial of spatial
trend with interactions was fit to the data for each year. The mode l was simplified using backward
elimination with a significance level of a ¼ 0.01. This relatively stringent significance level cut-off was
used to avoid overfitting the trend.
3.2. Incorporation of landscape-based distance into geostatistical algorithms
The detrended residuals were used to calculate empirical variograms for both Euclidean and
landscape-based distance metrics. Euclidean distances were calculated using standard algorithms
programmed within Matlab (The Mathworks, Cambridge, MA). Intersample LCP distances for every
pair of sample locations were calculated using square cells (250 m on a side) and a cost-distance
algorithm programmed in the Visual Basic macro language within ArcView v8.3 (ESRI, Redlands,
CA) where LCP distance was calculated along the path that minimized the distance function:
X
j
C
ij
X
j

where C
ij
is the cost coefficient of the ith habitat type in the jth cell (here C
ij
¼ 1 for cells in the water
and is effectively infinite for cells on land) and X
j
is the distance across the jth individual cell. X
j
is
equal to the cell width for cells that are crossed in the North–South or East–West direction or
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ð2 width
2
Þ
q
for cells that are crossed diagonally.
Robust variograms were calculated accor ding to Cressie (1993), based on distances from the
Euclidean and landscape-based distance matrices. A 250 m bin size was used to calculate the empirical
variogram to a distance of 40 km. Exponential and Gaussian variogram models were fit to the
empirical variograms using non-linear least squares. The best fitting variogram model, that is,
the model with the lowest mean squared error, was used for kriging and variogram comparison.
The estimated variogram parameters for the Euclidean and landscape-based distance metrics were
compared using signed rank tests where each year represents one observation.
Following variogram selection, kriging was conducted using ordinary kriging algorithms (Journel
and Huijbregts, 1978) modified to use Euclidean and landscape-based distances from a user-defined
distance matrix and a neighborhood of the 10 nearest points. Matlab functions used in this analysis
and a dynamic link library for calculating LCP distances in ArcView v8.3 are available at:
http://hjort.cbl.umces.edu/crabs/LCPkrige.html
Blue crab density at the center of each 1 km grid cell was predicted by adding the kriged prediction
to the trend. Prediction accuracy for both Euclidean and landscape-based methods was assessed using
the predi ction error sum of squares (PRESS) statistic divided by n 1 sample points to allow
comparison across years. The PRESS statistic is a cross-validation measure calculated by leaving one
observation out of the data set and using the remai ning points to predict the value at that site (Draper
LANDSCAPE-BASED GEOSTATISTICS 611
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
and Smith 1981). The PRESS statistic is given by the sum of the squared differences between the
predicted and observed values. Predicted abundances were then mapped for visual comparison.
Differences between the two distance metrics are likely to be accentuated as distances between
neighboring sample points increase (see condition 1 above). Within a given landscape, increased
distance between sample points increases the likelihood that a barrier will intervene at some point
along the straight line connecting any two points. Increasing the average distanc e between pairs of
sample points without changing the underlying spatial structure was achi eved by taking a random
subsample of the data. The potential impact of increased intersample distance was examined by taking
50 random subsamples of 200 sample points drawn from the entire study area and calculating the
average difference in PRESS.
Similarly, differences between the Euclidean and LCP-based kriging predictions are likely to be
greater in regions of the Bay where more barriers are present (see condition 1 above). In the mainstem
of the Bay, few barriers exist, and the Euclidean and LCP distances are likely to be similar. However,
between adjacent tributaries and in areas of the Bay with islands and complex shorelines, the
Euclidean and LCP distances, and consequently the kriging predictions, are more likely to show
differences. To examine these potential regional differences, predictions were made and the PRESS
was compared for a subset of the data from Tangier Sound (see Figure 1.), a region with many islands
and inlets. This region typically contained from 104 to 259 sample sites per year. A random subsample
analysis was also conducted for the Tangier Sound region. For each year of the survey, 50 random
subsamples of 50 points each were drawn from the Tangier Sound region and the PRESS was
compared as described above.
4. RESULTS
Spatial trends in blue crab abundance in Chesape ake Bay were found in all years. In most cases, the
underlying trend in crab density (D) was described by a model of the form
D ¼
0
þ
1
E þ
2
N þ
12
E N þ "
where E refers to the Easting value and N the Northing value. In two cases, additional terms were
found to be significant: the trend model for 1998 included an E
2
term also, and that for 2000 included
an E
2
and an N
2
term.
Gaussian variogram models were chosen for all years, except 1990 and 1992, for which an
exponential model provided a better fit (Table 1). In several cases, the exponential model provided a
marginally better fit, but was rejected because it resulted in unrealistic variogram parameters (e.g.,
negative nugget or unrealistically large range). In all years, choice of variogram model was the same
for both distance metrics.
Comparison of the variograms calculated under a Euclidean distance metric with those from the
LCP distance metric revealed systematic differences in the variogram parameter estimates. Inter-
sample distances calculated using the LCP algorithm were on average 11–17 km (14 per cent–
23 per cent) greater than the equivalent Euclidean distances (Table 2). The estimated variogram
parameters, nugget, sill, and range, were smaller on average for the LCP distance variograms (Table 1,
Figure 2). Compared to the Euclidean distance variograms, the LCP distance variograms had a smaller
nugget in 8 out of the 10 years compared, with an average difference of 236 (signed-rank test,
p ¼ 0.049); a smaller sill in 9 out of 10 years, with an average difference of 1038 (signed-rank test,
612
O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
Table 1. Summary of variogram model parameters. Numbers in italics denote parameters that were fit by eye
and were not used in variogram comparisons
Year Sample size Distance metric Variogram model Nugget Partial sill Range (km)
1990 863 Euclidean Exponential 18 173 22 455 54
LCP Exponential 16 448 25 042 55
1991 964 Euclidean Gaussian 9736 30 484 55
LCP Gaussian 8000 12 000 30
1992 1392 Euclidean Exponential 792 1408 25
LCP Exponential 763 997 16
1993 1253 Euclidean Gaussian 6963 20 254 50
LCP Gaussian 6000 6000 35
1994 1427 Euclidean Gaussian 7108 885 35
LCP Gaussian 7000 900 30
1995 1598 Euclidean Gaussian 1324 10 165 49
LCP Gaussian 1178 5436 41
1996 1580 Euclidean Gaussian 3877 11 461 34
LCP Gaussian 3444 7453 28
1997 1587 Euclidean Gaussian 2848 6075 29
LCP Gaussian 2860 4446 29
1998 1573 Euclidean Gaussian 1160 1580 33
LCP Gaussian 1195 1222 38
1999 1519 Euclidean Gaussian 581 2042 33
LCP Gaussian 564 1181 27
2000 1511 Euclidean Gaussian 592 1220 24
LCP Gaussian 587 1075 23
2001 1556 Euclidean Gaussian 281 1114 25
LCP Gaussian 263 830 22
2002 1530 Euclidean Gaussian 416 1409 35
LCP Gaussian 377 867 30
Table 2. Baywide. Prediction error sum of squares (PRESS) for kriging predictions based on Euclidean and
lowest-cost path (LCP) distance metrics, the per cent difference in PRESS between the two metrics (positive
numbers indicate greater prediction accuracy for the LCP metric), the average increase in intersample distance
for the LCP metric, and the mean percent difference over 13 years
Year Euclidean PRESS LCP PRESS Percent Average absolute Average percent
( 10
3
)( 10
3
) difference increase in intersample increase in intersample
distance (km) distance
1990 65.64 65.09 0.84 16.84 23.12
1991 61.08 61.53 0.73 12.13 15.27
1992 6.46 6.49 0.47 12.77 16.53
1993 38.00 38.21 0.54 14.60 20.11
1994 29.57 29.48 0.28 16.19 21.10
1995 19.80 19.63 0.87 14.13 18.99
1996 50.00 49.99 0.01 12.87 16.83
1997 16.12 16.19 0.41 11.10 14.52
1998 9.58 9.68 1.04 11.86 15.65
1999 10.23 10.14 0.95 11.87 15.44
2000 5.24 5.23 0.11 11.34 14.50
2001 4.49 4.65 3.46 11.06 14.09
2002 6.10 6.04 0.94 11.68 15.30
mean: 0.20 12.96 17.03
LANDSCAPE-BASED GEOSTATISTICS 613
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
p ¼ 0.049); and a smaller range in 8 out of 10 years, with an average difference of 3.32 km (signed-
rank test, p ¼ 0.049) . The effect of this pattern of differences was to reduce the inter-station variability
at any given distance. Representative variograms are shown for 1996 (Figure 3a), a year of relatively
small (0.01 per cent) difference in prediction accuracy and for 2001 (Figure 3b), the year of greatest
difference (3.46 per cent) in prediction accuracy. The variograms for 2001 were an example of a case
where the exponential variogram provided a somewhat better fit than the Gaussian model, but was
0
10
20
30
40
50
60
0 102030405060
Range (km) - Euclidean distance
e
cnatsidPCL-)mk
(e
g
naR
0
5000
10000
15000
20000
0 5000 10000 15000 20000
Nugget - Euclidean distance
e
c
n
ats
idPCL-te
gguN
0
5000
10000
15000
20000
25000
30000
0 5000 10000 15000 20000 25000 30000
Sill - Euclidean distance
ecnatsid
P
CL-
lliS
a.
b
.
c
.
Figure 2. Comparison of the nugget (a), sill (b), and range (c) parameters from variograms based on Euclidean and lowest cost
path (LCP) distance metrics. The black line represents equality
614 O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
rejected because it resulted in an unrealistically high estimate of the range. In both years, the estimated
nugget, partial sill, and range were smaller for the LCP distance metric.
Despite this difference in the distances and in the variogram parameter estimates, the PRESS
statistic comparison showed little difference in prediction accuracy between the two distance metrics
(Table 2). The LCP algorithm did not always result in a lower PRESS than the Euclidean approach. Of
the 13 years of survey data tested, only 7 showed greater prediction accuracy when LCP distance was
used. Absolute difference in PRESS ranged from 0.01 per cent to 3.46 per cent with a mean increase in
PRESS of 0.2 per cent when LCP distance was used.
Results of the PRESS comparisons were similar for the Tangier Sound subset and both random
subsamples, scenarios in which we expected the LCP algorithm to be at an advantage (Table 3). The
direction of the difference in PRESS was not consistent. Seven out of 13 years for Tangier Sound had
greater prediction accuracy when LCP distance was used. In Tangier Sound, the difference in PRESS
ranged from 0.15 per cent to 8.45 per cent with a mean increase in PRESS of 0.94 per cent when LCP
distance is used. When smaller randomly selected subsets of the data were analyzed, 4 out of 13 years
a.
b.
0
400
800
1200
1600
0 10203040
Lag Distance (km)
ecnairavimeS
Euclidean-Empirical
Euclidean-Gaussian
LCP-Empirical
LCP-Gaussian
0
4000
8000
12000
16000
010203040
Lag Distance (km)
ecnairavimeS
Euclidean-Empirical
Euclidean-Gaussian
LCP-Empirical
LCP-Gaussian
Figure 3. Euclidean and LCP distance-based variograms for 1996 (a) and 2001 (b)
LANDSCAPE-BASED GEOSTATISTICS
615
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
for both the baywide and Tangier Sound random subsamples had greater prediction accuracy when
LCP distance was used. For the baywide random subsa mples, the difference in PRESS ranged from
0.07 per cent to 1.47 per cent with a mean increase in PRESS of 0.25 per cent when LCP distance is
used. Similar ly, the Tangier Sound random subsample showed an average increase in PRESS of
1.35 per cent for the LCP metric.
Consistent with the small differences in PRESS, maps of predicted blue crab density show broadly
similar patterns. Baywide patterns of blue crab distribution appear similar between the two methods in
both 1996 (Figure 4) and 2001 (Figure 5). Small-scale differences are apparent, however, especially in
the unsampled upper reaches of some tributaries. In the upper Potomac River, for example, the
Euclidean-based map for 1996 (Figure 4a) shows high-predicted density because the nearest samples
(by Euclidean distance) are high values in the adjacent Patuxent River. The LCP-based maps for the
same year (Figure 4b) predict low abundance in the upper Potomac River based on the nearest samples
downstream.
5. DISCUSSION
Differences in prediction accuracy were expected to result from the impact of the landscape-based
distance metric at two distinct stages of the geostatistical modeling process: variogram estimation and
kriging. Use of an LCP distance metric changed estimates of the underlying spatial structure as
summarized in the variogram. Estimates of all three variogram parameter estimates were significantly
lower under the landscape-based distance metric, indicating lower variation and a shorter estimated
distance of spatial autocorrelation (range). In our kriging analysis, predictions at a point were based on
a weighted sum of the 10 nearest neighboring points. The landscape-based distance metric also
changed the sample points (and their weights) employed in kriging, reducing the importance of points
Table 3. Tangier Sound and Baywide random subsample. PRESS for kriging predictions based on Euclidean
and LCP distance metrics, the percent difference in PRESS between the two metrics (positive numbers indicate
greater prediction accuracy for the LCP metric), and the mean percent difference over 13 years. Only the mean
percent difference in PRESS is given for the random subsamples
Year Tangier Euclidean Tangier LCP Tangier percent Baywide random Tangier random
PRESS (10
3
) PRESS (10
3
) difference subsample percent subsample percent
difference difference
1990 31.60 31.28 1.02 0.36 0.30
1991 5.78 5.91 2.22 0.55 0.93
1992 1.30 1.31 0.92 0.74 0.04
1993 0.30 0.33 8.45 0.84 9.18
1994 10.93 10.89 0.38 0.67 0.34
1995 3.55 3.41 3.98 0.05 3.62
1996 5.38 5.33 0.87 1.29 0.42
1997 1.72 1.70 0.70 0.07 0.78
1998 1.29 1.29 0.15 0.86 0.56
1999 0.51 0.51 1.15 1.47 0.76
2000 1.22 1.23 1.15 0.86 1.31
2001 0.80 0.86 7.29 0.46 2.62
2002 0.44 0.44 0.41 0.58 0.83
mean: 0.94 0.25 1.35
616 O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
separated by barriers from the predict ion site. We note, that if all observation points were used in
prediction, only the weights would have changed. Differences in variogram estimates and kriging
neighbors and their associated weights, however, did not yield a consistent effect on the accuracy of
the kriging predictions. No consistent improvements in kriging accuracy were seen even when the
analysis was restricted to areas of the Bay with many barriers (the Tangier Sound analysis) or when
distances among points were increased (the random subsample analyses).
Given the impact of the alternative distance metric on the variogram, why did we not see similar
impacts on prediction accuracy and the prediction maps? Although many factors interact to influence
prediction accuracy, the unique shape of Chesapeake Bay may have played a role in reducing the
increase in accuracy that was expected from the LCP distance metric. Many of the Bay tributaries,
particularly on the west side, run parallel to one another. Because of this parallel orientation, the
nearest point in an adjacent tributary is often at approximately the sam e distance from the tributary
mouth (Figure 6). Such a point, while in a different tributary, may well show similar blue crab density
because of its similar location relative to the tributary mouth. In fact, distance from the Bay mouth is a
useful predictor of female blue crab density (Jensen et al., 2005) because it is correlated with many
biologically relevant variables. In this case, predictions using points in adjacent tributaries may
actually be more accurate.
Figure 4. Map of predicted 1996 blue crab density (individuals per 1000 m
2
based on a Euclidean distance metric (a) and an
LCP distance metric (b). Note: negative values are a result of the two-stage (detrending then kriging residuals) approach
LANDSCAPE-BASED GEOSTATISTICS
617
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
Chemical and biological differences among adjacent tributariesfactors which might favor a
landscape-based distance metricare perhaps less important in the Chesapeake Bay where similar
tributaries tend to be clustered geographically. For example, the adjacent Potomac and Patuxent Rivers
on the western shore both drain large urban areas (Washington DC and the Baltimore–Washington
corridor). The watersheds of most eastern shore tributaries all contain flat, rich, agricultural land with
relatively little urban development. Such similarities among adjacent tributaries may also influence the
relative performance of different distance metrics.
Inter-annual differences were apparent in the relative prediction accuracy of the Euclidean and LCP
metrics. Two geographic areas (the entire Bay and Tangier Sound) and random subsets of each area
were analyzed, and in no case were the results consistent among all 13 years of data. Neither were the
results consistent within a year. For example, in 1990, the LCP metric showed a slight advantage over
the Euclidean metric for the Baywide data and the Tangier Sound subset, but a slight disadvantage for
both of the random subsamples. Interannual differences in blue crab distribution patterns have been
observed and the population has experienced a substantial decline over the study period (Jensen and
Miller, 2005). Nevertheless, the small differences in prediction accuracy and the inconsistency both
among and within years offer no guidelines regarding the conditions under which an LCP metric
would be preferred for kriging.
Figure 5. Map of predicted 2001 blue crab density (individuals per 1000 m
2
based on a Euclidean distance metric (a) and an
LCP distance metric (b). Note: negative values are a result of the two-stage (detrending then kriging residuals) approach
618 O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
We are not the first to attempt landscape distance-based prediction in estuaries, and the results of
other approaches to kriging with a landscape-based distance metric have been equally equivocal. Both
Little et al. (1997) and Rathbun (1998) found improvements in the prediction of some variables but not
others. Little et al. (1997) found improvements in prediction accuracy (on the order of 10–30 per cent
reduction in PRESS) for only four out of eight variables when they applied a linear network-based
distance metric. For the other four variables, use of the network-based distance metric actually
increased the PRESS by 5–10 per cent. Rathbun (1998) found slight improvements in cross-validation
accuracy using a water distance metric for predicting dissolved oxygen but slightly worse accuracy
when predicting salinity. Although variogram parameter estimates differed between the two distance
metrics in the Rathbun (1998) study with the water distance metric resulting in higher variance and a
longer range, no systemat ic comparisons were possible in that study since only one sample was
analyzed.
Two recent studies in stream systems (Torgersen et al., 2004; Gardner et al., 2003) apply
geostatistical tools based on the distance between sam ple sites along a stream network. Torgersen
Figure 6. Map of LCP distance (km) from the Bay mouth (represented by the black circle)
LANDSCAPE-BASED GEOSTATISTICS
619
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
et al. (2004) used a network-based distance metric to quantify spatial structure in cutthroat trout
abundance in an Oregon stream system. Although the distance metric they used provided clear
variogram patterns, no explicit comparison was made with a Euclidean distance metric. Gardner et al.
found improvements (lower prediction standard errors and predictions that better met expectations) in
the prediction of stream temperature when a network-based metric was used, but did not report cross-
validation statistics. Variogram parameter estimates were also found to change in that study with the
network-based metric resulting in smaller nugget but longer range.
The effect of alternative distance metrics on variogram parameter estimates is difficult to predict
since opposing influences may interact. For example, increasing the distance between points is likely to
result in a longer estimated range, as seen in the Rathbun (1998) and Gardner et al. (2003) studies. Since
a landscape-based metric reduces the influence of points separated by a barrier, which are expected to
differ more than their Euclidean separation would suggest, it also seems likely to reduce the sill
parameter (as seen in this study), a measure of overall variability. However, when variograms do not
show a clear inflection point at the sill, the range and the sill parameters are highly correlated; that is, a
variogram model with higher or lower values of both the sill and range may also provide an adequate fit
to the data. This correlation makes the overall effect of the distance metric unpredictable since increases
in the range of spatial autocorrelation may be masked by the effect of a decrease in the sill.
While we present the simple binary (passable or barrier) case in our example, the LCP approach can
incorporate varying degrees of impedance to the continuity of the process or population under study.
For example, one type of habitat may repr esent an insurmountable barrier while another may only
slow the spread of the process. Parameters used to define the degree of impedance or ‘cost’ of different
landscape types could come from many sources depending on the type of variable studied. For mobile
organisms, costs could be based on studies of animal movement, although the extent to which different
habitat types present a barrier to movement may not be static (Thomas et al., 2001). For temporary
barriers the cost might simply be the inverse of the fraction of time that the barrier is passable. For
spatial modeling of chemical contaminants, cost parameters might come from laboratory experiments
of diffusion and transport in different media.
Landscape ecologists have long recognized that Euclidean distance is rarely the most appropriate
metric when considering the ecological relatedness among points in a landscape (Forman and Godron,
1986). When flows betwee n points are of interest ‘time-distance’, that is, the quickest route, may be
preferable. However, time-distance require s detailed knowledge of how an organism or contaminant
disperses through various habitat types. Time-distance has an added complication in that it may be
asymmetric, where the time-distance from A to B is not necessarily the same as that from B to A. This
is likely to be the case in stream systems, hilly terrain, and other environments that impose
directionality on movement. Nevertheless, the idea that the distance metric should reflect the relative
ease/speed of moving along a particular path remains valid.
The LCP approach to variogram estimation and kriging presented here represents an easily
incorporated modification to commonly used geostatistical techniques. The benefits of using this
approach depend on the study environment (e.g., scale and extent of barriers), the spatial distribution
of the variable being studied, and the study objectives (e.g., variogram estimation, mapping, or
quantitative prediction). Although the expected increases in prediction accuracy did not materialize in
this study, the relatively unique configuration of parallel tributaries within the Bay may have been
partly responsible. This approach, however, is a general one and can be applied to other locations or
data sets for which greater differences in accuracy may be found. The potential also exists for the LCP
distance metric to be incorporated into other types of spatial analyses such as home range estimation,
habitat modeling, and determin istic interpolation methods.
620
O. P. JENSEN, M. C. CHRISTMAN AND T. J. MILLER
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621
ACKNOWLEDGEMENTS
The authors than k Glenn Davis for providing winte r dredge survey data and Glenn Moglen and Ken
Buja for assistance with GIS programming. This work was supported by the University of Maryland
Sea Grant, grant number (R/F-89). This is contribution number 3886 from the University of Maryland
Center for Environmental Science Chesapeake Biological Laboratory.
REFERENCES
Brown JH, Lomolino MV. 1998. Biogeography. Sinauer Associates: Sunderland; 624.
Cressie N. 1993. Statistics for Spatial Data. John Wiley & Sons, Inc.: New York; 900.
Dauer DM, Ranasinghe JA, Weisberg SB. 2000. Relationships between benthic community condition, water quality, sediment
quality, nutrient loads, and land use patterns in Chesapeake Bay. Estuaries 23: 80–96.
Draper NR, Smith H. 1981. Applied Regression Analysis. John Wiley & Sons, Inc.: New York; 709.
Forman RTT, Godron M. 1986. Landscape Ecology. John Wiley & Sons, Inc.: New York; 619.
Ganio LM, Torgersen CE, Gresswell RE. 2005. A geostatistical approach for describing spatial pattern in stream networks.
Frontiers in Ecology and the Environment 3: 138–144.
Gardner B, Sullivan PJ, Lembo AJ. 2003. Predicting stream temperatures: Geostatistical model comparison using alternative
distance metrics. Canadian Journal of Fisheries and Aquatic Sciences 60: 344–351.
Gilpin ME, Hanski IA. 1991. Metapopulation Dynamics: Empirical and Theoretical Investigations. Academic Press: San Diego,
CA; 336.
Grinnell J. 1914. Barriers to distribution as regards birds and mammals. American Naturalist 48: 248–254.
Iacozza J, Barber DG. 1999. An examination of the distribution of snow on sea-ice. Atmosphere-Ocean 37: 21–51.
Jensen OP, Miller TJ. 2005. Geostatistical analysis of blue crab (Callinectes sapidus) abundance and winter distribution patterns
in Chesapeake Bay. Transactions of the American Fisheries Society (in press).
Jensen OP, Seppelt R, Miller TJ, Bauer LJ. 2005. Winter distribution of blue crab (Callinectes sapidus) in Chesapeake Bay:
application and cross-validation of a two-stage generalized additive model (GAM). Marine Ecology Progress Series 299:
239–255.
Journel AG, Huijbregts C. 1978. Mining Geostatistics. Academic Press: London; 600.
Krivoruchko K, Gribov A. 2002. Geostatistical Interpolation in the Presence of Barriers. GeoENV IVGeostatistics for
Environmental Applications. Kluwer Academic Publishers: Barcelona, Spain.
Little L, Edwards D, Porter D. 1997. Kriging in estuaries: as the crow flies, or as the fish swims? Journal of Experimental Marine
Biology and Ecology 213: 1–11.
Løland A, Høst G. 2003. Spatial covariance modelling in a complex coastal domain by multidimensional scaling. Environmetrics
14: 307–321.
MacArthur RH, Wilson EO. 1967. The theory of island biogeography. Princeton University Press: Princeton, NJ; 203.
Pringle CM, Triska FJ. 1991. Effects of geothermal groundwater on nutrient dynamics of a lowland Costa Rican stream. Ecology
72: 951–965.
Rathbun S. 1998. Spatial modeling in irregularly shaped regions: kriging estuaries. Environmetrics 9: 109–129.
Sampson PD, Guttorp P. 1992. Nonparametric-estimation of nonstationary spatial covariance structure. Journal of the American
Statistical Association 87: 108–119.
Sharov A, Davis G, Davis B, Lipcius R, Montane M. 2003. Estimation of abundance and exploitation rate of blue crab
(Callinectes sapidus) in Chesapeake Bay. Bulletin of Marine Science 72: 543–565.
Thomas CD, Bodsworth EJ, Wilson RJ, Simmons AD, Davies ZG, Musche M, Conradt L. 2001. Ecological and evolutionary
processes at expanding range margins. Nature 411: 577–581.
Torgersen CE, Gresswell RE, Bateman DS. 2004. Pattern detection in linear networks: Quantifying spatial variability in fish
distribution. In Gis/Spatial Analyses in Fishery and Aquatic Sciences, Nishida T, Kailoa PJ, Hollingsworth CE (eds). Fishery-
Aquatic GIS Research Group: Saitama, Japan; 405–420.
Vølstad J, Sharov A, Davis G, Davis B. 2000. A method for estimating dredge catching efficiency for blue crabs. Callinectes
sapidus, in Chesapeake Bay. Fishery Bulletin 98: 410–420.
Zhang CI, Ault JS. 1995. Abundance estimation of the Chesapeake Bay blue crab, Callinectes sapidus. Bulletin of the Korean
Fisheries Society 28: 708–719.
Zhang CI, Ault JS, Endo S. 1993. Estimation of dredge sampling efficiency for blue crabs in Chesapeake Bay. Bulletin of the
Korean Fisheries Society 26: 369–379.
LANDSCAPE-BASED GEOSTATISTICS
621
Copyright # 2006 John Wiley & Sons, Ltd. Environmetrics 2006; 17: 605–621