Clinch Dace Occupancy Modeling Lab

June 14, 2023

R libraries and data needed for this lab

You need to load the following libraries and data. Set the seed to 10 to compare your answers to ours.

library(rjags)
library(MCMCvis)
library(HDInterval)
# library(BayesNSF)
load(file="SwissBirds.rda") # for motivating example
load(file='ClinchDace.rda') # for lab problems
set.seed(10)

Motivating example

Modeling presence or absence is a classic problem involving mixture models, specifically random variables that are zero-inflated. Extra zeros are encountered when we model presence or absence because zeros arise from two conditions: truly absent individuals and individuals present but undetected. This means we need a model for the process that controls occupancy, the true state, and model of the data that accounts for detection. This is often our starting point in Bayesian analysis – there is a true, unobserved state we seek to understand using a model of a process. We take imperfect observations on that state and must correct them using a model of the data.

A fundamental question in landscape ecology seeks to understand how landscape structure shapes variation in habitat use by species. We will demonstrate using data from the Swiss Survey of Common Breeding Birds, courtesy of Royle and Dorazio (2008), to model habitat occupancy by a resident bird in the Swiss Alps, the willow tit (Parus montanus). The data come from annual surveys of one km2 quadrats distributed across Switzerland. Surveys are conducted during the breeding season on three separate days, but some quadrats have missing data so that the number of replicate observations is fewer than three.

During each survey, an observer records every visual or acoustic detection of a breeding species (we do not differentiate between these two types of detection in this example). We assume that the true state (occupied or unoccupied) does not change among sample dates, an assumption known as closure. This assumption is reasonable because we are observing a resident species during the breeding season.

We want to understand the influence of forest cover and elevation on the distribution of the willow tit. The data frame SwissBirds has the number of times a quadrat (quadrat) was searched (numberVisits) and the number of times willow tits were detected (numberDetections). We have covariates on forest canopy cover (forestCover) as well as elevation in meters (elevation) for each quadrat surveyed. We are going to develop a model of the influence of forest cover and elevation on the distribution of willow tits.

First, let’s diagram the network of knowns and unknowns using the following notations: \(\mathbf{y}\) for the observed data, \(p\) for the detection probability, \(\mathbf{z}\) for the true occupancy state at the quadrats, and \(\boldsymbol\psi\) for the occupancy probability at the quadrats.

To infer the influence of covariates on the true occupancy states, we use a generalized linear regression model (GLM) with a logit link function for the Bernoulli variables \(z_i\),

\[ \text{logit}(\psi_i) = \beta_0 + \beta_1*\text{elevation}_i + \beta_2*\text{forestCover}_i, \] and the above diagram can be modified as follows.

The joint posterior distribution of the parameters given the data is \[ [p, \mathbf{z}, \mathbf{\beta}|\mathbf{y}] \propto \prod_{i = 1}^n([y_i|p, z_i]\times[z_i|\mathbf{\beta}])\times[p]\times[\mathbf{\beta}] \] Now, let’s implement the occupancy model from lecture using JAGS. Some hints: 1) You will need to standardize the covariates by subtracting the mean and dividing by the standard deviation for each observation in the elevation and forest cover data. Use the scale function to do this (it will drastically speed convergence). 2) You must give initial values of 1 to all unknown \(z\) states.

y <- SwissBirds$numberDetections
J <- SwissBirds$numberVisits
X <- cbind(1,
           scale(SwissBirds$elevation),
           scale(SwissBirds$forestCover))
n <- dim(X)[1]
pX <- dim(X)[2]

m.jags <-"
  model{
    for(i in 1:n){
      y[i] ~ dbin(z[i]*p, J[i]) 
      z[i] ~ dbern(psi[i])
      logit(psi[i]) <- X[i, 1:pX]%*%beta
    }
    p ~ dbeta(1, 1)
    beta ~ dmnorm(mu.beta, Tau.beta)
}
"

mod<-textConnection(m.jags)

mu.beta <- rep(0, pX)
Sig.beta <- 10*diag(pX)
Tau.beta <- solve(Sig.beta)

m.out <- jags.model(mod, data=list('y'=y, 'J'=J, 'n'=n, 'X'=X, 'pX'=pX, 'mu.beta'=mu.beta, 'Tau.beta'=Tau.beta), inits=list(z=rep(1, n)), n.chains=1, n.adapt=0)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 237
##    Unobserved stochastic nodes: 239
##    Total graph size: 2369
## 
## Initializing model
n.mcmc <- 10000
n.burn <- round(.2*n.mcmc)
update(m.out, n.mcmc)
m.samples <- jags.samples(m.out, c('beta', 'p'), n.mcmc)
## NOTE: Stopping adaptation

Check chains for model convergence. Plot the posterior distributions of the parameters and summarize the posterior means and associated 95% credible intervals. We can infer that both elevation and forest cover had positive effects on occupancy probability, and that the detection probability was fairly high.

p.samp <- m.samples$p[1, , 1]
beta.samp <- m.samples$beta[, , 1]

# plot chains to check convergence visually
par(mar=c(4, 4, 0.1, 0.1))
matplot(t(beta.samp), type="l", lty=1, xlab='', ylab=bquote(beta))

plot(p.samp, type="l", xlab='Iteration', ylab=bquote(p))

# plot posterior distirbutions
hist(p.samp, prob=TRUE, ylab="", xlab="p", main="")

hist(beta.samp[1, ], prob=TRUE, ylab="", xlab=expression(beta[0]), main="")

hist(beta.samp[2, ], prob=TRUE, ylab="", xlab=expression(beta[1]), main="")

hist(beta.samp[3, ], prob=TRUE, ylab="", xlab=expression(beta[2]), main="")

# summarize posterior means and credible intervals
mean(p.samp)
## [1] 0.7846338
quantile(p.samp, c(0.025, 0.975))
##      2.5%     97.5% 
## 0.7235247 0.8386410
apply(beta.samp, 1, mean)
## [1] -1.064071  1.371013  1.243321
apply(beta.samp, 1, function(x) quantile(x, c(0.025, 0.975)))
##             [,1]      [,2]     [,3]
## 2.5%  -1.4812187 0.9472562 0.853609
## 97.5% -0.6891823 1.8500971 1.683429


Problem

The Clinch Dace (Chrosomus saylori) is a headwater fish species of golbal conservation concern with a limited distribution in two counties in southwest Virginia. Clinch Dace occurred in streams with higher substrate embeddedness and catchment forest cover, and additional information is needed regarding habitat requirements of the species to understand responses to future mining and logging activities in the region. Sampling protocols involve backpack electrofishing and minnow trapping. In this problem, we are going to compare capture efficiency between the two protocols, and to investigate relationships between habitat conditions and Clinch Dace occupancy (Moore et al., 2004).

  1. Draw the DAG using the following notations: \(y_i\) for the number of detections at site \(i\), \(p_i\) for the detection probability at site \(i\), \(\mathbf{\alpha}\) for covariate coefficients associated with detection, \(z_i\) for the true occupancy state at site \(i\), and \(\mathbf{\beta}\) for covariate coefficients associated with occupancy.


See whiteboard!



  1. Write the mathematical expression for the joint posterior distribution.


\[ [\mathbf{\alpha}, \mathbf{\beta}, \mathbf{z}|\mathbf{y}] \propto \prod_{i = 1}^N([y_i|p_i, z_i]\times[p_i|\mathbf{\alpha}]\times[z_i|\beta]) \times [\mathbf{\alpha}] \times [\mathbf{\beta}] \]


  1. Implement the JAGS model. Use Gear as a covariate for detection probability (0=minnow trapping, 1=electrofishing), and use Prop75embed and ProportionAllForest as covariates for occupancy pobability.


y <- ClinchDace$y1 + ClinchDace$y2
n <- length(y)
J <- 2
W <- cbind(1, ClinchDace$Gear)
pW <- dim(W)[2]
X <- cbind(1,
           scale(ClinchDace$Prop75embed),
           scale(ClinchDace$ProportionAllForest))
pX <- dim(X)[2]

m.jags <-"
  model{
    for(i in 1:n){
      y[i] ~ dbin(z[i]*p[i], J) 
      logit(p[i]) <- W[i, 1:pW]%*%alpha
      z[i] ~ dbern(psi[i])
      logit(psi[i]) <- X[i, 1:pX]%*%beta
    }
    alpha ~ dmnorm(mu.alpha, Tau.alpha)
    beta ~ dmnorm(mu.beta, Tau.beta)
}
"

mod<-textConnection(m.jags)

mu.alpha <- rep(0, pW)
Sig.alpha <- 10*diag(pW)
Tau.alpha <- solve(Sig.alpha)
mu.beta <- rep(0, pX)
Sig.beta <- 10*diag(pX)
Tau.beta <- solve(Sig.beta)

m.out <- jags.model(mod, data=list('y'=y, 'J'=J, 'n'=n, 'W'=W, 'pW'=pW, 'X'=X, 'pX'=pX, 'mu.alpha'=mu.alpha, 'Tau.alpha'=Tau.alpha, 'mu.beta'=mu.beta, 'Tau.beta'=Tau.beta), inits=list(z=rep(1, n)), n.chains=1, n.adapt=0)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 276
##    Unobserved stochastic nodes: 278
##    Total graph size: 2930
## 
## Initializing model
n.mcmc <- 10000
n.burn <- round(.2*n.mcmc)
update(m.out, n.mcmc)
m.samples <- jags.samples(m.out, c('alpha', 'beta'), n.mcmc)
## NOTE: Stopping adaptation



  1. Summarize the posterior distributions of the parameters. What can you conclude by comparing the efficiencies between the two protocols? What can you conclude about the habitat condition effects on Clinch Dace occupancy?


alpha.samp <- m.samples$alpha[, , 1]
beta.samp <- m.samples$beta[, , 1]

par(mar=c(4, 4, 0.1, 0.1))
matplot(t(alpha.samp), type="l", lty=1, xlab='', ylab=bquote(alpha))

matplot(t(beta.samp), type="l", lty=1, xlab='', ylab=bquote(beta))




References

Royle, J.A., and R.M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology: The Analysis of Data from Populations, Metapopulations, and Communities. Academic Press, London, United Kingdom.

Moore, M. J., Orth, D. J., & Frimpong, E. A. (2017). Occupancy and detection of clinch dace using two gear types. Journal of Fish and Wildlife Management, 8(2), 530-543.