Bayesian Models for Ecologists
Bayesian Model Selection
June 10, 2022

Motivation

A wide range of philosophical opinions exist about the use of model selection (e.g., Burnham and Anderson 2002, Gelman et al. 1995, Gelman et al. 2004, Gelman et al. 2013, Hobbs et al. 2012, Ver Hoef 2015, Gelman and Rubin 1995, Gelman and Shalizi 2013, Hooten and Hobbs 2015). Wildlife biology has embraced the use of Akaike Information Criterion (AIC) to compare models, in part because it can be used with any likelihood-based model and has connections with the predictive ability of the model. AIC is not Bayesian (nor is BIC). Two Bayesian alternatives to AIC are DIC and WAIC. They are only subtlely different in their forumalation and often lead to the same conclusions when comparing models. DIC is a good place to start learning about Bayesian model comparison, so that is what we focus on in this lab. For details on the suite of other Bayesian model comparison approaches, see Hooten and Hobbs (2015).



A Note About: Matrix Specification of Linear Models

Specifying linear models in matrix notation is compact and convenient relative to writing out the full model as a scalar equation when there are several predictor variables. Consider the the typical, deterministic linear model:

\[ \mu_{i}=\beta_{0}+\beta_{1}x_{1i}+\beta_{2}x_{2i}.\]

It can be written in matrix form as

\[\pmb{\mu}=\mathbf{X}\pmb{\beta},\]

where \(\pmb{\beta}\) is a column vector, \((\beta_0, \beta_1, \beta_2)'\) with length = number of model coefficients, and \(\mathbf{X}\) is a design matrix with the number of rows equal to the number of data points and the number of columns equal to the number of predictor variables + 1 (so, in this example, 3). Column one of \(\mathbf{X}\) usually contains all 1s. Column two of \(\mathbf{X}\) contains the covariate values of predictor variable 1, column three, predictor variable 2, and so on. The response variables are represented as \(\mathbf(y)\), a vector of dimension \(n\times 1\). If you are unfamiliar with matrix multiplication, ask one of the lab instructors to explain how this works.

Matrix notation is handy because we can use a single JAGS file to specify several different models using the code below.

 z <- X %*% beta # the regression model in matrix form, returns a vector of length n
    for(i in 1:n)   { 
    lambda[i] <- exp(z[i])
    y[i] ~ dpois(lambda[i])
}

Note that %*% is the symbol for matrix multiplication in JAGS and R.

The reason this is so handy is the R function, model.matrix(), for creating a design matrix, i.e., the \(\mathbf(X)\). Consider the following:

X = model.matrix(~as.numeric(scale(area)) + as.numeric(scale(temp)), data = bird.sm.df)

This creates a design matrix with 1s in column one and standardized data for area and temperature in columns two and three using the data, bird.sm.df.



Problem

We seek to model the response, bird species richness in 49 US states, using a set of predictor variables: area, temperature, and precipitation. Fit the following models and compare them using Deviance Information criterion (DIC) for each. Use a prior mean of 0 and variance of 100 for each regression coefficient (assuming independence}.

  1. Model 1: Intercept and area as covariate.

  2. Model 2: Intercept and area and temperature as covariates.

The matrix parameterization allows us to fit the models using the same code, but different design matrices input as covariate data.



R libraries needed for this lab

You need to load the following libraries. Set the seed to 10 to compare your answers to ours. The data for this problem is located in the LynxFamilies data frame of the BayesNSF package.

library(BayesNSF)
library(rjags)
set.seed(10)



Some preliminaries:

bird.df <- RichnessBirds

####
####  Remove Outliers 
####

idx.outlier=(1:51)[(bird.df$species==min(bird.df$species) | bird.df$area==max(bird.df$area))]
bird.sm.df=bird.df[-idx.outlier,]

####  Setup Data to Fit Model 
####  Below will make the full design matrix from a data frame.  Automatically makes the first column = 1 to allow for intercept.

X.1 = model.matrix(~as.numeric(scale(area)), data = bird.sm.df)
X.2 = model.matrix(~as.numeric(scale(area)) + as.numeric(scale(temp)), data = bird.sm.df)
y = bird.sm.df$species  
M1.list <- list(y=y,X=as.matrix(X.1),n=length(y),p=dim(X.1)[2])
M2.list <- list(y=y,X=as.matrix(X.2),n=length(y),p=dim(X.2)[2])
set.seed(7)
  1. Write an expression for the posterior and joint distribution for the two models with pencil and paper.


Model 1: Intercept and one covariate

\[\begin{align*} [\pmb{\beta} \mid \pmb{y}] \propto & \prod_{i=1}^{49} \text{Poisson} (y_{i} \mid g(\pmb{\beta},x_i)) \text{ normal} (\beta_{0}\mid0,1000) \text{ normal} (\beta_{1}\mid0,1000)\\ g(\pmb{\beta},x_i)=&e^{\beta_0+\beta_1 x_i}\\ \end{align*}\]

Model 2 : Intercept and two covariates \[\begin{align*} [\pmb{\beta} \mid \pmb{y}] \propto & \prod_{i=1}^{49} \text{Poisson} (y_{i} \mid g(\pmb{\beta},\mathbf{x}_i)) \text{ normal} (\beta_{0}\mid0,1000) \text{ normal} (\beta_{1}\mid0,1000) \text{ normal} (\beta_{2}\mid0,1000)\\ g(\pmb{\beta},\mathbf{x}_i)) =&e^{\mathbf{x}'_i\pmb{\beta}}\\ \end{align*}\]


  1. Write the JAGS model statement and fit the model using both sets of covariates.


Below we show how to get DIC for model 1 and 2. Try to modify the code to fit the models containing other covariate combinations and compute DIC for those as well if time permits.

pois.mod.jags <- "model{
  z <- X %*% beta # returns a vector of length n
  for(i in 1:n)   { 
    y[i] ~ dpois(lambda[i])
    lambda[i] <- exp(z[i])
  }
  
  # PRIORS
  # p = number of coefficients, including intercept
  for(j in 1:p) {  
    beta[j] ~ dnorm(0, 0.01)
  }
}"

pois.mod <- textConnection(pois.mod.jags)
####
####  Fit Model 1 Using JAGS 
####
set.seed(7)
#get DIC module for calculating deviance and DIC directly
load.module("dic")
## module dic loaded
inits.1=list(beta=c(mean(log(y)),rep(0,dim(X.1)[2]-1)))
M1.model <- jags.model(pois.mod,data=M1.list,inits=inits.1,n.chains=2,n.adapt=1000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 47
##    Unobserved stochastic nodes: 2
##    Total graph size: 244
## 
## Initializing model
update(M1.model, n.iter=3000)
M1.out <- coda.samples(M1.model,variable.names=c("beta"),n.iter=8000)
####
####  Fit Model 2 Using JAGS 
####
pois.mod <- textConnection(pois.mod.jags)
inits.2=list(beta=c(mean(log(y)),rep(0,dim(X.2)[2]-1)))
M2.model <- jags.model(pois.mod,data=M2.list,inits=inits.2,n.chains=2,n.adapt=1000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 47
##    Unobserved stochastic nodes: 3
##    Total graph size: 292
## 
## Initializing model
update(M2.model, n.iter=3000)
M2.out <- coda.samples(M2.model,variable.names=c("beta"),n.iter=8000)


  1. Report DIC for both models.


DIC.1=dic.samples(M1.model, n.iter=8000)
DIC.1
## Mean deviance:  542.8 
## penalty 2.01 
## Penalized deviance: 544.8
DIC.2=dic.samples(M2.model, n.iter=8000)
DIC.2
## Mean deviance:  493 
## penalty 2.958 
## Penalized deviance: 495.9

Literature cited

Burnham K. and D. Anderson. 2002. Model Selection and Multimodel Inference. Springer.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995, 2004. Bayesian data analysis. Chapman and Hall / CRC, London.

A. Gelman, J. B. Carlin, H. S. Stern, D. Dunson, A. Vehhtari, and D. B. Rubin. 2013. Bayesian data analysis. Chapman and Hall / CRC, London, UK, 2013.

Hobbs, N. T., H. Andren, J. Persson, M. Aronsson, and G. Chapron. 2012. Native predators reduce harvest of reindeer by Sami pastoralists. Ecological Applications 22: 1640-1654.

Hoef, J. M. V., and P. L. Boveng. 2015. Iterating on a single model is a viable alternative to multimodel inference. Journal of Wildlife Management 79: 719-729.

Hooten, M.B. and N.T. Hobbs. (2015). A guide to Bayesian model selection for ecologists. Ecological Monographs, 85: 3-28.