Bayesian Models for Ecologists

Mixture Models Lab

June 11, 2024

Motivation

Models that “mix” parameters from more than one distribution provide virtually infinite flexibility in describing how data arise from ecological processes. Mixture distributions can, of course, be bi- or multimodal, but they can also be used to create distributions that are flatter than anything in the usual parametric toolbox. This exercise provides familiarity with two applications of mixture models: inflating zeros and combining two parametric distributions.



R libraries needed for this lab

You need to load the following libraries. Set the seed to 10 to compare your answers to ours.

library(rjags)
library(MCMCvis)
library(arm)
library(BayesNSF)
set.seed(10)



Zero inflation

Zero inflation is a special case of mixture models. We suppose a process produces zeros for a discrete random variable with probability \(p\), and with probability \(1-p\) it produces a value for the random variable from a distribution \([y\mid\boldsymbol{\theta}]\), a value that might also be 0. Zero-inflated data are common in ecology.
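For example, if the non-zero process is \(\text{Poisson}(\lambda)\), marginalizing over the two components gives the probability mass function of the zero-inflated Poisson:

\[ \begin{align*} \Pr(y_i=0) &= p + (1-p)e^{-\lambda}\\ \Pr(y_i=k) &= (1-p)\frac{\lambda^{k}e^{-\lambda}}{k!} \quad \text{for } k=1,2,\ldots \end{align*} \]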

Data simulation is a terrific way to check your model code because you know the parameter values that gave rise to the simulated data. This knowledge allows you to compare the means of the marginal posterior distributions approximated by your model against the values used to generate the data used to fit the model. Data simulation also forces you to think carefully about your model. Consider the following model for count data:

\[ \begin{align*} y_{i} &\sim\begin{cases} \text{Poisson}(\lambda)\, & \mbox{if }z_{i}=0\\ 0 & \mbox{if }z_{i}=1 \end{cases}\\ z_{i}&\sim\text{Bernoulli}(p) \end{align*} \]

Here \(p\) is the probability that \(y_i\) arises from the zero component of the mixture. Writing the Poisson mean as \(\lambda(1-z_i)\) collapses the distribution to a point mass at zero whenever \(z_i=1\). The full posterior and joint distributions can be written as

\[ \begin{align*} [\lambda,p,\mathbf{z}\mid \mathbf{y}]&\propto\prod_{i=1}^n\text{Poisson}(y_i \mid \lambda(1-z_i))\,\text{Bernoulli}(z_i\mid p) \\ & \times \text{gamma}(\lambda \mid 0.001,0.001)\,\text{uniform}(p \mid 0,1) \end{align*} \]

Simulate 500 data points \(y_1\),…,\(y_{500}\) assuming \(\lambda\)=5.3 and \(p\)=.4. Fit the model in JAGS to see if you can recover the parameters. They will not match exactly. Why not? Conduct posterior predictive checks.


The fit parameters and the generating parameters will be close but not identical because the data are a single realization of a stochastic process. Averaged over many replicated simulations and model fits, the posterior means converge to the values used to generate the data.

p = .4
lambda = 5.3
n = 500

z = rbinom(n, prob = p, size = 1)  # latent indicator: 1 means the zero component
y = (1 - z) * rpois(n, lambda)     # counts are zeroed out whenever z = 1
discrete.histogram(y, xlab = "y", main = "Simulated data")
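
As a quick sanity check (not part of the original exercise), the fraction of zeros in the simulated data should be close to the marginal zero probability \(p + (1-p)e^{-\lambda}\):

mean(y == 0)                 # observed proportion of zeros in the simulated data
p + (1 - p) * exp(-lambda)   # theoretical marginal probability of a zero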

{ # Extra bracket needed only for R markdown files
sink("ZIP.R")
cat("
model {

  # priors
  lambda ~ dgamma(.001,.001)
  p ~ dunif (0,1)

  # likelihood
  for (i in 1:length(y)){
    z[i] ~ dbern(p)
    y[i] ~ dpois(lambda * (1 - z[i]))
    y.sim[i] ~ dpois(lambda * (1 - z[i]))
  } #end of i

  mean.y = mean(y)
  mean.y.sim = mean(y.sim)
  sd.y = sd(y)
  sd.y.sim = sd(y.sim)

} #end of model
",fill=TRUE)
sink()
} # Extra bracket needed only for R markdown files
data = list(y = y)
jm = jags.model("ZIP.R", data = data, n.chains = 3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 500
##    Unobserved stochastic nodes: 1002
##    Total graph size: 2511
## 
## Initializing model
update(jm, n.iter = 1000)
zc = coda.samples(jm, variable.names = c("p","lambda"), n.iter = 1000)
MCMCsummary(zc)
##             mean         sd      2.5%       50%     97.5% Rhat n.eff
## lambda 5.5648881 0.13705900 5.3037748 5.5643030 5.8390983    1  2803
## p      0.3964418 0.02186924 0.3547454 0.3959534 0.4393369    1  3240
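
One way to finish the posterior predictive checks is to monitor the derived test statistics already computed in the JAGS model and calculate Bayesian p-values from their chains. The sketch below assumes the model object jm created above; the object names zc.ppc and stats are ours.

zc.ppc = coda.samples(jm, variable.names = c("mean.y", "mean.y.sim", "sd.y", "sd.y.sim"), n.iter = 1000)
stats = MCMCchains(zc.ppc)
# Bayesian p-values, Pr(T(y.sim) >= T(y)); values near 0 or 1 indicate lack of fit
mean(stats[, "mean.y.sim"] >= stats[, "mean.y"])
mean(stats[, "sd.y.sim"] >= stats[, "sd.y"])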



Mixing two distributions

You mixed a scalar 0 with a distribution \(\text{Poisson}(y_i\mid \lambda)\) in the example above. Here, you will learn how to mix two distributions. You saw in lecture how to mix two distributions to represent differences in beak size between sexes of Darwin’s finches. The math for this problem is simple, but there are a couple of tricks for implementing it in JAGS, shown below.

The concentration of metabolizable energy (ME) in plants (k joule / g dry matter) and the biomass of plants ultimately determine how many herbivores can exist in a habitat. Herbaceous plants contain higher concentrations of ME than woody plants because their cell walls contain less lignin. This creates a bimodal distribution of ME in plant communities that include woody and herbaceous plants, as illustrated in the histogram below, which depicts the probability density of random samples of plant tissue collected from mountain shrub communities in Colorado.

hist(PlantEnergy$ME, breaks = 20, xlab = "Metabolizable energy (kjoule / g)", main = "", freq = FALSE)

Let’s first suppose that you had never heard of mixture distributions and you decide to model ME with a single distribution. ME cannot be negative so let’s try a gamma distribution. Suppose from prior literature we expect the mean ME to be around 3 and we expect the standard deviation of ME to be around 6.

  1. Estimate a simple gamma model using JAGS, where the gamma distribution is parameterized in terms of the mean \(\mu\) and standard deviation \(\sigma\). Put informative priors on \(\mu\) and \(\sigma\) based on the literature. (Write out the model and then the JAGS code.)

  2. Perform posterior predictive checks by simulating a ‘y.new’ at each iteration of the chain. Does the posterior predictive distribution resemble the observed data?


Here we consider the following priors:

\(\mu \sim \text{gamma}(9/100,3/100)\)

\(\sigma \sim \text{gamma}(36/100,6/100)\)

Note that \(E[\mu] = 3\) and \(E[\sigma] = 6\). You can probably see where the numerators of these priors come from using your now penetrating knowledge of moment matching. But where did the denominators come from? They were chosen to “flatten” the distributions, making them only weakly informative. The larger the denominators, the flatter the distributions become.
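
To see this, recall that a gamma(\(\alpha,\beta\)) distribution has mean \(\alpha/\beta\) and variance \(\alpha/\beta^2\), so

\[ \begin{align*} E[\mu] &= \frac{9/100}{3/100}=3, & \text{Var}[\mu] &= \frac{9/100}{(3/100)^2}=100,\\ E[\sigma] &= \frac{36/100}{6/100}=6, & \text{Var}[\sigma] &= \frac{36/100}{(6/100)^2}=100. \end{align*} \]

Dividing both parameters by 100 leaves the prior means unchanged but gives each prior a standard deviation of 10, much larger than its mean.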

The full model can then be expressed

\[\begin{align} [\mu,\sigma \mid \mathbf{y}]&\propto\prod_{i=1}^n\text{gamma}(y_i\mid \mu,\sigma) \times \text{gamma}(\mu|9/100,3/100) \times \text{gamma}(\sigma|36/100,6/100) \end{align}\]

Here is the JAGS code:

{ # Extra bracket needed only for R markdown files
sink("fit_simpJAGS.R")
cat("
model{

  # priors
  sigma ~ dgamma(36/100, 6/100) 
  mu ~ dgamma(9/100, 3/100) 

  # likelihood
  for(i in 1:length(y)){
    y[i] ~ dgamma(mu^2/sigma^2, mu/sigma^2)
    y.new[i] ~ dgamma(mu^2/sigma^2, mu/sigma^2)
  } 


  sd.y = sd(y[])
  sd.new = sd(y.new[])
  mean.y = mean(y[])
  mean.new = mean(y.new[])
} #end of model
",fill=TRUE)
sink()
} # Extra bracket needed only for R markdown files
data = list(y = PlantEnergy$ME)
inits = list(
  list(mu=c(3), sigma = c(6)),
  list(mu=c(3), sigma = c(6)))
jm_simp = jags.model("fit_simpJAGS.R", data = data, inits = inits,  n.chains = length(inits), n.adapt = 3000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 531
##    Unobserved stochastic nodes: 533
##    Total graph size: 1084
## 
## Initializing model
update(jm_simp, n.iter = 5000)
zc_simp = coda.samples(jm_simp, variable.names=c("mu", "sigma", "y.new"), n.iter = 5000)
MCMCsummary(zc_simp, params = c("mu", "sigma"))
##           mean        sd     2.5%      50%    97.5% Rhat n.eff
## mu    6.764807 0.1897141 6.398498 6.761053 7.138369    1  2163
## sigma 4.358562 0.1761241 4.031782 4.352333 4.717027    1  2144
y.new <- MCMCchains(zc_simp, params = "y.new")
hist(PlantEnergy$ME, freq=FALSE, xlab = "Metabolizable energy (kjoule / g)", ylab = "Probability density",main="Posterior Predictive density of Simple Model", 
  ylim = c(0, .2), breaks = 20)
lines(density(y.new), lwd = 2)

YIKES! That predictive distribution doesn’t look anything like our data! :(
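
The visual impression can be backed up with a test statistic. One possible numerical check (a sketch, using the y.new draws extracted above) compares the standard deviation of each simulated data set with the observed standard deviation; a Bayesian p-value near 0 or 1 confirms the lack of fit:

sd.rep <- apply(y.new, 1, sd)        # sd of each posterior predictive data set
mean(sd.rep >= sd(PlantEnergy$ME))   # Bayesian p-value for the standard deviation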


Now let’s try a two-component mixture model. Specifically, mix two means and two standard deviations using a parameter \(p\) that controls the relative weight of these parameters in the final, bimodal distribution. In this problem you will create JAGS code that returns the posterior distribution of the means and standard deviations for each plant type and the mixing parameter.

Here are some hints to get started. Choose sensible priors for two distributions parameterized by their moments. Here is some starter JAGS code:

sigma[1] ~ dgamma(4/100, 2/100) 
sigma[2] ~ dgamma(25/200, 5/200) T(sigma[1],)
mu[1] ~ dgamma(4/100, 2/100) 
mu[2] ~ dgamma(49/500, 7/500) T(mu[1],)
p ~ dunif(0, 1) 

Note a couple of things. First, the priors are centered on reasonable values for the means and the standard deviations based on well established knowledge of the composition of plant tissue, but they have large variances, allowing them to be only weakly informative. The second, critical thing to note is the code T(mu[1],). The T() ensures that the draws for mu[2] are always greater than the draws for mu[1] at any given iteration of the MCMC algorithm (and likewise T(sigma[1],) orders the standard deviations), which keeps the two components from switching labels. The code will not converge if the T()’s are not present. It is probably a good idea to specify initial values for mu[2] that are greater than the values for mu[1], although you can often obtain good results without doing so.

You will now need to choose a likelihood appropriate for the support of the random variable and do the proper moment matching of the moments to the parameters of your chosen distribution. Based on the support of the random variable ME, what are appropriate distributions for the mixture components?


Since the observations are positive, we want to select mixture components that have support on the positive real line. Thus, we will model the data using gamma distributions.

Mixture 1: gamma(\(\alpha_1\),\(\beta_1\)) where \(\alpha_1 = \mu_1^2/\sigma_1^2\), \(\beta_1 = \mu_1/\sigma_1^2\)

Mixture 2: gamma(\(\alpha_2\),\(\beta_2\)) where \(\alpha_2 = \mu_2^2/\sigma_2^2\), \(\beta_2 = \mu_2/\sigma_2^2\)


You will make a draw of a latent indicator variable \(z_i\) for each observation and use it to choose between the two mixture components, with the parameter \(p\) controlling the probability of each component. The probability of the data conditional on the parameters (i.e., the likelihood) will be composed as

\[ \begin{align*} z_i &\sim \text{Bernoulli}(p)\\ y_i &\sim \text{gamma}(\alpha_{\text{mix},i},\beta_{\text{mix},i})\\ \alpha_{\text{mix},i} &= z_i\alpha_{1} + (1 - z_i)\alpha_{2}\\ \beta_{\text{mix},i} &= z_i\beta_{1} + (1 - z_i)\beta_{2} \end{align*} \]

Note that the parameters in the gamma distribution are equal to (\(\alpha_1\),\(\beta_1\)) if \(z_i = 1\) and equal to (\(\alpha_2\),\(\beta_2\)) if \(z_i = 0\).

  1. Write the full posterior and joint distributions.

  2. Write JAGS code to find marginal posterior distributions of the unobserved quantities.

  3. Conduct posterior predictive checks. Overlay a density plot of simulated data on a histogram of real data.


  1. The full posterior and joint distribution can be written as:

\[\begin{align*} [\boldsymbol{\alpha},\boldsymbol\beta, \mathbf{z}, p\mid \mathbf{y}]&\propto\prod_{i=1}^n\text{gamma}(y_i\mid \alpha_{\text{mix},i},\beta_{\text{mix},i})\text{Bernoulli}(z_i\mid p) \\ & \times \text{gamma}(\mu_{1} | 4/100, 2/100 ) \text{gamma}(\mu_{2} | 49/500, 7/500)_{[\mu_{1}, \infty)} \\ &\times \text{gamma}(\sigma_{1} \mid 4/100, 2/100 ) \text{gamma}(\sigma_{2} \mid 25/200, 5/200)_{[\sigma_{1}, \infty)} \\ &\times \text{uniform}(p | 0,1)\\ \alpha_{\text{mix},i}&=z_i\frac{\mu_1^2}{\sigma_1^2} + (1 - z_i)\frac{\mu_2^2}{\sigma_2^2}\\ \beta_{\text{mix},i} &= z_i\frac{\mu_1}{\sigma_1^2} + (1 - z_i)\frac{\mu_2}{\sigma_2^2}\\ \end{align*}\]

  2. Here is the JAGS code:
{ # Extra bracket needed only for R markdown files
sink("fit_distJAGS.R")
cat("
model{

  # priors
  sigma[1] ~ dgamma(4/100, 2/100) 
  sigma[2] ~ dgamma(25/200, 5/200) T(sigma[1],)
  mu[1] ~ dgamma(4/100, 2/100) 
  mu[2] ~ dgamma(49/500, 7/500) T(mu[1],)
  p ~ dunif(0, 1)

  # likelihood
  for(i in 1:length(y)){
    z[i] ~ dbern(p)
    alpha.mix[i] <- z[i] * alpha[1] + (1 - z[i]) * alpha[2]
    beta.mix[i] <- z[i] * beta[1] + (1 - z[i]) * beta[2]
    y[i] ~ dgamma(alpha.mix[i], beta.mix[i])
    y.new[i] ~ dgamma(alpha.mix[i], beta.mix[i])
  } # end of i 

  # derived quantities
  alpha[1] <- mu[1]^2 / sigma[1]^2
  beta[1] <- mu[1] / sigma[1]^2
  alpha[2] <- mu[2]^2 / sigma[2]^2
  beta[2] <- mu[2] / sigma[2]^2

  sd.y = sd(y[])
  sd.new = sd(y.new[])
  p.sd = step(sd.new - sd.y)
  mean.y = mean(y[])
  mean.new = mean(y.new[])
  p.mean = step(mean.new - mean.y)

} #end of model
",fill=TRUE)
sink()
} # Extra bracket needed only for R markdown files
data = list(y = PlantEnergy$ME)
inits = list(
  list(mu=c(2, 5), sigma = c(1, 3)),
  list(mu=c(1, 3), sigma = c(2, 6)))
jm = jags.model("fit_distJAGS.R", data = data, inits = inits,  n.chains = length(inits), n.adapt = 3000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 531
##    Unobserved stochastic nodes: 1067
##    Total graph size: 5350
## 
## Initializing model
update(jm, n.iter = 5000)
zc = coda.samples(jm, variable.names=c("mu", "sigma", "p", "y.new", "p.sd", "p.mean","alpha","beta"), n.iter = 5000)
MCMCsummary(zc, params = c("mu", "sigma", "p"))
##               mean         sd      2.5%       50%     97.5% Rhat n.eff
## mu[1]    1.6305512 0.28619118 1.1864811 1.5884375 2.2942629    1   216
## mu[2]    7.8633280 0.12769943 7.6200096 7.8624704 8.1256900    1  1152
## sigma[1] 1.2507216 0.30233073 0.7949266 1.2002239 1.9628996    1   212
## sigma[2] 2.3000526 0.09700191 2.1188185 2.2972187 2.4949091    1  1302
## p        0.1792439 0.02285353 0.1378708 0.1777916 0.2283672    1   467
y.new <- MCMCchains(zc, params = "y.new")
hist(PlantEnergy$ME, freq=FALSE, xlab = "Metabolizable energy (kjoule / g)", ylab = "Probability density", main="Posterior Predictive density of Mixture Model",
  ylim = c(0, .2), breaks = 20)
lines(density(y.new), lwd = 2)

Whoa! That looks much better!


  1. Obtain the posterior mean of \((\alpha,\beta)\) for each component and plot the posterior mean component distributions on the same plot. What weight is given to each component?

  2. What is the probability that a sample of plant tissue drawn from this community will have an ME concentration greater than 10 k joule / g?


  1. Let’s first plot the components.
(ab.pm <- MCMCsummary(zc, params = c("alpha","beta")))
##               mean        sd      2.5%       50%     97.5% Rhat n.eff
## alpha[1]  1.774547 0.3287204 1.2278013  1.740870  2.500452    1   605
## alpha[2] 11.752253 1.0494041 9.8343647 11.703970 13.921863    1   755
## beta[1]   1.140193 0.3574289 0.5827183  1.096424  1.945450    1   346
## beta[2]   1.494235 0.1272998 1.2593498  1.489592  1.755925    1   966
alpha1.pm <-  ab.pm[1,1]
alpha2.pm <-  ab.pm[2,1]
beta1.pm <-  ab.pm[3,1]
beta2.pm <-  ab.pm[4,1]

curve(dgamma(x,alpha1.pm,beta1.pm),col="red",xlim=c(0,max(PlantEnergy$ME)),main="Mixture Components",ylab="Density") 
curve(dgamma(x,alpha2.pm,beta2.pm),col="blue",add=TRUE) 

MCMCsummary(zc, params = c("p"))
##        mean         sd      2.5%       50%     97.5% Rhat n.eff
## p 0.1792439 0.02285353 0.1378708 0.1777916 0.2283672    1   467

The posterior mean of \(p\) is about 0.18, indicating that component one (in red) gets a weight of 0.18 and component two (in blue) gets a weight of 0.82.
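
To visualize those weights (an optional extra, not required by the exercise), we can overlay the posterior-mean mixture density, weighting each component density by the posterior mean of p:

p.pm <- MCMCsummary(zc, params = "p")[1, 1]   # posterior mean of the mixing weight
hist(PlantEnergy$ME, freq = FALSE, breaks = 20, xlab = "Metabolizable energy (kjoule / g)",
  ylab = "Probability density", main = "Posterior mean mixture density", ylim = c(0, .2))
curve(p.pm * dgamma(x, alpha1.pm, beta1.pm) + (1 - p.pm) * dgamma(x, alpha2.pm, beta2.pm),
  add = TRUE, lwd = 2)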

  2. The probability that a sample of plant tissue from this community has an ME concentration greater than 10 k joule / g can be estimated from the posterior predictive draws:
1 - ecdf(y.new)(10)
## [1] 0.1402379
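
A rough cross-check (a sketch that plugs in the posterior means computed above, p.pm, alpha1.pm, beta1.pm, alpha2.pm, and beta2.pm, and ignores parameter uncertainty) is the weighted sum of the two component tail probabilities; it should land in the same neighborhood as the posterior predictive estimate:

p.pm * (1 - pgamma(10, alpha1.pm, beta1.pm)) + (1 - p.pm) * (1 - pgamma(10, alpha2.pm, beta2.pm))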