Bayesian Models for Ecologists
JAGS Primer Lab
June 06, 2024

Motivation for using knitr

You can complete all the coding exercises in the JAGS Primer using simple R scripts. However, you might be wondering how we created the JAGS Primer key you are looking at right now. We did this using Yihui Xie’s knitr package in R. This can be a highly useful tool for organizing your Bayesian analyses. Within the same document, you can write explanatory text, include your R and JAGS code, and display the output and figures that the code produces.

Best of all, knitr can produce beautiful html files as output, which can be easily shared with collaborators. We encourage you to become familiar with knitr. We recommend Karl Broman’s knitr in a knutshell as an excellent introductory tutorial.



Exercise 1: Factoring

There is no x in the posterior distribution in equation 4. What are we assuming if x is absent? Draw the Bayesian network, or DAG, for this model. Use the chain rule to fully factor the joint distribution into sensible parts, then simplify by assuming that \(r\), \(K\), and \(\tau\) are independent.


There is a dashed line from the \(x\) instead of a solid line because we are assuming it is measured without error.

NOTE: there is a slight error in the DAG below. \(r\) and \(K\) should be separate nodes with separate arrows pointing to \(y_{i}\).

Fig. 1. Bayesian network for the logistic model.

Factoring the joint distribution using the chain rule: \[[r,K,\tau, \mathbf{y}]=[\mathbf{y}\mid r,K,\tau][r\mid K,\tau][K\mid\tau][\tau]\]

Simplifying under the assumption that \(r\), \(K\), and \(\tau\) are independent: \[[r,K,\tau, \mathbf{y}]=[\mathbf{y}\mid r,K,\tau][r][K][\tau]\]

We would obtain the same result by inspecting the DAG and using the rules:

  • All random variables at the heads of arrows must be on the left-hand side of a conditioning symbol.
  • All random variables at the tails of arrows must be on the right-hand side of a conditioning symbol.
  • All random variables at the tails of arrows with no arrows coming into them must be expressed unconditionally.



Exercise 2: Can you improve these priors?

A recurring theme in this course will be to use priors that are informative whenever possible. The gamma priors in equation 4 of the JAGS Primer include the entire number line \(>0\). Don’t we know more about population biology than that? Let’s say for now that we are modeling the population dynamics of a large mammal. How might you go about making the priors on population parameters more informative?


A great source of priors in biology is allometric scaling relationships, which predict all kinds of biological quantities based on the mass of organisms (Peters, 1983; Pennycuick, 1992). If you know the approximate mass of the animal, you can compute broad but nonetheless informative priors on \(r\) and \(K\). This might leave out the social scientists, but I would trust the scaling of \(r\) for people if not \(K\).

In the absence of some sort of scholarly way to find priors, we can at least constrain them somewhat. There is no way that a large mammal can have an intrinsic rate of increase exceeding 1, yet the gamma(.001, .001) prior puts mass on values of \(r\) far larger than that, values that are complete nonsense. We know \(r\) must be positive and we can put a plausible bound on its upper limit. The only requirement for a vague prior is that its “\(\ldots\) range of uncertainty should be clearly wider than the range of reasonable values of the parameter\(\ldots\)” (Gelman and Hill, 2009, page 355), so we could use \(r\) ~ uniform(0, 2) and be sure that it would be minimally informative. Similarly, we could use experience and knowledge to put some reasonable bounds on \(K\) and even \(\sigma\), which we can use to calculate \(\tau\) as \(\tau=\frac{1}{\sigma^{2}}\).
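You can see how poorly the vague gamma prior constrains \(r\) with a quick numerical check in R. This sketch is not part of the original key; it simply draws from gamma(.001, .001) and inspects the upper tail.

# Draws from the vague gamma(0.001, 0.001) prior include values of r that are
# biologically impossible for a large mammal.
set.seed(1)  # arbitrary seed, for reproducibility only
r.draws <- rgamma(100000, shape = 0.001, rate = 0.001)
mean(r.draws > 1)                         # fraction of prior draws exceeding r = 1
quantile(r.draws, c(0.5, 0.975, 0.999))   # the upper tail reaches absurdly large values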

R. H. Peters. The ecological implications of body size. Cambridge University Press, Cambridge, United Kingdom, 1983.

C. J. Pennycuick. Newton rules biology. Oxford University Press, Oxford, United Kingdom, 1992.

A. Gelman and J. Hill. Data analysis using regression and multilevel/hierarchical modeling. Cambridge University Press, Cambridge, United Kingdom, 2009.



Exercise 3: Using for loops

Write a code fragment to set vague normal priors for 5 regression coefficients – dnorm(0, 10E-6) – stored in the vector b.


# Vague normal priors on the five regression coefficients. dnorm() in JAGS takes a
# precision (1/variance), so precision 1e-6 corresponds to a standard deviation of 1000.
for(i in 1:5){
  b[i] ~ dnorm(0, .000001)
}



Exercise 4: Coding the logistic regression

Write R code (algorithm 3) to run the JAGS model (algorithm 2) and estimate the parameters \(r\), \(K\), \(\sigma\), and \(\tau\). We suggest you insert the JAGS model into this R script using the sink() command as shown in algorithm 4, because this model is small. You will find this a convenient way to keep all your code in the same R script. For larger models, you will be happier using a separate file for the JAGS code. Here is the joint distribution for our logistic model again, with the priors updated from exercise 2 and \(\tau\) expressed as a derived quantity:

\[\begin{eqnarray} \mu_{i} & = & r-\frac{rx_{i}}{K}\textrm{,}\nonumber\\[1em] \tau & = & \frac{1}{\sigma^{2}}\textrm{,}\nonumber\\[1em] \left[r,K,\sigma\mid\mathbf{y}\right] & \propto & \prod_{i=1}^{n}\textrm{normal}\left(y_{i} \mid \mu_{i},\tau\right)\textrm{uniform}\left(K\mid 0,4000\right) \textrm{uniform}\left(\sigma\mid 0, 5\right) \textrm{uniform}\left(r\mid 0,2\right)\textrm{.}\nonumber\\ \end{eqnarray}\]


We use the sink() command to create a JAGS script from our joint distribution. This file is created within R and saved in the working directory. Please note that the outer set of brackets is only required when running this code within an R markdown document (as we did to make this answer key). If you are running the code in a plain R script, the brackets are not needed.

{ # Extra bracket needed only for R markdown files
sink("LogisticJAGS.R")
cat(" 
model {
  
  # priors
  
  K ~ dunif(0, 4000)
  r ~ dunif(0, 2)
  sigma ~ dunif(0, 5)
  tau <- 1/sigma^2
  
  # likelihood
  
  for(i in 1:n) {
    
    mu[i] <- r - r/K * x[i]
    y[i] ~ dnorm(mu[i], tau)
  
  }
} 

",fill = TRUE)
sink()
} # Extra bracket needed only for R markdown files

Then we run the remaining commands discussed in the JAGS Primer, assuming the rjags and MCMCvis packages are loaded and the logistic data have been read into a data frame named Logistic as described in the primer. Note that the jags.model() call creating jm reads the JAGS script LogisticJAGS.R we just created.

#Be sure to do this sorting.  It will be important for plotting later.
Logistic <- Logistic[order(Logistic$PopulationSize),]

inits = list(
  list(K = 1500, r = .2, sigma = .01),
  list(K = 1000, r = .15, sigma = .5),
  list(K = 900, r = .3, sigma = .01))

data = list(
  n = nrow(Logistic),
  x = as.double(Logistic$PopulationSize),
  y = as.double(Logistic$GrowthRate))

n.adapt = 1000
n.update = 10000
n.iter = 10000

jm = jags.model("LogisticJAGS.R", data = data, inits = inits, n.chains = length(inits), n.adapt = n.adapt)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 50
##    Unobserved stochastic nodes: 3
##    Total graph size: 212
## 
## Initializing model
update(jm, n.iter = n.update)
zm = coda.samples(jm, variable.names = c("K", "r", "sigma", "tau"), n.iter = n.iter, thin = 1)
MCMCsummary(zm)
##               mean           sd         2.5%          50%        97.5% Rhat
## K     1.237541e+03 6.173703e+01 1.129882e+03 1.232866e+03 1371.4924289    1
## r     2.006359e-01 9.558064e-03 1.814643e-01 2.007363e-01    0.2192536    1
## sigma 2.862479e-02 3.028972e-03 2.343769e-02 2.837182e-02    0.0353196    1
## tau   1.260651e+03 2.602156e+02 8.016198e+02 1.242297e+03 1820.4143679    1
##       n.eff
## K      7746
## r      7956
## sigma 14749
## tau   16132



Exercise 5: Understanding coda objects

  1. Look at the first six rows of the data frame.
  2. Find the maximum value of \(\sigma\).
  3. Find the mean of \(r\) for the first 1000 iterations.
  4. Find the mean of \(r\) for the last 1000 iterations.
  5. Make two publication quality plots of the marginal posterior density of K, one as a smooth curve and the other as a histogram.
  6. Compute the probability that K > 1600. Hint: what type of probability distribution would you use for this computation? Investigate the dramatically useful R function ecdf().
  7. Compute the probability that 1000 < K < 1300.
  8. Compute the .025 and .975 quantiles of K. Hint: use the R quantile() function.


# Make a data frame
df = as.data.frame(rbind(zm[[1]], zm[[2]], zm[[3]]))

# Look at the first six rows:
head(df)
##          K         r      sigma      tau
## 1 1425.674 0.1817601 0.03055570 1071.064
## 2 1364.547 0.1869651 0.03073484 1058.615
## 3 1261.924 0.2100030 0.02867852 1215.868
## 4 1234.257 0.1967178 0.02833189 1245.801
## 5 1248.645 0.1960525 0.02664854 1408.163
## 6 1189.915 0.2052014 0.02693989 1377.870
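# Find the maximum value of sigma (item 2; output not shown here)
max(df$sigma)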
# Find the mean of r for the first 1000 iterations
mean(df$r[1: 1000])
## [1] 0.2006209
# Find the mean of r for the last 1000 iterations
nr = length(df$r)
mean(df$r[(nr - 1000): nr])
## [1] 0.201092
# Make a publication quality plot of the marginal posterior distribution of K, first as a
# smooth curve and then as a histogram. More iterations would produce a smoother curve.
plot(density(df$K), main = "", xlim = c(1000, 1500), xlab = "K") 
Fig. 2. Posterior density of K.

hist(df$K, main = "", xlim = c(1000, 1500), xlab = "K", breaks = 50, freq = FALSE) 
Fig. 3. Posterior density of K shown as a histogram.

# Find the probability that the parameter K exceeds 1600
1 - ecdf(df$K)(1600)
## [1] 0
# Find the probability that the parameter 1000 < K < 1300 
ecdf(df$K)(1300) - ecdf(df$K)(1000)
## [1] 0.8535
# Compute the .025 and .975 quantiles of K
quantile(df$K, c(.025, .975))
##     2.5%    97.5% 
## 1129.882 1371.492



Exercise 6: Summarizing coda objects

Summarize the coda output from the logistic model with 4 significant digits. Include Rhat and effective sample size diagnostics (more about these soon). Summarize the coda output for \(r\) alone.


MCMCsummary(zm, round = 4, n.eff = TRUE)
##            mean       sd      2.5%       50%     97.5% Rhat n.eff
## K     1237.5415  61.7370 1129.8818 1232.8661 1371.4924    1  7746
## r        0.2006   0.0096    0.1815    0.2007    0.2193    1  7956
## sigma    0.0286   0.0030    0.0234    0.0284    0.0353    1 14749
## tau   1260.6509 260.2156  801.6198 1242.2972 1820.4144    1 16132
MCMCsummary(zm, params = "r", round = 4, n.eff = TRUE)
##     mean     sd   2.5%    50%  97.5% Rhat n.eff
## r 0.2006 0.0096 0.1815 0.2007 0.2193    1  7956



Exercise 7: Using MCMCchains

Extract the chains for r and sigma as a data frame using MCMCchains and compute their .025 and .975 quantiles from the extracted object. Display three significant digits. Make a publication quality histogram for the chain for sigma. Indicate the .025 and .975 Bayesian equal-tailed credible interval values with vertical red lines. Overlay the .95 highest posterior density interval with vertical lines in blue. This is a “learn on your own” problem intended to allow you to rehearse the most important goal of this class: being sufficiently confident to figure things out. Load the package HDInterval, enter HDInterval at the console, and follow your nose. Also see Hobbs and Hooten Figure 8.2.1.


ex = as.data.frame(MCMCchains(zm, params = c("r", "sigma")))
class(ex)
## [1] "data.frame"
dim(ex)
## [1] 30000     2
signif(apply(ex, 2, quantile, c(.025, .975)), 3)
##           r  sigma
## 2.5%  0.181 0.0234
## 97.5% 0.219 0.0353
hist(ex$sigma,xlab = expression(sigma), ylab = "Probability density", freq = FALSE, breaks = 100, main = "")
abline(v = quantile(ex$sigma, c(.025, .975)), col = "red", lwd = 2)
abline(v = hdi(ex$sigma, .95), lwd = 2, col = "blue")



Exercise 8: Plotting data and model predictions using the MCMCpstr formatting

For the logistic model:

  1. Plot the observations of growth rate as a function of observed population size.
  2. Overlay the median of the model predictions as a solid line.
  3. Overlay the 95% equal-tailed credible intervals as dashed lines in red.
  4. Overlay the 95% highest posterior density intervals as dashed lines in blue.
  5. What do you note about these two intervals? Will this always be the case? Why or why not?
  6. What do the dashed lines represent? What inferential statement can we make relative to these lines?


zm = coda.samples(jm, variable.names = c("K", "r", "sigma", "mu"),  n.iter = 10000)
BCI <- MCMCpstr(zm, params = "mu", func = function(x) quantile(x, c(.025, .5, .975)))
HPDI <-  MCMCpstr(zm, params = "mu", func = function(x) hdi(x, .95))

plot(data$x, data$y, pch = 19, xlab = "Population size", ylab = "Per-capita growth rate (1/year)")
lines(data$x, BCI$mu[,2], type = "l")
lines(data$x, BCI$mu[,1], lty = "dashed", col = "red")
lines(data$x, BCI$mu[,3], lty = "dashed", col = "red")
lines(data$x, HPDI$mu[,1], lty = "dashed", col = "blue") 
lines(data$x, HPDI$mu[,2], lty = "dashed", col = "blue")  
Fig. 4. Median and 95% credible intervals for predicted growth rate and posterior density of K.

The intervals are exactly overlapping in this particular case. Such overlap will not always occur. Equal-tailed intervals based on quantiles will be broader than highest posterior density intervals when marginal posterior distributions are asymmetric. There is a 95% probability that the true value of the mean per-capita population growth rate falls between these lines. Note that this differs from the prediction of a new observation of population growth rate, which would have much broader credible intervals.
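To make the point about asymmetry concrete, here is a small illustration that is not part of the original key. For a unimodal distribution, the highest posterior density interval is the narrowest interval containing 95% of the mass, so for a skewed sample it is tighter than the equal-tailed interval computed from quantiles.

# Compare a 95% equal-tailed interval with a 95% HPD interval for a skewed sample
library(HDInterval)
set.seed(3)  # arbitrary seed
skewed <- rgamma(100000, shape = 2, rate = 1)
quantile(skewed, c(.025, .975))   # equal-tailed interval from quantiles
hdi(skewed, credMass = .95)       # HPD interval is narrower and shifted toward the mode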



Exercise 9: Creating caterpillar plots with MCMCplot

Use the MCMCplot function to create caterpillar plots to visualize the posterior densities for \(r\) and \(\sigma\) using the coda object zm from earlier. Then use MCMCplot() to explore different plotting options. There are lots of these options, including ways to make the plots publication quality, show overlap with zero, etc.


MCMCplot(zm, params = c("r", "sigma"))
MCMCplot(zm, params = "mu")
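MCMCplot() has many optional arguments for customizing caterpillar plots. Two examples are sketched below; the argument names assume a recent version of MCMCvis, so check ?MCMCplot for the full list.

# Shade estimates according to whether their credible intervals overlap zero
MCMCplot(zm, params = c("r", "sigma"), ref_ovl = TRUE)
# Add an informative axis label
MCMCplot(zm, params = c("r", "sigma"), xlab = "Parameter estimate")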



Exercise 10: Assessing convergence

Rerun the logistic model with n.adapt = 100. Then do the following:

  1. Keep the next 500 iterations. Assess convergence visually with MCMCtrace() and with the Gelman-Rubin, Heidelberger and Welch, and Raftery diagnostics.
  2. Update another 500 iterations and then keep 500 more iterations. Repeat your assessment of convergence.
  3. Repeat steps 1 and 2 until you feel you have reached convergence.
  4. Change the adapt phase to zero and repeat steps 1 – 3. What happens?


n.adapt = 100
jm.short = jags.model("LogisticJAGS.R", data = data, inits = inits, n.chains = length(inits), n.adapt = n.adapt)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 50
##    Unobserved stochastic nodes: 3
##    Total graph size: 212
## 
## Initializing model
n.iter = 500
zm.short = coda.samples(jm.short, variable.names = c("K", "r", "sigma", "tau"), n.iter = n.iter, thin = 1)
par(mfrow=c(1,2))
MCMCtrace(zm.short, pdf = FALSE)

gelman.diag(zm.short)
## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## K           1.11       1.33
## r           1.02       1.09
## sigma       1.01       1.04
## tau         1.01       1.04
## 
## Multivariate psrf
## 
## 1.07
heidel.diag(zm.short)
## [[1]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.700  
## r     passed       1         0.556  
## sigma passed       1         0.677  
## tau   passed       1         0.721  
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.24e+03 1.47e+01 
## r     passed    2.00e-01 2.09e-03 
## sigma passed    2.85e-02 3.78e-04 
## tau   passed    1.28e+03 3.18e+01 
## 
## [[2]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.0676 
## r     passed       1         0.1535 
## sigma passed       1         0.7336 
## tau   passed       1         0.9184 
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.24e+03 1.55e+01 
## r     passed    2.01e-01 1.85e-03 
## sigma passed    2.86e-02 3.08e-04 
## tau   passed    1.26e+03 2.60e+01 
## 
## [[3]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.0871 
## r     passed       1         0.0630 
## sigma passed       1         0.8822 
## tau   passed       1         0.6780 
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.26e+03 3.20e+01 
## r     passed    1.99e-01 3.03e-03 
## sigma passed    2.92e-02 4.05e-04 
## tau   passed    1.21e+03 2.92e+01
raftery.diag(zm.short)
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[3]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
n.update = 500
update(jm.short, n.iter = n.update)
n.iter = 500
zm.short = coda.samples(jm.short, variable.names = c("K", "r", "sigma", "tau"), n.iter = n.iter, thin = 1)
MCMCtrace(zm.short, pdf = FALSE)

gelman.diag(zm.short)
## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## K              1       1.02
## r              1       1.00
## sigma          1       1.01
## tau            1       1.01
## 
## Multivariate psrf
## 
## 1
heidel.diag(zm.short)
## [[1]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.0651 
## r     passed       1         0.2139 
## sigma passed       1         0.1530 
## tau   passed       1         0.1910 
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.24e+03 1.53e+01 
## r     passed    2.01e-01 2.00e-03 
## sigma passed    2.87e-02 3.84e-04 
## tau   passed    1.26e+03 3.16e+01 
## 
## [[2]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.552  
## r     passed       1         0.298  
## sigma passed       1         0.486  
## tau   passed       1         0.156  
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.23e+03 1.30e+01 
## r     passed    2.01e-01 1.83e-03 
## sigma passed    2.85e-02 3.29e-04 
## tau   passed    1.27e+03 3.10e+01 
## 
## [[3]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.496  
## r     passed       1         0.477  
## sigma passed       1         0.679  
## tau   passed       1         0.539  
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.23e+03 1.72e+01 
## r     passed    2.01e-01 1.85e-03 
## sigma passed    2.89e-02 3.55e-04 
## tau   passed    1.23e+03 2.49e+01
raftery.diag(zm.short)
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[3]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
n.update = 10000
update(jm.short, n.iter = n.update)
n.iter = 5000
zm.short = coda.samples(jm.short, variable.names = c("K", "r", "sigma", "tau"), n.iter = n.iter, thin = 1)
MCMCtrace(zm.short, pdf = FALSE)

gelman.diag(zm.short)
## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## K              1          1
## r              1          1
## sigma          1          1
## tau            1          1
## 
## Multivariate psrf
## 
## 1
heidel.diag(zm.short)
## [[1]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.1150 
## r     passed       1         0.0667 
## sigma passed       1         0.4049 
## tau   passed       1         0.2157 
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.24e+03 5.91e+00 
## r     passed    2.00e-01 7.19e-04 
## sigma passed    2.87e-02 1.29e-04 
## tau   passed    1.25e+03 1.02e+01 
## 
## [[2]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.679  
## r     passed       1         0.787  
## sigma passed       1         0.650  
## tau   passed       1         0.290  
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.24e+03 4.871203 
## r     passed    2.01e-01 0.000581 
## sigma passed    2.86e-02 0.000115 
## tau   passed    1.26e+03 9.233694 
## 
## [[3]]
##                                     
##       Stationarity start     p-value
##       test         iteration        
## K     passed       1         0.322  
## r     passed       1         0.220  
## sigma passed       1         0.518  
## tau   passed       1         0.572  
##                                   
##       Halfwidth Mean     Halfwidth
##       test                        
## K     passed    1.24e+03 5.627117 
## r     passed    2.01e-01 0.000703 
## sigma passed    2.86e-02 0.000111 
## tau   passed    1.27e+03 9.353055
raftery.diag(zm.short)
## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
##                                              
##        Burn-in  Total Lower bound  Dependence
##        (M)      (N)   (Nmin)       factor (I)
##  K     6        7520  3746         2.01      
##  r     15       20205 3746         5.39      
##  sigma 4        5211  3746         1.39      
##  tau   7        7675  3746         2.05      
## 
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
##                                              
##        Burn-in  Total Lower bound  Dependence
##        (M)      (N)   (Nmin)       factor (I)
##  K     10       10758 3746         2.87      
##  r     5        6078  3746         1.62      
##  sigma 4        4955  3746         1.32      
##  tau   6        6520  3746         1.74      
## 
## 
## [[3]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
##                                              
##        Burn-in  Total Lower bound  Dependence
##        (M)      (N)   (Nmin)       factor (I)
##  K     12       14134 3746         3.77      
##  r     6        6939  3746         1.85      
##  sigma 5        5673  3746         1.51      
##  tau   6        6878  3746         1.84
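The effect of skipping the adaptation phase entirely (step 4) is left for you to run. A minimal sketch, reusing the data and inits objects defined above:

# Step 4: no adaptation phase at all
jm.noadapt = jags.model("LogisticJAGS.R", data = data, inits = inits,
  n.chains = length(inits), n.adapt = 0)
zm.noadapt = coda.samples(jm.noadapt, variable.names = c("K", "r", "sigma", "tau"),
  n.iter = 500)
MCMCtrace(zm.noadapt, pdf = FALSE)
gelman.diag(zm.noadapt)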



ADVANCED Exercise 11: Coding the logistic regression to run in parallel

Append R code (algorithm 5) to the script you made in exercise 4 to run the JAGS model (algorithm 2) in parallel and estimate the parameters \(r\), \(K\), \(\sigma\), and \(\tau\). Use the proc.time() function in R to compare the time required for the sequential and parallel JAGS runs. If your computer has 3 threads, try running only 2 chains in parallel when doing this exercise. If you have fewer than 3 threads, work with a classmate who has at least 3 threads.


library(parallel)  # provides makeCluster(), clusterExport(), clusterEvalQ(), and stopCluster()
cl <- makeCluster(3)

initsP = function() {list(K = runif(1, 10, 4000), r = runif(1, .01, 2), 
  sigma = runif(1, 0, 2),  .RNG.name = "base::Mersenne-Twister", 
  .RNG.seed = runif(1, 1, 2000))}

parallel::clusterExport(cl, c("data", "initsP", "n.adapt", "n.update", "n.iter"))

ptm <- proc.time()

out <- clusterEvalQ(cl, {
  library(rjags)
  jm = jags.model("LogisticJAGS.R", data = data, n.chains = 1, 
  n.adapt = n.adapt, inits = initsP())
  update(jm, n.iter = n.update)
  zm = coda.samples(jm, variable.names = c("K", "r", "sigma", "tau"), 
  n.iter = n.iter, thin = 1)
  return(as.mcmc(zm))
}) 

ParallelTime <- proc.time() - ptm
ParallelTime
##    user  system elapsed 
##   0.002   0.001   0.375
stopCluster(cl)
zmP = mcmc.list(out)
MCMCsummary(zmP)
##               mean           sd         2.5%          50%        97.5% Rhat
## K     1237.5949106 6.225569e+01 1.130747e+03 1.232233e+03 1.376558e+03    1
## r        0.2007066 9.671987e-03 1.814612e-01 2.007002e-01 2.196674e-01    1
## sigma    0.0286298 3.024168e-03 2.338522e-02 2.838048e-02 3.524084e-02    1
## tau   1260.2681591 2.608226e+02 8.052068e+02 1.241539e+03 1.828594e+03    1
##       n.eff
## K      2988
## r      2983
## sigma  9563
## tau   10490

We rerun the model sequentially and use proc.time() again for comparison.

ptm <- proc.time()
jm = jags.model("LogisticJAGS.R", data = data, inits = inits, n.chains = length(inits), n.adapt = n.adapt)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 50
##    Unobserved stochastic nodes: 3
##    Total graph size: 212
## 
## Initializing model
update(jm, n.iter = n.update)
zm = coda.samples(jm, variable.names = c("K", "r", "sigma"), n.iter = n.iter, thin = 1)
SequentialTime <- proc.time() - ptm
SequentialTime
##    user  system elapsed 
##   0.836   0.002   0.842

Looks as if the parallel model runs 2.25 times faster. This factor should increase the more iterations you run (why?).


You might be very tempted, in the name of reproducible science, to set the seed inside each worker to the same value. What happens when you do this? Compare the results from this run to the previous run using MCMCsummary() and MCMCtrace().


cl <- makeCluster(3)

initsP = function() {list(K = runif(1, 10, 4000), r = runif(1, .01, 2), 
  sigma = runif(1, 0, 2),  .RNG.name = "base::Mersenne-Twister", 
  .RNG.seed = runif(1, 1, 2000))}

parallel::clusterExport(cl, c("data", "initsP", "n.adapt", "n.update", "n.iter"))

out <- clusterEvalQ(cl, {
  library(rjags)
  set.seed(10)
  jm = jags.model("LogisticJAGS.R", data = data, n.chains = 1, 
  n.adapt = n.adapt, inits = initsP())
  update(jm, n.iter = n.update)
  zm = coda.samples(jm, variable.names = c("K", "r", "sigma", "tau"), 
  n.iter = n.iter, thin = 1)
  return(as.mcmc(zm))
}) 

stopCluster(cl)
zmP = mcmc.list(out)

MCMCsummary(zm)
##               mean           sd         2.5%          50%        97.5% Rhat
## K     1.237961e+03 62.201887658 1.129998e+03 1.233304e+03 1.371477e+03    1
## r     2.007044e-01  0.009691454 1.818624e-01 2.008000e-01 2.198861e-01    1
## sigma 2.865748e-02  0.003012082 2.349517e-02 2.839737e-02 3.533227e-02    1
##       n.eff
## K      2199
## r      3086
## sigma  7692
MCMCsummary(zmP)
##               mean           sd         2.5%          50%        97.5% Rhat
## K     1.239659e+03 6.227385e+01 1.134518e+03 1.233489e+03 1378.0986189  NaN
## r     2.005185e-01 9.584991e-03 1.817614e-01 2.004705e-01    0.2194044  NaN
## sigma 2.859709e-02 2.896285e-03 2.358206e-02 2.843383e-02    0.0348032  NaN
## tau   1.260124e+03 2.518494e+02 8.255847e+02 1.236885e+03 1798.1932583  NaN
##       n.eff
## K      3953
## r      4096
## sigma  9014
## tau   10309
MCMCtrace(zm, pdf = FALSE)

MCMCtrace(zmP, pdf = FALSE)
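Setting the same seed in every worker makes the three chains identical: each worker draws the same initial values from initsP() and passes the same .RNG.seed to JAGS. Because the chains carry no independent information about the posterior, the between-chain convergence diagnostic cannot be computed, which is why Rhat is NaN in the MCMCsummary(zmP) output above and why the traces from the three chains overlap exactly.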


Repeat this exercise with fixed initial values using algorithm 6.


cl <- makeCluster(3)

library(stringr)  # provides word(), used to extract each worker's process id
myWorkers <- NA
for(i in 1:3) myWorkers[i] <- word(capture.output(cl[[i]]), -1)

initsP = list(
  list(K = 1500, r = .2, sigma = 1, .RNG.seed = 1,
    .RNG.name = "base::Mersenne-Twister"),
  list(K = 1000, r = .15, sigma = .1, .RNG.seed = 23,
    .RNG.name = "base::Mersenne-Twister"),
  list(K = 900, r = .3, sigma = .01, .RNG.seed = 255,
    .RNG.name = "base::Mersenne-Twister"))

parallel::clusterExport(cl, c("myWorkers", "data", "initsP", "n.adapt", "n.update", "n.iter"))

out <- clusterEvalQ(cl, {
  library(rjags)
  jm = jags.model("LogisticJAGS.R", data = data, n.chains = 1, 
  n.adapt = n.adapt, inits = initsP[[which(myWorkers==Sys.getpid())]])
  update(jm, n.iter = n.update)
  zm = coda.samples(jm, variable.names = c("K", "r", "sigma", "tau"), 
  n.iter = n.iter, thin = 1)
  return(as.mcmc(zm))
}) 

stopCluster(cl)
zmP = mcmc.list(out)

MCMCsummary(zmP)
##               mean           sd         2.5%          50%        97.5% Rhat
## K     1.239021e+03 6.308794e+01 1.130208e+03 1.234069e+03 1.380525e+03    1
## r     2.005644e-01 9.754911e-03 1.812908e-01 2.005283e-01 2.195311e-01    1
## sigma 2.869483e-02 3.070605e-03 2.340323e-02 2.845907e-02 3.535849e-02    1
## tau   1.255631e+03 2.636193e+02 7.998575e+02 1.234692e+03 1.825780e+03    1
##       n.eff
## K      1781
## r      2514
## sigma  7646
## tau    8427
MCMCtrace(zmP, pdf = FALSE)