ConjugatePriorsLab.knit

Bayesian Models for Ecologists

Using Conjugate Priors

June 05, 2024

Motivation

Conjugate relationship between likelihoods and priors are important in MCMC and offer a handy way to find values for the parameters of posterior distributions in closed form, on the back of a cocktail napkin. Here we practice that important bar skill. Go over your notes on conjugate priors before diving in to this problem set. For each of the problems below you should:

Identify appropriate conjugate distributions and write out full posteriors for the quantity you are modeling.
Plot the posterior and the prior distributions for the modeled quantity and interpret (below is code to get you started).
Using the posterior distribution, determine the 95% credible interval for the parameter in each scenario.
Comment on the informative nature of the prior in terms of its relationship to the posterior.

Example plotting code:

Suppose the prior distribution of \(\alpha\) is a beta(4,5) and the posterior distribution of \(\alpha\) given the data \(y\) is a beta(20,3). Below is code that plots both of these distributions and finds the posterior 95% credible interval for \(\alpha\).

prior <- function(x) dbeta(x, 4, 5)
post <- function(x) dbeta(x, 20, 3)

plot(prior, 0, 1, ylim=c(0, 7), main = "Beta prior (dash) and  Beta posterior (red)", xlab = expression(alpha), ylab = "Density",
  lty = 2, lwd = 2)
plot(post, 0, 1, add = TRUE, col = "red", lwd = 2)

# 95% CI from posterior
round(qbeta(c(.025,.975),20,3),2)

## [1] 0.71 0.97

Thus, after observing the data we have 95% confidence (or belief) that \(\alpha\) is between 0.71 and 0.97.

Scenarios

Climate Referendum Voting (small sample): You are conducting a political opinion poll and poll a small number (\(n=10\)) of voters at random. You ask each voter whether they will vote “yes” in an upcoming referendum on funding for local climate adaptation. Seven voters say they intend to vote “no” and 3 report intending to vote “yes.” These data can be represented as a single observation \(y=3\). Little is known about how people might vote in advance. Assuming a vague prior on the probability \(\phi\) that a person votes “yes,” answer questions a-d. In addition, find the posterior probability that the referendum passes: \(\textrm{Pr}\big(\phi \geq .5\big|y)\).

Recall that we can use the binomial - beta conjugate relationship to find the parameters of the beta posterior using

\[\phi\sim\textrm{beta}\big(\alpha_{prior}+y,\beta_{prior}+n-y\big)\]

Parameter of interest: \(\phi\) = probability that a person will vote “yes” on upcoming referendum (or equivalently, the proportion of the population that intends to vote “yes”)

Prior distribution: \(\phi \sim\) beta\((1,1)\) (Note, this is equivalent to a uniform distribution on the interval (0,1).)

Data model: \(y \sim\) binomial\((10,\phi)\)

Posterior distribution: \((\phi | y = 3) \sim\) beta\((4,8)\)

Note: no calculations were required for determining the posterior distribution. We identified the beta-binomial prior/data model on the distribution worksheet and knew the posterior was a beta distribution. The parameters of the posterior were determined using the prior hyperparameter and the observed data.

Now, let’s plot the prior and posterior distributions.

aPrior <- 1
bPrior <- 1 
y <- 3
n <- 10
(aPost <- aPrior + y)

## [1] 4

(bPost <- bPrior + n - y)

## [1] 8

prior <- function(x) dbeta(x, aPrior, bPrior)
post <- function(x) dbeta(x, aPost, bPost)

plot(prior, 0, 1, ylim=c(0, 3), main = "Beta prior (dash) and  beta posterior (red)", xlab = expression(phi), ylab = "Density",
  lty = 2, lwd = 2)
plot(post, 0, 1, add = TRUE, col = "red", lwd = 2)

Now let’s find the posterior 95% credible interval.

bounds95 <- qbeta(c(0.025,0.975), aPost, bPost)
bounds95

## [1] 0.1092634 0.6097426

Finally, using the posterior distribution, let’s calculate the probability the vote passes.

1 - pbeta(.5, aPost, bPost)

## [1] 0.1132812

For those that are interested, below is the derivation of the posterior distribution: \[\begin{eqnarray} \big[\phi \mid y, n\big] & \propto & \textrm{binomial}\big(y\mid\phi,n\big)\,\textrm{beta}\big(\phi\mid \alpha_{prior},\beta_{prior}\big)\\[1em] & \propto & \phi^y \big(1-\phi\big)^{n-y}\phi^{\alpha_{prior}-1}\big(1-\phi\big)^{\beta_{prior}-1}\\[1em] & \propto & \phi^{\alpha_{prior}+y-1}\big(1-\phi\big)^{\beta_{prior}+n-y-1}\\[1em] & = & \frac{\Gamma\big(\alpha_{prior}+y+\beta_{prior}+n-y\big)}{\Gamma\big(\alpha_{prior}+y\big)\Gamma\big(\beta_{prior}+n-y\big)} \phi^{\alpha_{prior}+y-1}\big(1-\phi\big)^{\beta_{prior}+n-y-1}\\[1em] & = & \textrm{beta}\big(\alpha_{prior}+y,\beta_{prior}+n-y\big)\\[1em] & = & \textrm{beta}\big(1+3,1+10-3\big)\\[1em] & = & \textrm{beta}\big(4,8\big) \end{eqnarray}\]

Climate Referendum Voting (larger sample): You are conducting a political opinion poll and have recorded data \(\mathbf{z}=(z_1,\ldots,z_n)'\) from \(n=100\) voters at random (saved on the ClimateVote data frame from the BayesNSF package). Each voter is asked whether they will vote “yes” in an upcoming referendum on funding for local climate adaptation. Find the posterior probability that the referendum passes. The poll results are stored in the data frame ClimateVote. In this case, assume we have already conducted the study from question 1 above. Thus, we seek to use the posterior distribution from the previous scenario as the prior for this new analysis, answer questions a-d. In addition:

Find the posterior probability that the referendum passes as \(\textrm{Pr}\big(\phi \geq .5\big|\mathbf{z})\).
Try using a vague prior for \(\phi\). Does your posterior distribution change much? Why or why not?

aPrior <- 4
bPrior <- 8 
(y <- sum(ClimateVote$vote))

## [1] 38

(n <- dim(ClimateVote)[1])

## [1] 100

(aPost <- aPrior + y)

## [1] 42

(bPost <- bPrior + n - y)

## [1] 70

Prior distribution: \(\phi \sim\) beta\((4,8)\) (posterior from above)

Data model: \(y \sim\) binomial\((100,\phi)\)

Posterior distribution: \((\phi | y = 38) \sim\) beta\((42,70)\)

prior <- function(x) dbeta(x, aPrior, bPrior)
post <- function(x) dbeta(x, aPost, bPost)
postFlat <- function(x) dbeta(x, 1 + y, 1 + n - y)

plot(prior, 0, 1, ylim=c(0, 10), main = "Beta prior (dash) and beta posterior (red, blue)", xlab = expression(phi), ylab = "Density", lty = 2, lwd = 2)
plot(post, 0, 1, add = TRUE, col = "red", lwd = 2)
plot(postFlat, 0, 1, add = TRUE, col = "blue", lwd = 2)

Now let’s find the posterior 95% credible interval.

bounds95 <- qbeta(c(0.025,0.975), aPost, bPost)
bounds95

## [1] 0.2880426 0.4661870

Finally, using the posterior distribution, let’s calculate the probability the vote passes.

1 - pbeta(.5, aPost, bPost)

## [1] 0.003792406

For those that are interested, below is the derivation of the posterior distribution viewing the data as coming from Bernoulli distributions rather than a binomial. \[\begin{eqnarray} \big[\phi \mid \mathbf{z}\big] & \propto & \prod_{i=1}^{100}\textrm{Bernoulli}\big(z_{i}\mid\phi)\,\textrm{beta}\big(\phi\mid \alpha_{prior},\beta_{prior}\big)\\[1em] & \propto & \prod_{i=1}^{100}\phi^{z_{i}}\big(1-\phi\big)^{1-z_{i}}\phi^{\alpha_{prior}-1}\big(1-\phi\big)^{\beta_{prior}-1}\\[1em] & \propto & \phi^{\alpha_{prior}-1+\sum_{i=1}^{100}z_{i}}\big(1-\phi\big)^{\beta_{prior}-1+\sum_{i=1}^{100}(1 - z_{i})}\\[1em] & \propto & \phi^{\alpha_{prior}+y-1}\big(1-\phi\big)^{\beta_{prior}+100-y-1}\\[1em] & = & \frac{\Gamma\big(\alpha_{prior}+y+\beta_{prior}+100-y\big)}{\Gamma\big(\alpha_{prior}+y\big)\Gamma\big(\beta_{prior}+100-y\big)} \phi^{\alpha_{prior}+y-1}\big(1-\phi\big)^{\beta_{prior}+100-y-1}\\[1em] & = & \textrm{beta}\big(\alpha_{prior}+y,\beta_{prior}+100-y\big)\\[1em] & = & \textrm{beta}\big(4+38,8+100-38\big)\\[1em] & = & \textrm{beta}\big(42,70\big)\\[1em] && \textrm{where}\quad y = \sum_{i=1}^{100}z_{i} \end{eqnarray}\]

Lobster Catch: A group of \(n=5\) people go lobster fishing for an hour. They catch 15, 8, 6, 11, and 4 lobsters. Estimate average catch \(\lambda\) based on these data, given that, prior to this fishing trip, other trips have resulted in an average of 3 lobsters per hour \((\mu=3)\) an hour with a standard deviation of 1.5 lobsters per hour \((\sigma=1.5)\).

Let’s first identify the data model and appropriate conjugate prior.

Data model: \(y_i \sim\) Poisson(\(\lambda\)) for \(i=1,...,5\)

Conjugate prior: \(\lambda \sim\) gamma(\(a_{prior}\),\(b_{prior}\))

Now, let’s use moment-matching to convert the prior mean \(\mu=3\) and standard deviation \(\sigma = 1.5\) to \(a_{prior}\) and \(b_{prior}\).

mu <- 3
sigma <- 1.5

(aPrior <- mu^2/sigma^2)

## [1] 4

(bPrior <- mu/sigma^2)

## [1] 1.333333

Using the gamma-Poisson conjugate relationship, we know the posterior distribution is \(\lambda | y_1,...,y_5 \sim\) gamma(\(a_{prior} + \sum y_i, b_{prior} +n\)).

y <- c(15, 8, 6, 11, 4 )
(aPost <- aPrior + sum(y))

## [1] 48

(bPost <- bPrior + length(y))

## [1] 6.333333

prior <- function(x) dgamma(x, aPrior, bPrior)
post <- function(x) dgamma(x, aPost, bPost)

plot(prior, 0, 20, main = "Gamma prior (dash) and gamma posterior (red)", xlab = expression(lambda), 
  ylab = "Density", lty = 2, lwd = 2, ylim = c(0, 0.4))
plot(post, 0, 20, add = TRUE, col = "red", lwd = 2)

bounds95 <- qgamma(c(0.025,0.975), aPost, bPost)
bounds95

## [1] 5.588117 9.868427

For those interested, the derivation of the posterior distribution is below: \[\begin{eqnarray} \big[\lambda \mid y\big] & \propto & \prod_{i=1}^{n}\textrm{Poisson}\big(y_{i}\mid\lambda\big)\,\textrm{gamma}\big(\lambda\mid\alpha_{prior},\beta_{prior}\big)\\[1em] & = & \prod_{i=1}^{n}\lambda^{y_{i}}e^{-\lambda}\lambda^{\alpha_{prior}-1}e^{-\beta_{prior}\lambda}\\[1em] & = & \lambda^{\alpha_{prior}-1 + \sum_{i=1}^{n}y_{i}}e^{-\beta_{prior}\lambda-n\lambda}\\[1em] & = & \lambda^{\alpha_{prior}-1 + \sum_{i=1}^{n}y_{i}}e^{-(n + \beta_{prior})\lambda}\\[1em] & = & \textrm{gamma}\big(\alpha_{prior} + \sum_{i=1}^{n}y_{i}, n + \beta_{prior}\big)\\[1em] & = & \textrm{gamma}\big(4 + 44, 5 + 1.333\big)\\[1em] & = & \textrm{gamma}\big(48, 6.333\big) \end{eqnarray}\]

Solstice temperature: Your agency has recorded temperature data on the December solstice for the last 50 years in Maryland (saved on the SolsticeTemp data frame from the BayesNSF package). You seek to learn about mean temperature \(\mu\) for the past 50 years, but suppose that we know the standard deviation of the distribution that gave rise to our observed measurements and it is \(\sigma=15\). In a separate study, researchers found mean temperature to be 30 \((\mu_{prior}=30)\) with a standard deviation of 12 \((\sigma_{prior}=12)\). Given this prior information, answer questions a-d.

Parameter of interest: \(\mu\) = true mean temperature

Data model: \(y_i \sim\) normal(\(\mu,15^2\)) for \(i=1,...,50\)

Prior: \(\mu \sim\) normal\((30,12^2)\)

(varKnown <- 15^2)

## [1] 225

(n <- dim(SolsticeTemp)[1])

## [1] 50

(varPrior <- 12^2)

## [1] 144

(muPrior <- 30)

## [1] 30

(y <- sum(SolsticeTemp$temp))

## [1] 1538.549

(muPost <-((muPrior/varPrior) + y/varKnown)/((1/varPrior) + (n/varKnown)))

## [1] 30.74762

(varPost <- 1/((1/varPrior) + (n/varKnown)))

## [1] 4.363636

Posterior: \(\mu | \boldsymbol{y} \sim\) normal(30.75,4.36)

prior <- function(x) dnorm(x, muPrior, varPrior^.5)
post <- function(x) dnorm(x, muPost, varPost^.5)

plot(prior, 0, 60, main = "Normal prior (dash) and normal posterior (red)", xlab = expression(mu), 
  ylab = "Density", lty = 2, ylim = c(0, 0.2), lwd = 2)
plot(post, 0, 60, add = TRUE, col = "red", lwd = 2)

bounds95 <- qnorm(c(0.025,0.975), muPost, varPost^.5)
bounds95

## [1] 26.65338 34.84185

For those interested, the derivation of the posterior distribution is below: \[\begin{eqnarray} \big[\mu \mid y\big] & \propto & \prod_{i=1}^{n}\textrm{normal}\big(y_{i}\mid\mu, \sigma^{2}\big)\,\textrm{normal}\big(\mu\mid\mu_{prior},\sigma_{prior}^{2}\big)\\[1em] & = & \textrm{normal}\Bigg(\cfrac{\bigg(\cfrac{\mu_{prior}}{\sigma_{prior}^{2}} +\cfrac{\sum_{i=1}^{n}y_{i}}{\sigma^{2}}\bigg)}{\bigg( \cfrac{1}{\sigma_{prior}^{2}} +\cfrac{n}{\sigma^{2}} \bigg)} ,\bigg(\cfrac{1}{\sigma_{prior}^{2}}+{\cfrac{n}{\sigma^{2}}}\bigg)^{-1}\Bigg)\\[1em] & = & \textrm{normal}\Bigg(\cfrac{\bigg(\cfrac{30}{144} +\cfrac{1538.549}{225}\bigg)}{\bigg( \cfrac{1}{144} +\cfrac{50}{225} \bigg)} ,\bigg(\cfrac{1}{144}+{\cfrac{50}{225}}\bigg)^{-1}\Bigg)\\[1em] & = & \textrm{normal}\big(30.75, 4.36 \big) \end{eqnarray}\]

Chromium exposure in the workplace: Hexavalent chromium is used in textile dyes, wood preservation, anti-corrosion products, chromate conversion coatings, and a variety of niche uses. Industrial uses of hexavalent chromium compounds include chromate pigments in dyes, paints, inks, and plastics; chromates added as anticorrosive agents to paints, primers, and other surface coatings; and chromic acid is electroplated onto metal parts to provide a decorative or protective coating. Inhaled hexavalent chromium is recognized as a human carcinogen. You have data \(\mathbf{y}=(y_1,\ldots,y_n)'\) on the airborne chromium concentration in \(n=100\) stainless-steel welding workplaces saved in the Chrome data frame from the BayesNSF package. You seek to learn about the variance \(\sigma^2\) of the distribution that gave rise to these data (about which you know very little before the study), but you already know that in a typical stainless-steel welding workplace, airborne concentrations have a known mean of \(\mu=225\) μg/m3. Given this prior information, answer questions a-d. Hint: the R library actuar has the function dinvgamma() for calculating probability densities of the inverse gamma distribution.

Parameter of interest: \(\sigma^2\) = variance of airborne chromium concentrations

Data model: \(y_i \sim\) normal(225,\(\sigma^2\)) for \(i=1,...,100\)

Prior: \(\sigma^2 \sim\) inverse-gamma\((.001,.001)\) (vague prior for variances)

muKnown <- 225
n <- nrow(Chrome)
alphaPrior <- 0.001 #shape
betaPrior <- 0.001 #scale

(alphaPost <-alphaPrior + (n/2))

## [1] 50.001

(betaPost <- betaPrior + sum((Chrome$conc-muKnown)^2)/2)

## [1] 110410.8

Posterior: \(\sigma^2 | \boldsymbol{y} \sim\) inverse-gamma(alphaPost,betaPost)

library(actuar)

## 
## Attaching package: 'actuar'

## The following objects are masked from 'package:stats':
## 
##     sd, var

## The following object is masked from 'package:grDevices':
## 
##     cm

prior <- function(x) dinvgamma(x, shape = alphaPrior, scale = betaPrior)
post <- function(x) dinvgamma(x, shape = alphaPost, scale = betaPost)

plot(prior, 0, 4000, main = "Inverse Gamma Prior (dash) and Inverse Gamma Posterior (red)", xlab = expression(sigma^2), 
     ylab = "Density", lty = 2, lwd = 2, ylim = c(0,.0013))
plot(post, 0, 4000, add = TRUE, col = "red", lwd = 2)

bounds95 <- qinvgamma(c(0.025,.975), alphaPost, scale = betaPost)
bounds95

## [1] 1704.351 2975.084

For those interested, the derivation of posterior distribution is given below: \[\begin{eqnarray} \big[\sigma^2 \mid y\big] & \propto & \prod_{i=1}^{n}\textrm{normal}\big(y_{i}\mid\mu, \sigma^{2}\big)\,\textrm{inverse gamma}\big(\sigma^2|\alpha_{prior},\beta_{prior}\big)\\[1em] & = & \textrm{inverse gamma}\Bigg(\bigg(\alpha_{prior} + \cfrac{n}{2}\bigg),\bigg(\beta_{prior} + \cfrac{\Sigma_{i=1}^n(y_i-\mu)^2}{2}\bigg)\Bigg)\\[1em] & = & \textrm{inverse gamma}\Bigg(\bigg(0.001 + \cfrac{n}{2}\bigg),\bigg(0.001 + \cfrac{\Sigma_{i=1}^n(y_i-225)^2}{2}\bigg)\Bigg) \end{eqnarray}\]