Bayesian Models for Ecologists
Probability Lab 1: Chain rule, joint distributions, and DAGs
June 02, 2024

Motivation

Bayesian analysis is predicated on the idea that we learn about unobserved quantities from quantities we are able to observe. All unobserved quantities (i.e. parameters, latent states, missing data, and even the data themselves before they are observed) are treated as random variables in the Bayesian approach. All Bayesian analysis extends from the laws of probability, that is, from the “mathematics of random variables.”

Random variables are quantities whose value is determined by chance. Statistical distributions represent how “chance” works by specifying the probability that a random variable takes on a value (in the discrete case) or falls within a range of values (in the continuous case). The goal of Bayesian analysis is to discover the characteristics of the probability distributions that govern the behavior of random variables of interest, for example, the size of a population, the rate of nitrogen accumulation in a stream, the diversity of plants on a landscape, the prevalence of a disease in a population, or the stress levels of juvenile elephants.

It follows that understanding the laws of probability and statistical distributions provides the foundation for Bayesian analysis. Keep in mind the following learning objectives



Chain rule

1. Use the chain rule of probability to factor the joint distribution \(\Pr(\theta_1,\theta_2, \theta_3,\theta_4)\) into a product of conditional distributions.

\[\begin{align} \Pr\big(\theta_1,\theta_2,\theta_3,\theta_4\big) &= \Pr\big(\theta_1 \mid \theta_2,\theta_3,\theta_4\big)\\ &\quad\times\Pr\big(\theta_2 \mid \theta_3,\theta_4\big)\\ &\quad\times\Pr\big(\theta_3 \mid \theta_4\big)\\ &\quad\times\Pr\big(\theta_4\big) \end{align}\]

The order doesn’t matter. The following is also correct:

\[\begin{align} \Pr\big(\theta_1,\theta_2,\theta_3,\theta_4\big) &= \Pr\big(\theta_4 \mid \theta_2,\theta_3,\theta_1\big)\\ &\quad\times\Pr\big(\theta_2 \mid \theta_3,\theta_1\big)\\ &\quad\times\Pr\big(\theta_3 \mid \theta_1\big)\\ &\quad\times\Pr\big(\theta_1\big) \end{align}\]

as are many other variations.
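
The claim that the order doesn't matter can be checked numerically. The sketch below is only an illustration, not part of the lab: it assumes four binary variables and a randomly generated joint table (both hypothetical), and the helper name `chain_rule_product` is made up for this example. It rebuilds the joint as a product of conditionals for two different orderings and confirms that each product matches the original table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution over four binary variables
# (theta_1, ..., theta_4), stored as a 2x2x2x2 table that sums to 1.
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()


def chain_rule_product(joint, order):
    """Rebuild the joint as a product of conditionals.

    The variable on axis order[0] is conditioned on all the others,
    order[1] on those still remaining, and so on down to a marginal.
    """
    product = np.ones_like(joint)
    remaining = joint  # distribution of the variables not yet peeled off
    for axis in order:
        # Sum out the current variable to get the distribution of the rest.
        marginal = remaining.sum(axis=axis, keepdims=True)
        # remaining / marginal = Pr(current variable | variables still remaining)
        product = product * (remaining / marginal)
        remaining = marginal
    return product


# Ordering used in the first factorization: theta_1, theta_2, theta_3, theta_4.
first = chain_rule_product(joint, (0, 1, 2, 3))
# Ordering used in the second factorization: theta_4, theta_2, theta_3, theta_1.
second = chain_rule_product(joint, (3, 1, 2, 0))

print(np.allclose(first, joint), np.allclose(second, joint))
```

Any other permutation of the four axes can be passed as `order` and should recover the same table, up to floating-point error.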


2. Consider the following factored joint distribution: \[\Pr\big(\theta,w,\eta,\alpha\big)=\Pr\big(\alpha \mid w,\eta,\theta\big)\Pr\big(w \mid \eta,\theta\big)\Pr\big(\eta \mid \theta\big)\Pr\big(\theta\big)\] Simplify this equation using the knowledge that \(\eta\) and \(\theta\) are independent and that \(w\) and \(\theta\) are independent.

\[\Pr\big(\theta,w,\eta,\alpha\big)=\Pr\big(\alpha \mid w,\eta,\theta\big)\Pr\big(w \mid \eta\big)\Pr\big(\eta\big)\Pr\big(\theta\big)\]



Converting DAGs to joint distributions

Write the joint and conditional distributions for the following Bayesian networks. For discrete random variables, \(\big[A\big]\) is equivalent to \(\Pr\big(A\big)\). For continuous random variables \(\big[A\big]\) is the probability density of \(A\).




\[\begin{eqnarray} & &\textrm{1.}\quad\big[a,b,c,d,e\big] = \big[a \mid b,c\big]\big[b\mid d,e\big]\big[c\big]\big[d\big]\big[e\big] \\[1em] & &\textrm{2.}\quad\big[y,z,\beta_1,\beta_2,\theta_d,\theta_p\big] = \big[y\mid z,\theta_d\big]\big[z \mid \theta_p\big]\big[\theta_d\mid\beta_1,\beta_2\big]\big[\theta_p\big]\big[\beta_1\big]\big[\beta_2\big]\\[1em] & &\textrm{3.}\quad\big[y,x,\mu,\phi_1,\phi_2,\sigma,z \big] = \big[y\mid x, z \big] \big[x \mid \mu, \sigma \big]\big[z \mid \phi_1,\phi_2\big]\big[\sigma\big]\big[\phi_1\big]\big[\phi_2\big]\big[\mu\big]\\[1em] & &\textrm{4.}\quad\big[y_1,y_2,z,\theta_1,\theta_2,\gamma_1,\gamma_2,\gamma_3\big] = \big[y_1\mid\theta_1,z\big]\big[y_2\mid \theta_2,z\big]\big[z\mid\gamma_1,\gamma_2,\gamma_3\big]\big[\theta_1\big]\big[\theta_2\big]\big[\gamma_1\big]\big[\gamma_2\big] \big[\gamma_3\big]\\[1em] & &\textrm{5.}\quad\big[A,B,D,E,F,H\big] = \big[A \mid D,E\,\big]\big[D \mid H\big]\big[B \mid H,F\big]\big[E\big]\big[H\big]\big[F\big] \end{eqnarray}\]
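
The rule behind these answers is mechanical: each node contributes one term, conditioned on its parents (the nodes with arrows pointing into it), and a node with no parents contributes a marginal term. A minimal sketch of that rule follows, assuming the DAG is supplied as a hypothetical child-to-parents dictionary; the function name `factor_joint` and the plain-text bracket output are illustrative choices only.

```python
def factor_joint(parents):
    """Return the factored joint, in bracket notation, for a DAG.

    `parents` maps each node name to the list of its parents.
    Nodes with an empty parent list get an unconditional term.
    """
    lhs = "[" + ",".join(parents) + "]"
    terms = []
    for node, pa in parents.items():
        if pa:
            terms.append("[" + node + " | " + ",".join(pa) + "]")
        else:
            terms.append("[" + node + "]")
    return lhs + " = " + "".join(terms)


# Network 1 from the answers above, written as a child -> parents mapping.
network_1 = {"a": ["b", "c"], "b": ["d", "e"], "c": [], "d": [], "e": []}
print(factor_joint(network_1))
# prints: [a,b,c,d,e] = [a | b,c][b | d,e][c][d][e]
```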



Converting joint distributions to DAGs

Draw Bayesian networks (DAGs) for the joint and conditional distributions below.

\[\begin{eqnarray} & &\textrm{I.}\quad \Pr\big(A,B\big) = \Pr\big(A\mid B\big)\Pr\big(B\big)\\[1em] & &\textrm{II.}\quad \Pr\big(A,B,C\big) = \Pr\big(A \mid B,C\big)\Pr\big(B\mid C\big)\Pr\big(C\big)\\[1em] & &\textrm{III.}\quad \Pr\big(A,B,C,D\big) = \Pr\big(A \mid C\big)\Pr\big(B \mid C\big)\Pr\big(C \mid D\big)\Pr\big(D\big)\\[1em] & &\textrm{IV.}\quad \Pr\big(A,B,C,D,E\big) = \Pr\big(A \mid C\big)\Pr\big(B\mid C\big) \Pr\big(C \mid D,E\big)\Pr\big(D\big)\Pr\big(E\big)\\[1em] & &\textrm{V.}\quad \Pr\big(A,B,C,D\big) = \Pr\big(A \mid B,C,D\big)\Pr\big(B \mid C,D\big)\Pr\big(C \mid D\big)\Pr\big(D\big)\\[1em] & &\textrm{VI.}\quad \Pr\big(A,B,C,D\big) = \Pr\big(A \mid B,C,D\big)\Pr\big(C \mid D\big)\Pr\big(B\big)\Pr\big(D\big)\\[1em] \end{eqnarray}\]
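
Going the other way, each conditional term tells you which arrows to draw: every variable to the right of the bar gets an arrow pointing to the variable on the left, and unconditional terms add no arrows. The sketch below applies that reading to expression III. The `(child, parents)` input format and the name `edges_from_factors` are assumptions made for this illustration.

```python
def edges_from_factors(factors):
    """Turn factors given as (child, parents) pairs into edges parent -> child."""
    return [(parent, child) for child, parents in factors for parent in parents]


# Expression III: Pr(A | C) Pr(B | C) Pr(C | D) Pr(D)
factors_iii = [("A", ["C"]), ("B", ["C"]), ("C", ["D"]), ("D", [])]

for parent, child in edges_from_factors(factors_iii):
    print(f"{parent} -> {child}")
# prints:
# C -> A
# C -> B
# D -> C
```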





Simplifying

Simplify the expression below, given that \(z_2\) and \(z_3\) are independent random variables. \[ \Pr\big(z_1,z_2,z_3\big)=\Pr\big(z_1\mid z_2,z_3\big)\Pr\big(z_2 \mid z_3\big)\Pr\big(z_3\big)\]


The expression simplifies to \[ \Pr\big(z_1,z_2,z_3\big)=\Pr\big(z_1\mid z_2,z_3\big)\Pr\big(z_2\big)\Pr\big(z_3\big)\] because \(\Pr\big(z_2\mid z_3\big) = \Pr\big(z_2\big)\) if \(z_2\) and \(z_3\) are independent.
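
The step \(\Pr\big(z_2\mid z_3\big) = \Pr\big(z_2\big)\) can be seen in a small numerical example. This is a sketch under stated assumptions: \(z_2\) and \(z_3\) are taken to be binary, and the marginal probabilities are invented for illustration.

```python
import numpy as np

# Hypothetical marginals for two independent binary variables z2 and z3.
p_z2 = np.array([0.3, 0.7])
p_z3 = np.array([0.6, 0.4])

# Independence means the joint is the outer product of the marginals:
# joint_23[i, j] = Pr(z2 = i, z3 = j) = Pr(z2 = i) * Pr(z3 = j).
joint_23 = np.outer(p_z2, p_z3)

# Conditional Pr(z2 | z3): divide each column by Pr(z3 = j),
# which is the column sum of the joint table.
cond_z2_given_z3 = joint_23 / joint_23.sum(axis=0, keepdims=True)

# Each column of the conditional table should equal the marginal of z2.
print(cond_z2_given_z3)
```

Both columns of the printed table match `p_z2`, so conditioning on \(z_3\) leaves the distribution of \(z_2\) unchanged.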


Interpreting and Factoring

The probability of an observation \(y\) depends on a true ecological state of interest, \(z\), and the parameters in a data model, \(\pmb\theta_d\). The probability of the true state \(z\) depends on the parameters in an ecological process model, \(\pmb\theta_p\). We know that \(\pmb\theta_d\) and \(\pmb\theta_p\) are independent. Write out a factored expression for the joint distribution, \(\Pr\big(y,z,\pmb\theta_d,\pmb\theta_p\big)\). Drawing a Bayesian network will help.


\[\Pr\big(y,z,\pmb\theta_d,\pmb\theta_p\big) = \Pr\big(y \mid z,\pmb\theta_d\big)\Pr\big(z \mid \pmb\theta_p\big)\Pr\big(\pmb\theta_d\big)\Pr\big(\pmb\theta_p\big)\]