Bayesian Models for Ecologists
Probability Lab 2: Probability Distributions
June 03, 2024

Please enter set.seed(10) at the R console before running any of the R code below. This ensures that your random numbers, and therefore your answers, match ours.

  1. We commonly represent the following general framework for linking models to data:

\[\big[y_i \mid g\big(\mathbf\theta,x_i\big),\sigma^2\big],\]

which represents the probability of obtaining the observation \(y_i\) given that our model predicts the mean of a distribution \(g(\mathbf\theta,x_i)\) with variance \(\sigma^2\). Assume we have count data. What distribution would be a logical choice to model these data? Write out a model for the data.


The Poisson is a logical choice. We predict the mean of the Poisson for each \(x_i\), i.e. \(\lambda_i = g\big(\mathbf\theta,x_i\big)\), which also controls the uncertainty because in the Poisson distribution the variance equals the mean. A model for the data is:

\[ y_i \sim \textrm{Poisson}\big(g\big(\mathbf\theta,x_i\big)\big).\]
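
As a minimal sketch of how this model could be evaluated in R, assume a hypothetical log-linear form for the deterministic model, \(g(\mathbf\theta,x_i) = \exp(\theta_1 + \theta_2 x_i)\), which keeps the predicted mean positive. The function `g`, the values in `x`, `y`, and `theta`, and the name `log_lik` are all made up for illustration and are not part of the lab. The code draws no random numbers, so it does not disturb the seed.

```r
# Hypothetical deterministic model: a log-linear mean (an assumption, not from the lab)
g <- function(theta, x) exp(theta[1] + theta[2] * x)

# Made-up covariate values, counts, and parameters, for illustration only
x <- c(0.5, 1.0, 1.5, 2.0)
y <- c(2, 3, 6, 9)
theta <- c(0.2, 1.0)

# Predicted Poisson mean for each observation
lambda <- g(theta, x)

# Log-likelihood of the counts under y_i ~ Poisson(g(theta, x_i))
log_lik <- sum(dpois(y, lambda, log = TRUE))
log_lik
```

Summing log probabilities rather than multiplying probabilities avoids numerical underflow when there are many observations.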


  1. Choose the appropriate distribution for the types of data shown below and justify your decision.


| Random variable | Distribution | Justification |
|---|---|---|
| The mass of carbon in above ground biomass in a square-meter plot. | gamma or lognormal | continuous and non-negative |
| The number of seals on a haul-out beach in the Gulf of Alaska. | Poisson or negative binomial | counts |
| Presence or absence of an invasive species in forest patches. | Bernoulli | zero or one |
| The probability that a white male will vote Republican in a presidential election. | beta | continuous, zero to one |
| The number of individuals in four mutually exclusive income categories. | multinomial | counts in more than two categories |
| The number of diseased individuals in a sample of 100. | binomial | counts in two categories; number of successes in a given number of trials |
| The political party affiliation (Democrat, Republican, independent) of a voter. | multinomial (with one trial) | one outcome among more than two categories |


  1. Find the mean, variance, and the 2.5% and 97.5% quantiles (the bounds of the central 95% interval) of 10000 random draws from a Poisson distribution with \(\lambda=33\).


lambda <- 33
n <- 10000
y <- rpois(n, lambda)
mean(y)
## [1] 33.0171
var(y)
## [1] 33.06711
quantile(y, c(0.025, 0.975))
##  2.5% 97.5% 
##    22    45
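
As a rough check on the simulation, the sample statistics can be compared with their theoretical counterparts: the mean and variance of a Poisson both equal \(\lambda\), and `qpois` returns the exact quantiles. This is only a sketch; the names `theory_mean`, `theory_var`, `theory_q`, and `se_mean` are ours. No random numbers are drawn, so the seed is unaffected.

```r
lambda <- 33
n <- 10000

# For a Poisson, the theoretical mean and variance both equal lambda
theory_mean <- lambda
theory_var <- lambda

# Exact 2.5% and 97.5% quantiles of the Poisson(33) distribution
theory_q <- qpois(c(0.025, 0.975), lambda)
theory_q

# Standard error of the sample mean of n draws: sqrt(variance / n)
se_mean <- sqrt(theory_var / n)
se_mean
```

The sample mean should fall within a few standard errors of 33.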


  1. Simulate one observation of survey data with five categories on a Likert scale, i.e. strongly disagree to strongly agree. Assume a sample of 80 respondents and the following probabilities:
  1. Strongly disagree = 0.07
  2. Disagree = 0.13
  3. Neither agree nor disagree = 0.15
  4. Agree = 0.23
  5. Strongly agree = 0.42


prob <- c(.07,.13,.15,.23,.42)
size <- 80 
n <- 1 
rmultinom(n, size, prob)
##      [,1]
## [1,]    7
## [2,]    7
## [3,]   12
## [4,]   20
## [5,]   34
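
To judge whether a simulated draw looks reasonable, it can help to compute the expected count in each category, which for a multinomial is the sample size times the category probability. A minimal sketch (the name `expected` is ours):

```r
prob <- c(0.07, 0.13, 0.15, 0.23, 0.42)
size <- 80

# The probabilities must sum to one for a valid multinomial
stopifnot(isTRUE(all.equal(sum(prob), 1)))

# Expected count in each category: size * probability
expected <- size * prob
expected
```

The expected counts are 5.6, 10.4, 12, 18.4, and 33.6; any single draw will scatter around these values.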


  1. The average above ground biomass in a grazing allotment of sagebrush grassland is 103 g/m², with a standard deviation of 23. You clip a 1 m² plot. Write out the model for the probability density of the data point. What is the probability density of an observation of 94, assuming the data are normally distributed? Is there a problem with using the normal distribution? What is the probability that your plot will contain between 90 and 110 g of biomass?


The normal distribution isn’t an ideal choice because it extends below 0, which isn’t possible for measurements of above ground biomass. Nonetheless:

\[y_i \sim \textrm{normal}(103, 23^{2})\]

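y <- 94
mu <- 103
sigma <- 23
# Probability density of the observation y = 94
dnorm(y, mean = mu, sd = sigma)
## [1] 0.01606693
# P(90 < y < 110) as a difference of cumulative probabilities
q <- c(110, 90)
p.bound <- pnorm(q, mean = mu, sd = sigma)
p.bound[1] - p.bound[2]
## [1] 0.3336056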
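
Because the normal places some probability below zero, one alternative is a gamma distribution with the same mean and standard deviation, obtained by moment matching. The sketch below assumes a gamma is acceptable for these data, which the problem itself does not state; the names `shape`, `rate`, `d_gamma`, and `p_gamma` are ours.

```r
mu <- 103
sigma <- 23

# Moment matching: a gamma with shape a and rate b has mean a / b
# and variance a / b^2, so a = mu^2 / sigma^2 and b = mu / sigma^2
shape <- mu^2 / sigma^2
rate <- mu / sigma^2

# Probability density of an observation of 94 under the gamma
d_gamma <- dgamma(94, shape = shape, rate = rate)
d_gamma

# Probability of a plot containing between 90 and 110 g under the gamma
p_gamma <- pgamma(110, shape = shape, rate = rate) -
  pgamma(90, shape = shape, rate = rate)
p_gamma
```

The gamma has support only on positive values, so it cannot predict negative biomass.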


  1. The prevalence of a disease in a population is the proportion of the population that is infected with the disease. The prevalence of chronic wasting disease in male mule deer on winter range near Fort Collins, CO is 12 percent. A sample of 24 male deer included 4 infected individuals. Write out a model that represents how the data arise. What is the probability of obtaining these data conditional on the given prevalence (p=0.12)?


\[y_i \sim \textrm{binomial}(24, 0.12)\]

x <- 4 
size <- 24
p <- 0.12
dbinom(x, size, p)
## [1] 0.1709024
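
The same probability can be computed directly from the binomial probability mass function, \(\binom{n}{y} p^{y} (1-p)^{n-y}\), as a check on what `dbinom` is doing. A minimal sketch (the names `y_obs`, `n_trials`, `prev`, and `by_hand` are ours):

```r
y_obs <- 4       # infected deer in the sample
n_trials <- 24   # deer sampled
prev <- 0.12     # prevalence

# Binomial pmf: choose(n, y) * p^y * (1 - p)^(n - y)
by_hand <- choose(n_trials, y_obs) * prev^y_obs * (1 - prev)^(n_trials - y_obs)
by_hand

# Should agree with the built-in function up to floating-point error
all.equal(by_hand, dbinom(y_obs, n_trials, prev))
```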


  1. Researchers know that the true proportions of age-sex classes for elk in Rocky Mountain National Park are: adult females (p = 0.56), yearling males (p = 0.06), bulls (p = 0.16), and calves (p = 0.22). What is the probability of obtaining the following classification counts, conditional on these known population proportions?
  1. Adult females (count = 65)
  2. Yearling males (count = 4)
  3. Bulls (count = 25)
  4. Calves (count = 26)


p <- c(0.56, 0.06, 0.16, 0.22)
y <- c(65, 4, 25, 26)

dmultinom(x = y, prob = p)
## [1] 0.0003043713
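
For reference, the multinomial probability is \(\frac{N!}{\prod_k y_k!}\prod_k p_k^{y_k}\), where \(N\) is the total count. The sketch below computes it by hand on the log scale, using `lgamma` to avoid evaluating very large factorials directly; the names `log_coef`, `log_prob`, and `by_hand` are ours.

```r
p <- c(0.56, 0.06, 0.16, 0.22)
y <- c(65, 4, 25, 26)

# Log of the multinomial coefficient N! / (y1! y2! y3! y4!),
# using lgamma(k + 1) = log(k!)
log_coef <- lgamma(sum(y) + 1) - sum(lgamma(y + 1))

# Add the log of the product of p_k^y_k, then back-transform
log_prob <- log_coef + sum(y * log(p))
by_hand <- exp(log_prob)
by_hand

# Should agree with the built-in function up to floating-point error
all.equal(by_hand, dmultinom(x = y, prob = p))
```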


  1. Nitrogen fixation by free-living bacteria occurs at a rate of 1.9 g N/ha/yr with a standard deviation (\(\sigma\)) of 1.4. What is the lowest fixation rate that exceeds 2.5% of the distribution? Use a normal distribution for this problem, but discuss why this might not be a good choice.


mu <- 1.9
sigma <- 1.4
p <- 0.025
qnorm(p, mu, sigma)
## [1] -0.8439496

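The normal distribution isn’t an ideal choice because it extends below 0, which isn’t possible for measurements of nitrogen fixation. Here the 2.5% quantile itself is negative, so the answer is not a physically meaningful fixation rate.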
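
One way to see the effect of this choice is to repeat the calculation with a distribution that is restricted to positive values. The sketch below moment matches a lognormal to the same mean and standard deviation; choosing the lognormal is our assumption, not something the problem specifies, and the names `sdlog`, `meanlog`, and `q_lnorm` are ours.

```r
mu <- 1.9
sigma <- 1.4

# Moment matching for the lognormal: find the mean and sd on the log scale
# that give mean mu and standard deviation sigma on the original scale
sdlog <- sqrt(log(1 + sigma^2 / mu^2))
meanlog <- log(mu) - sdlog^2 / 2

# 2.5% quantile under the lognormal; always positive
q_lnorm <- qlnorm(0.025, meanlog = meanlog, sdlog = sdlog)
q_lnorm
```

Unlike the normal result, this quantile cannot be negative.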