Please enter set.seed(10) at the R console before running any of the R code below. This will ensure that your random numbers, and therefore your answers, match ours.
\[\big[y_i \mid g\big(\mathbf\theta,x_i\big),\sigma^2\big],\]
which represents the probability of obtaining the observation \(y_i\) given that our model predicts a distribution with mean \(g(\mathbf\theta,x_i)\) and variance \(\sigma^2\). Assume we have count data. What distribution would be a logical choice to model these data? Write out a model for the data.
The Poisson is a logical choice. We predict the mean of the Poisson for each \(x_i\), i.e. \(\lambda_i = g\big(\mathbf\theta,x_i\big)\). This mean also controls the uncertainty, because the variance of the Poisson distribution equals its mean, so no separate \(\sigma^2\) is needed. A model for the data is:
\[ y_i \sim \textrm{Poisson}\big(g\big(\mathbf\theta,x_i\big)\big).\]
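The model above can be evaluated directly in R. The sketch below makes two assumptions: a hypothetical deterministic model \(g(\mathbf\theta,x) = \exp(\theta_1 + \theta_2 x)\), where the exponential keeps every \(\lambda_i\) positive, and made-up covariates, counts, and parameter values. It computes \(\lambda_i\), the probability of each observation, and the total log likelihood with dpois. It draws no random numbers, so it does not disturb the set.seed(10) sequence.

```r
# Hypothetical deterministic model (an assumption, not from the exercise):
# g(theta, x) = exp(theta[1] + theta[2] * x), which keeps lambda_i > 0
g <- function(theta, x) exp(theta[1] + theta[2] * x)

theta <- c(0.5, 0.3)   # made-up parameter values, for illustration only
x <- c(1, 2, 3, 4, 5)  # hypothetical covariate values
y <- c(2, 3, 4, 6, 8)  # hypothetical observed counts

# Predicted Poisson mean for each x_i; for the Poisson this is also the variance
lambda <- g(theta, x)

# [y_i | g(theta, x_i)] for each observation
p_obs <- dpois(y, lambda)

# Total log likelihood of the data under the model
log_lik <- sum(dpois(y, lambda, log = TRUE))
log_lik
```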
Random variable | Distribution | Justification |
---|---|---|
The mass of carbon in above-ground biomass in a square-meter plot. | gamma or lognormal | continuous and non-negative |
The number of seals on a haul-out beach in the Gulf of Alaska. | Poisson or negative binomial | non-negative integer counts |
Presence or absence of an invasive species in forest patches. | Bernoulli | zero or one |
The probability that a white male will vote Republican in a presidential election. | beta | continuous between zero and one |
The number of individuals in four mutually exclusive income categories. | multinomial | counts in more than two categories |
The number of diseased individuals in a sample of 100. | binomial | counts in two categories; the number of successes in a given number of trials |
The political party affiliation (Democrat, Republican, Independent) of a voter. | multinomial | one outcome among more than two categories (a multinomial with a single trial) |
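The supports listed in the Justification column can be checked in R. In this minimal sketch, which uses made-up parameter values, each density or mass function is evaluated at a value its random variable cannot take, and each returns 0. No random numbers are drawn, so the set.seed(10) results are unaffected.

```r
# Each density/mass function returns 0 outside the support given in the table.
# All parameter values are made up, for illustration only.
outside <- c(
  gamma     = dgamma(-1, shape = 2, rate = 0.5),   # biomass cannot be negative
  lognormal = dlnorm(-1, meanlog = 1, sdlog = 0.5),
  poisson   = dpois(-1, lambda = 40),              # counts cannot be negative
  negbin    = dnbinom(-1, size = 5, mu = 40),
  bernoulli = dbinom(2, size = 1, prob = 0.3),     # only 0 or 1 is possible
  beta      = dbeta(1.2, shape1 = 2, shape2 = 5),  # probabilities lie in [0, 1]
  binomial  = dbinom(101, size = 100, prob = 0.1)  # at most 100 diseased of 100
)
outside

# All binomial probability lies on 0, 1, ..., 100 diseased individuals
sum(dbinom(0:100, size = 100, prob = 0.1))
```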
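```r
# Simulate 10,000 draws from a Poisson distribution with mean 33
lambda <- 33
n <- 10000
y <- rpois(n, lambda)
# For the Poisson, the sample mean and variance should both be close to lambda
mean(y)
## [1] 33.0171
var(y)
## [1] 33.06711
# Central 95% of the simulated counts
quantile(y, c(0.025, 0.975))
## 2.5% 97.5%
## 22 45
```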
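The simulated summaries can be compared with exact values for Poisson(33). This sketch computes the mean and variance from the probability mass function, truncated at 200 on the assumption that the mass beyond that is negligible for \(\lambda = 33\). It gets the theoretical quantiles from qpois. It draws no random numbers, so the rmultinom result that follows is unaffected.

```r
lambda <- 33
k <- 0:200                 # truncation point; P(Y > 200) is negligible here
pk <- dpois(k, lambda)     # Poisson probabilities for each count k

# Exact mean and variance from the pmf; both should equal lambda
exact_mean <- sum(k * pk)
exact_var <- sum((k - exact_mean)^2 * pk)
c(mean = exact_mean, var = exact_var)

# Theoretical 2.5% and 97.5% quantiles, for comparison with quantile(y, ...)
exact_q <- qpois(c(0.025, 0.975), lambda)
exact_q
```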
```r
# One random allocation of 80 individuals among five categories
prob <- c(.07, .13, .15, .23, .42)
size <- 80
n <- 1
rmultinom(n, size, prob)
## [,1]
## [1,] 7
## [2,] 7
## [3,] 12
## [4,] 20
## [5,] 34
```
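A single multinomial draw can be compared with its expectation, size × prob. This sketch also makes many draws: each column returned by rmultinom is one vector of counts that sums to size, and the row means approach the expected counts. It uses new random numbers, so the exact means will vary from run to run.

```r
prob <- c(.07, .13, .15, .23, .42)
size <- 80

# Expected count in each category
expected <- size * prob
expected  # 5.6 10.4 12.0 18.4 33.6

# 10,000 draws; each column is one multinomial vector of counts
draws <- rmultinom(10000, size, prob)

# Average counts across draws should be close to the expected counts
rowMeans(draws)
```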
The normal distribution isn’t an ideal choice because it extends below 0, which isn’t possible for measurements of above-ground biomass. Nonetheless, a normal model with mean 103 and standard deviation 23 is:
\[y_i \sim \textrm{normal}(103, 23^{2})\]
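```r
# Probability density of an observation of 94 under normal(103, 23^2).
# mu and sigma are used as names to avoid masking R's mean() and sd() functions.
x <- 94
mu <- 103
sigma <- 23
dnorm(x, mu, sigma)
## [1] 0.01606693
# Probability that an observation falls between 90 and 110
pnorm(110, mu, sigma) - pnorm(90, mu, sigma)
## [1] 0.3336056
```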
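One way to respect non-negativity, which the answer above does not use, is a gamma distribution with the same mean and variance. Moment matching gives shape \(= \mu^2/\sigma^2\) and rate \(= \mu/\sigma^2\). The sketch below answers the same two questions with this gamma and reports how much probability the normal places on impossible negative values.

```r
# Moment matching: gamma shape and rate giving mean 103 and sd 23
mu <- 103
sigma <- 23
shape <- mu^2 / sigma^2
rate <- mu / sigma^2

# Density at 94 and probability between 90 and 110 under the gamma
dgamma(94, shape = shape, rate = rate)
pgamma(110, shape, rate) - pgamma(90, shape, rate)

# Probability the normal model assigns to negative biomass
pnorm(0, mu, sigma)
```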
\[y_i \sim \textrm{binomial}(24, 0.12)\]
```r
# Probability of exactly 4 successes in 24 trials with success probability 0.12
x <- 4
size <- 24
p <- 0.12
dbinom(x, size, p)
## [1] 0.1709024
```
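dbinom evaluates the binomial probability mass function \(\binom{n}{x}p^x(1-p)^{n-x}\). This sketch writes the formula out explicitly and checks that it agrees with dbinom.

```r
# Binomial pmf written out: choose(size, x) * p^x * (1 - p)^(size - x)
x <- 4
size <- 24
p <- 0.12
manual <- choose(size, x) * p^x * (1 - p)^(size - x)
manual
all.equal(manual, dbinom(x, size, p))  # TRUE
```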
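```r
# Probability of observing these counts in four categories
# with the given category probabilities
p <- c(0.56, 0.06, 0.16, 0.22)
y <- c(65, 4, 25, 26)
dmultinom(x = y, prob = p)
## [1] 0.0003043713
```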
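The multinomial probability is \(\frac{n!}{y_1!\cdots y_K!}\prod_k p_k^{y_k}\) with \(n = \sum_k y_k\). With \(n = 120\) the factorials are very large, so this sketch evaluates the formula on the log scale using lgamma\((m+1) = \log m!\) and compares the result with dmultinom.

```r
p <- c(0.56, 0.06, 0.16, 0.22)
y <- c(65, 4, 25, 26)
n <- sum(y)

# log of n! / (y_1! ... y_K!) * prod(p_k^y_k), using lgamma(m + 1) = log(m!)
log_prob <- lgamma(n + 1) - sum(lgamma(y + 1)) + sum(y * log(p))
manual <- exp(log_prob)
all.equal(manual, dmultinom(x = y, prob = p))  # TRUE
```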
```r
# 2.5% quantile of a normal distribution with mean 1.9 and standard deviation 1.4
mu <- 1.9
sigma <- 1.4
p <- 0.025
qnorm(p, mu, sigma)
## [1] -0.8439496
```
The 2.5% quantile is negative, which shows why the normal distribution isn’t an ideal choice here: it extends below 0, which isn’t possible for measurements of nitrogen fixation.
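The negative quantile can be made concrete. A normal quantile equals \(\mu + \sigma z\), where \(z\) is the standard normal quantile, and pnorm gives the total probability the model places below zero. This sketch computes both.

```r
mu <- 1.9
sigma <- 1.4

# The 2.5% quantile is mu + sigma * z, with z the standard normal quantile
z <- qnorm(0.025)
mu + sigma * z   # same value as qnorm(0.025, mu, sigma)

# Probability the normal model assigns to impossible negative fixation rates
pnorm(0, mu, sigma)
```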