Please enter set.seed(10) at the R console before doing any of the R coding below. This will ensure that your answers for random numbers are the same as ours.
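set.seed(10)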
When we say support, we are referring to the values of a random variable for which the probability density or probability is defined and greater than 0. The support of the lognormal distribution is continuous and strictly positive, which makes it particularly useful in ecology and the social sciences. Moreover, it is often useful because it is asymmetric, allowing for values that are extreme in the positive direction. Finally, it is useful for representing products of random variables: just as the central limit theorem predicts that the distribution of sums of many random variables will be approximately normal, no matter how each is individually distributed, the products of many positive random variables will be approximately lognormally distributed regardless of their individual distributions.
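As a quick illustration of that last point (a sketch; the uniform bounds here are an arbitrary choice), note that the log of a product is a sum of logs, so the central limit theorem applies on the log scale:
# Products of many positive random variables are approximately lognormal
prods <- replicate(10000, prod(runif(50, min = 0.5, max = 1.5)))
hist(log(prods), breaks = 50, prob = TRUE, main = "") # approximately normal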
If a random variable is lognormally distributed, then the log of that random variable is normally distributed (conversely, if you exponentiate a normal random variable you generate a lognormal random variable). The first parameter of the lognormal distribution is the mean of the random variable on the log scale (i.e., \(\alpha\) on the cheat sheet, meanlog in R) and the second parameter is the standard deviation (or sometimes the variance) of the random variable on the log scale (i.e., \(\beta\) on the cheat sheet, sdlog in R). We often predict the median of the distribution with our deterministic model, which means that we can use the log of the median as the first parameter because
\[\begin{eqnarray} z \sim \textrm{lognormal}\big(\alpha,\beta\big)\\ \textrm{median}\big(z\big) = e^{\alpha}\\ \textrm{log}\big(\textrm{median}\big(z\big)\big) = \alpha \end{eqnarray}\]
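We can check this relationship directly in R (the parameter values here are arbitrary); the 0.5 quantile of the lognormal equals \(e^{\alpha}\):
# The median of a lognormal(alpha, beta) is exp(alpha)
alpha <- 1.5
beta <- 0.5
qlnorm(0.5, meanlog = alpha, sdlog = beta)
exp(alpha)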
par(mfrow = c(1, 2))
mean.var <- 0
sd.var <- 1
n <- 100000

# Simulate on the log scale and overlay the matching normal density
log.y <- rnorm(n, mean = mean.var, sd = sd.var)
hist(log.y, breaks = 50, prob = TRUE, main = "")
x <- seq(-4, 4, .01)
z <- dnorm(x, mean = mean.var, sd = sd.var)
lines(x, z, col = "red", lwd = 2)

# Simulate on the natural scale and overlay the matching lognormal density
y <- rlnorm(n, meanlog = mean.var, sdlog = sd.var)
hist(y, ylim = c(0, .7), breaks = 250, prob = TRUE, xlim = c(0, 10), main = "")
x <- seq(0, 50, .01)
z <- dlnorm(x, meanlog = mean.var, sdlog = sd.var)
lines(x, z, col = "red", lwd = 2)
# Check the moments on the log scale
mean(log.y)
## [1] -0.005958381
var(log.y)
## [1] 1.007373
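The mean and variance of a lognormally distributed random variable on the natural (unlogged) scale can be computed from \(\alpha\) and \(\beta\) with the standard formulas
\[\begin{eqnarray} \textrm{mean}\big(z\big) = e^{\alpha+\frac{\beta^{2}}{2}}\\ \textrm{var}\big(z\big) = \big(e^{\beta^{2}}-1\big)e^{2\alpha+\beta^{2}} \end{eqnarray}\]
which the code below compares to the simulated moments.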
(mean.y <- exp(mean.var + sd.var^2/2))
## [1] 1.648721
mean(y)
## [1] 1.636947
(var.y <- (exp(sd.var^2) - 1) * exp(2 * mean.var + sd.var^2))
## [1] 4.670774
var(y)
## [1] 4.648033
Here is the case where we moment match a mean of 103 and a standard deviation of 23 into the parameters of a gamma distribution, using shape \(=\frac{\mu^{2}}{\sigma^{2}}\) and rate \(=\frac{\mu}{\sigma^{2}}\):
\[y_i \sim \textrm{gamma}\bigg(\frac{103^{2}}{23^{2}}, \frac{103}{23^{2}}\bigg)\]
x <- 94
mu <- 103
sd <- 23
shape <- mu**2/sd**2
rate <- mu/sd**2
dgamma(x, shape, rate)
## [1] 0.01744875
q <- c(110, 90)
p.bound <- pgamma(q, shape, rate)
p.bound[1] - p.bound[2]
## [1] 0.3419165
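As a quick check of the moment match (a sketch paralleling the lognormal check below), simulating from this gamma should recover the original moments:
# Check the gamma moment match by simulation
y.sim <- rgamma(100000, shape, rate)
mean(y.sim)
sd(y.sim)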
Here is the case where we moment match into the mean of the lognormal. We are saying that the mean of the unlogged data is 103 and the standard deviation is 23. In this case we need to express the parameters of the lognormal as a function of these untransformed moments.
\[y_i \sim \textrm{lognormal}\Bigg(\log(103)-\log\bigg(\sqrt{1+\frac{23^2}{103^2}}\bigg),\,\sqrt{\log\bigg(1+\frac{23^2}{103^2}\bigg)}\Bigg)\]
x <- 94
mu <- 103
sd <- 23
alpha <- log(mu) - 1/2 * log((sd^2 + mu^2)/mu^2)
beta <- sqrt(log((sd^2 + mu^2)/mu^2))
dlnorm(x, alpha, beta)
## [1] 0.01836968
# Check the moment match by simulation
y.sim <- rlnorm(100000, alpha, beta)
mean(y.sim)
## [1] 103.1197
sd(y.sim)
## [1] 23.14762
q <- c(110, 90)
p.bound <- plnorm(q, alpha, beta)
p.bound[1] - p.bound[2]
## [1] 0.3504293
Moment matching a mean \(\mu\) and a standard deviation \(\sigma\) into the shape parameters of a beta distribution gives:
\[
\begin{align}
\alpha &=\cfrac{\big(\mu^2 - \mu^3 - \mu\sigma^2\big)}{\sigma^2}\\
\beta &=\cfrac{\big(\mu - 2\mu^2 + \mu^3 - \sigma^2 +
\mu\sigma^2\big)}{\sigma^2}\\
\phi &\sim \textrm{beta}\big(\alpha,\beta\big)
\end{align}
\]
# Return the two beta shape parameters implied by a mean and standard deviation
shapeit <- function(mu, sigma){
  a <- (mu^2 - mu^3 - mu * sigma^2)/sigma^2
  b <- (mu - 2 * mu^2 + mu^3 - sigma^2 + mu * sigma^2)/sigma^2
  shape_ps <- c(a, b)
  return(shape_ps)
}
betaParms <- shapeit(mu = .04, sigma = .01)
x <- seq(0, .2, .001)
y <- dbeta(x, betaParms[1], betaParms[2])
plot(x, y,type = 'l', ylab = expression(paste("[",phi,"]")), xlab = expression(phi),
xlim = c(0, 0.1), lwd = 2)
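As with the lognormal, we can check the beta moment match by simulation (a quick sketch using the same moments as above):
# Check the beta moment match by simulation
phi.sim <- rbeta(100000, betaParms[1], betaParms[2])
mean(phi.sim)
sd(phi.sim)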
\[\Pr(y=5 \mid \phi, n=50) = \textrm{binomial}\big(y=5\mid \phi=0.04,n=50\big) =\binom{50}{5}0.04^5(1-0.04)^{50-5}\]
x <- 5
phi <- 0.04
n <- 50
dbinom(x = x, prob = phi, size = n)
## [1] 0.03456107
x <- seq(0, 50)
y <- dbinom(x = x, prob = phi, size = n)
plot(x, y, type = 'h', ylab = expression(paste("Pr(y|", phi, ")")), xlab = expression(y), lwd = 3, col = "blue")
\[\begin{align*} \Pr\big(y \geq 5 \mid \phi, n=50\big) & = \textrm{binomial}\big(y \geq 5 \mid \phi=0.04,n=50\big)\\ &= \sum_{y_{i}\in\{5,6,\ldots,50\}}\binom{50}{y_{i}}\big(0.04\big)^{\,y_{i}}\big(1-0.04\big)^{50-y_{i}} \end{align*}\]
q <- 4
phi <- 0.04
n <- 50
# P(y >= 5) = 1 - P(y <= 4)
(1 - pbinom(q = q, prob = phi, size = n))
## [1] 0.04897147
x <- seq(0, 10)
y <- pbinom(q = x - 1, prob = phi, size = n) # note -1 to represent "fewer"
plot(x, y, ylab = expression(paste("P(",Y <= y," | ",phi,")")), xlab = expression(y),
pch = 16, col = "red", cex = 1.5)
\[ y \sim \textrm{binomial}(1, \phi) \equiv y \sim \textrm{Bernoulli}(\phi) \]
n <- 75
size <- 1
phi <- 0.04
(ySim <- rbinom(n = n, size = size, prob = phi))
## [1] 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1
## [39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
\[\begin{eqnarray} \mu_i &=& \beta_0 + \beta_1x_i\\ \alpha_i &=& \cfrac{\mu_i^2}{\sigma^2}\\ \beta_i &=& \cfrac{\mu_i}{\sigma^2}\\ y_i &\sim& \textrm{gamma}\big(\alpha_i,\beta_i\big) \end{eqnarray}\]
x <- sort(runif(20, 0.01, 0.2)) # sort so lines() draws left to right
mu <- 0.01 + 0.9 * x
sigma <- 0.03
alpha <- mu^2/sigma^2
beta <- mu/sigma^2
y <- rgamma(length(x), alpha, beta)
plot(x, y, xlab = "Soil Moisture", ylab = "Growth Rate", pch = 16, col = "red", cex = 1.5)
lines(x, mu, lwd = 2)
The negative binomial distribution has two common parameterizations. The first is in terms of the mean and a dispersion parameter:
\[ \big[z\mid\lambda,r\big] = \cfrac{\Gamma\big(z + r\big)}{\Gamma\big(r\big)z!}\Big(\cfrac{r}{r+\lambda}\Big)^r \Big(\cfrac{\lambda}{r+\lambda}\Big)^z\textrm{,} \]
where \(z\) is a discrete random variable, \(\lambda\) is the mean of the distribution, and \(r\) is the dispersion parameter. Here, the variance of \(z\) equals:
\[\lambda + \cfrac{\lambda^{2}}{r}.\]
The second parameterization is the one more often implemented in coding environments (e.g., JAGS):
\[ \big[z \mid r,\phi \big] = \cfrac{\Gamma\big(z+r\big)}{\Gamma\big(r\big)z!}\phi^r\big(1-\phi\big)^z\textrm{,} \]
where \(z\) is the discrete random variable representing the number of failures that occur in a sequence of Bernoulli trials before \(r\) successes are obtained. The parameter \(\phi\) is the probability of success on a given trial. Note that \(\phi=\cfrac{r}{\big(\lambda+r\big)}\).
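To see that the two parameterizations agree, we can evaluate both forms of the probability mass function at the same values (the \(\lambda\) and \(r\) here are arbitrary choices for illustration):
# The two parameterizations give identical probabilities
lambda <- 10
r <- 5
phi <- r/(r + lambda)
dnbinom(0:3, size = r, mu = lambda)
dnbinom(0:3, size = r, prob = phi)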
Simulate \(100,000\) observations from a negative binomial distribution with mean of \(\mu = 100\) and variance of \(\sigma^2 = 400\) using the first parameterization that has a mean and a dispersion parameter (remember to moment match). Do the same simulation using the second parameterization. Plot side-by-side histograms of the simulated data.
n <- 100000
lambda <- 100
sigma2 <- 400
# Moment match: sigma^2 = lambda + lambda^2/r, so r = lambda^2/(sigma^2 - lambda)
r <- lambda^2/(sigma2 - lambda)
z1 <- rnbinom(n, mu = lambda, size = r)
mean(z1)
## [1] 99.89035
var(z1)
## [1] 398.817
n <- 100000
lambda <- 100
sigma2 <- 400
r <- lambda^2/(sigma2 - lambda)
phi <- r/(r + lambda) # convert to the probability-of-success parameterization
z2 <- rnbinom(n, prob = phi, size = r)
mean(z2)
## [1] 100.0316
var(z2)
## [1] 400.0573
par(mfrow = c(1, 2))
hist(z1, col = "gray")
hist(z2, col = "gray")