The notes for this lecture are derived from Chapter 5 of the Bayes Rules! book.
The prior distribution reflects our current information about the parameter. When choosing a prior, we may consider how well it captures that information and how tractable it makes the resulting posterior.
Let the prior model for parameter \(\theta\) have pdf \(f(\theta)\), and let the model of data \(Y\) conditioned on \(\theta\) have likelihood function \(L(\theta|y)\). If the resulting posterior model, with pdf \(f(\theta|y) \propto f(\theta)L(\theta|y)\), is of the same model family as the prior, then we say that \(f(\theta)\) is a conjugate prior.
Examples
The Beta-Binomial Model - the Beta is a conjugate prior for the Binomial likelihood.
The Gamma-Poisson Model
The Normal-Normal Model
\[f(\pi)=e-e^\pi\; \text{ for } \pi \in [0,1] \] Is this a valid pdf?
\(f(\pi)\) is non-negative on the support of \(\pi\), since \(e^\pi \le e\) for \(\pi \in [0,1]\).
\(\int_0^1 f(\pi)\, d\pi \stackrel{?}{=} 1\)
\(\int_0^1 (e-e^\pi)\, d\pi=(e\pi - e^\pi)\big|_0^1 = [e-e] -[0-e^0]= 1\)
\(\int_0^1 f(\pi)\, d\pi = 1\), so \(f(\pi)\) is a valid pdf.
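Both conditions can also be verified numerically. A Python sketch (SciPy assumed; not part of the original notes):

```python
import numpy as np
from scipy import integrate

# Candidate pdf f(pi) = e - e^pi on [0, 1]
f = lambda p: np.e - np.exp(p)

# Condition 1: non-negative on the support
grid = np.linspace(0, 1, 1001)
assert np.all(f(grid) >= 0)

# Condition 2: integrates to 1 over [0, 1]
total, _ = integrate.quad(f, 0, 1)
print(round(total, 6))  # 1.0
```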
Assume we observe \(Y = 10\) successes in \(n = 50\) trials.
\(L(\pi | (y=10)) = {50 \choose 10} \pi^{10} (1-\pi)^{40} \; \; \text{ for } \pi \in [0,1] \; .\)
\(f(\pi | (y = 10)) \propto f(\pi) L(\pi | (y = 10)) = (e-e^\pi) \cdot \binom{50}{10} \pi^{10} (1-\pi)^{40}.\)
\(f(\pi | y ) \propto (e-e^\pi) \pi^{10} (1-\pi)^{40}.\)
\(f(\pi|y=10)= \frac{(e-e^\pi) \pi^{10} (1-\pi)^{40}}{\int_0^1(e-e^\pi) \pi^{10} (1-\pi)^{40}d\pi} \; \; \text{ for } \pi \in [0,1].\)
We would need to integrate to calculate this posterior model, and integrate again for its mean, mode, and variance.
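To see that burden concretely, here is a Python sketch (SciPy assumed; not from the book, which uses R) that normalizes this non-conjugate posterior and computes its mean by numerical integration:

```python
import numpy as np
from scipy import integrate

# Unnormalized posterior kernel: prior (e - e^pi) times the binomial likelihood kernel
kernel = lambda p: (np.e - np.exp(p)) * p**10 * (1 - p)**40

# One integral for the normalizing constant ...
const, _ = integrate.quad(kernel, 0, 1)

# ... and another for the posterior mean
post_mean, _ = integrate.quad(lambda p: p * kernel(p), 0, 1)
post_mean /= const
print(round(post_mean, 3))
```

With a conjugate prior, by contrast, the posterior's family and moments are available in closed form with no integration.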
We are interested in modeling \(\lambda\), the rate of fraud risk calls received per day. We plan on collecting data on the number of fraud risk phone calls received each day.
Let random variable \(Y\) be the number of independent events that occur in a fixed amount of time or space, where \(\lambda > 0\) is the rate at which these events occur. Then the dependence of \(Y\) on parameter \(\lambda\) can be modeled by the Poisson. In mathematical notation:
\[Y | \lambda \sim \text{Pois}(\lambda) \]
The Poisson model is specified by a conditional pmf:
\[\begin{equation} f(y|\lambda) = \frac{\lambda^y e^{-\lambda}}{y!}\;\; \text{ for } y \in \{0,1,2,\ldots\} \end{equation}\]
A Poisson random variable \(Y\) has equal mean and variance,
\[\begin{equation} E(Y|\lambda) = \text{Var}(Y|\lambda) = \lambda \; \end{equation}\]
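For instance, with an illustrative rate of \(\lambda = 5\) (a SciPy sketch, not part of the original notes), the pmf and the equal-mean-and-variance property can be checked directly:

```python
from scipy import stats

lam = 5  # illustrative rate, not estimated from data
Y = stats.poisson(lam)

# pmf at y = 3: 5^3 e^{-5} / 3!
print(round(Y.pmf(3), 4))

# Mean and variance are both lambda
print(Y.mean(), Y.var())  # 5.0 5.0
```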
Let \((Y_1,Y_2,\ldots,Y_n)\) be an independent sample of random variables and \(\vec{y} = (y_1,y_2,\ldots,y_n)\) be the corresponding vector of observed values.
\[\begin{equation} L(\lambda | \vec{y}) = \prod_{i=1}^n L(\lambda | y_i) = L(\lambda | y_1) \cdot L(\lambda | y_2) \cdots L(\lambda | y_n) \; \end{equation}\]
\[\begin{equation} L(\lambda | \vec{y}) = \prod_{i=1}^{n}f(y_i | \lambda) = \prod_{i=1}^{n}\frac{\lambda^{y_i}e^{-\lambda}}{y_i!} \;\; \text{ for } \; \lambda > 0 \; \end{equation}\]
\[\begin{split} L(\lambda | \vec{y}) & = \prod_{i=1}^{n}\frac{\lambda^{y_i}e^{-\lambda}}{y_i!} \\ & = \frac{\lambda^{y_1}e^{-\lambda}}{y_1!} \cdot \frac{\lambda^{y_2}e^{-\lambda}}{y_2!} \cdots \frac{\lambda^{y_n}e^{-\lambda}}{y_n!} \\ & =\frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i!} \\ \end{split}\]
We collect four days of data and receive 6, 2, 2, and 1 fraud risk calls on each day. Write out the likelihood model.
\[L(\lambda | \vec{y}) =\frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod_{i=1}^n y_i!}\]
\[L(\lambda | \vec{y}) = \frac{\lambda^{6 +2+2+1}e^{-4\lambda}}{6!\times2!\times2!\times1!} \propto \lambda^{11}e^{-4\lambda} \; .\]
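This likelihood kernel can be evaluated on a grid to locate the value of \(\lambda\) it favors most, which for a Poisson sample is the mean \(\bar{y} = 11/4 = 2.75\). A Python sketch (NumPy assumed):

```python
import numpy as np

y = np.array([6, 2, 2, 1])  # observed daily call counts
n, s = len(y), y.sum()      # n = 4, sum of counts = 11

# Log of the likelihood kernel lambda^11 * exp(-4 lambda)
loglik = lambda lam: s * np.log(lam) - n * lam

grid = np.linspace(0.01, 10, 10000)
lam_hat = grid[np.argmax(loglik(grid))]
print(round(lam_hat, 2))  # ~2.75
```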
Let \(\lambda\) be a random variable which can take any value between 0 and \(\infty\), i.e. \(\lambda \in (0,\infty)\). Then the variability in \(\lambda\) might be well modeled by a Gamma model with shape parameter \(s > 0\) and rate parameter \(r > 0\):
\[\lambda \sim \text{Gamma}(s, r)\]
The Gamma model is specified by the continuous pdf
\[\begin{equation} f(\lambda) = \frac{r^s}{\Gamma(s)} \lambda^{s-1} e^{-r\lambda} \;\; \text{ for } \lambda > 0 \end{equation}\]
where constant \(\Gamma(s) = \int_0^\infty z^{s - 1} e^{-z}dz\). When \(s\) is a positive integer, \(s \in \{1,2,3,\ldots\}\), this constant simplifies to \(\Gamma(s) = (s - 1)!\).
The Exponential model \[\lambda \sim \text{Exp}(r)\] is a special case of the Gamma with shape parameter \(s = 1\): \(\text{Exp}(r) = \text{Gamma}(1,r)\).
What is \(f(\lambda)\) if \(\lambda =1\) and \(\lambda \sim \text{Gamma}(2, 5)\) ?
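Plugging into the pdf above gives \(f(1) = \frac{5^2}{\Gamma(2)} \cdot 1^{2-1} \cdot e^{-5} = 25e^{-5} \approx 0.168\). A Python sketch confirming this (note that SciPy parameterizes the Gamma by a scale, the reciprocal of the rate):

```python
import math
from scipy import stats

# f(1) for Gamma(s = 2, r = 5): (r^s / Gamma(s)) * lambda^{s-1} * e^{-r lambda} at lambda = 1
by_hand = (5**2 / math.gamma(2)) * 1**(2 - 1) * math.exp(-5)

# SciPy uses a scale parameter, scale = 1 / rate
by_scipy = stats.gamma.pdf(1, a=2, scale=1/5)

print(round(by_hand, 4), round(by_scipy, 4))  # 0.1684 0.1684
```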
Before we collected any data, our guess was that the rate of fraud risk calls was most likely around 5 calls per day, but could also reasonably range between 2 and 7 calls per day.
In other words, \(E(\lambda) = 5\) and \(\lambda\) most likely lies between 2 and 7. You can use the plot_gamma() function from the bayesrules package to try out different Gamma distributions.
\[E(\lambda) = \frac{s}{r} \approx 5 \; .\]
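For example, \(\text{Gamma}(10, 2)\) (the prior used in the summary below) has mean \(10/2 = 5\), and we can check how much of its mass falls in the 2-to-7 range. A SciPy sketch:

```python
from scipy import stats

prior = stats.gamma(a=10, scale=1/2)  # Gamma(shape = 10, rate = 2); SciPy scale = 1/rate

print(prior.mean())                   # 5.0, matching E(lambda) = s/r
mass = prior.cdf(7) - prior.cdf(2)    # probability that lambda lies in (2, 7)
print(round(mass, 2))
```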
\[f(\lambda|\vec{y}) \propto f(\lambda)L(\lambda|\vec{y}) = \frac{r^s}{\Gamma(s)} \lambda^{s-1} e^{-r\lambda} \cdot \frac{\lambda^{\sum y_i}e^{-n\lambda}}{\prod y_i!} \;\;\; \text{ for } \lambda > 0.\]
\[\begin{split} f(\lambda|\vec{y}) & \propto \lambda^{s-1} e^{-r\lambda} \cdot \lambda^{\sum y_i}e^{-n\lambda} \\ & = \lambda^{s + \sum y_i - 1} e^{-(r+n)\lambda} \\ \end{split}\]
\[ \lambda|\vec{y} \; \sim \; \text{Gamma}\bigg(s + \sum y_i, r + n \bigg) \; .\]
Let \(\lambda > 0\) be an unknown rate parameter and \((Y_1,Y_2,\ldots,Y_n)\) be an independent \(\text{Pois}(\lambda)\) sample. The Gamma-Poisson Bayesian model complements the Poisson structure of data \(Y\) with a Gamma prior on \(\lambda\):
\[\begin{split} Y_i | \lambda & \stackrel{ind}{\sim} \text{Pois}(\lambda) \\ \lambda & \sim \text{Gamma}(s, r) \\ \end{split}\]
Upon observing data \(\vec{y} = (y_1,y_2,\ldots,y_n)\), the posterior model of \(\lambda\) is also a Gamma with updated parameters:
\[\begin{equation} \lambda|\vec{y} \; \sim \; \text{Gamma}(s + \sum y_i, \; r + n) \; . \end{equation}\]
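With the \(\text{Gamma}(10, 2)\) prior and the four days of data above, the conjugate update is simple arithmetic: no integration needed. A Python sketch (SciPy assumed; the book's bayesrules package does this in R):

```python
from scipy import stats

s, r = 10, 2            # prior Gamma(shape, rate)
y = [6, 2, 2, 1]        # observed daily fraud risk call counts

# Conjugate update: Gamma(s + sum(y), r + n)
s_post, r_post = s + sum(y), r + len(y)
post = stats.gamma(a=s_post, scale=1/r_post)

print(s_post, r_post)                       # 21 6
print(post.mean(), (s_post - 1) / r_post)   # posterior mean 3.5, mode 3.333...
```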
Summaries of the prior and posterior Gamma models:

model      shape  rate  mean  mode      var        sd
prior      10     2     5.0   4.500000  2.5000000  1.5811388
posterior  21     6     3.5   3.333333  0.5833333  0.7637626