Maximum Likelihood Estimation

Dr. Mine Dogucu


Let \(X\sim \text{Bernoulli}(\pi)\)

If we flip a fair coin, what is the probability that we get a head (success)?

Parameter \(\pi\) is known, \(\pi = 0.5\)

We would like to know what \(P(X = 1)\) is.

Likelihood

If we observe a head, what is the likelihood that \(\pi\) is 0.5?

We know that \(x = 1\) and we would like to know \(L(\pi = 0.5 | x = 1)\)
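Since one coin flip is a binomial with a single trial, this likelihood can be evaluated in R with `dbinom()`, now treating `prob` as the quantity that varies:

```r
# Likelihood of pi given one observed head (x = 1 in a single trial),
# evaluated at several candidate values of pi
dbinom(x = 1, size = 1, prob = c(0.3, 0.5, 0.9))
# [1] 0.3 0.5 0.9
```

In particular, \(L(\pi = 0.5 \mid x = 1) = 0.5\).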

Probability

Let \(X \sim \text{Binomial}(n = 2, \pi = 0.3)\)

Let \(X\) be the number of spam emails received among two emails, where each email is spam with probability 0.3.

What is the probability that one of the two emails received is spam?

\(P(X = 1) = \binom{2}{1}(0.3)^1(0.7)^1 = 0.42\)

dbinom(x = 1, size = 2, prob = 0.3)
[1] 0.42

Likelihood

Let \(X \sim \text{Binomial}(n, \pi)\)

There were 5 emails received and 3 of them turned out to be spam. What is the likelihood that \(\pi\) is 0.1 (i.e., \(L(\pi = 0.1 | x = 3)\))?

dbinom(x = 3, size = 5, prob = 0.1)
[1] 0.0081

What about \(L(\pi = 0.8 | x = 3)\)?

dbinom(x = 3, size = 5, prob = 0.8)
[1] 0.2048

What about \(L(\pi = 0.5 | x = 3)\)?

dbinom(x = 3, size = 5, prob = 0.5)
[1] 0.3125

\(\pi\)    \(L(\pi \mid x = 3)\)
0.0    0.0000
0.1    0.0081
0.2    0.0512
0.3    0.1323
0.4    0.2304
0.5    0.3125
0.6    0.3456
0.7    0.3087
0.8    0.2048
0.9    0.0729
1.0    0.0000
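The table above can be reproduced by evaluating `dbinom()` over a grid of candidate values of \(\pi\):

```r
# Likelihood of pi given x = 3 successes in n = 5 trials,
# evaluated over a grid of candidate values
pi_grid <- seq(0, 1, by = 0.1)
likelihood <- dbinom(x = 3, size = 5, prob = pi_grid)
round(likelihood, 4)
#  [1] 0.0000 0.0081 0.0512 0.1323 0.2304 0.3125 0.3456 0.3087 0.2048 0.0729 0.0000
pi_grid[which.max(likelihood)]
# [1] 0.6
```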


Likelihood

  • Likelihood is a function.

  • The value 0.6, at which the likelihood function reaches its maximum, is called the maximum likelihood estimate (MLE) of \(\pi\).

  • A maximum likelihood estimate, in this case, is our best estimate of the unknown parameter \(\pi\).

Math Review

Review

\(\ln(ab) = \ln(a) + \ln(b)\)

\(\ln(a^b) = b\ln(a)\)

\(\frac{d}{dx}x^n = nx^{n-1}\)

\(\frac{d}{dx}e^x = e^x\)

\(\frac{d}{dx}\ln x = \frac{1}{x}\)

\(\frac{d}{dx}\ln (1-x) = -\frac{1}{(1-x)}\)

MLE

Deriving MLE for \(\pi\)

Let \(X \sim \text{Bernoulli}(\pi)\) and let \(x_1, x_2, \ldots, x_n\) be observations from this distribution.

e.g. \(1, 1, 0, 0, 1\)

\(x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 0, x_5 = 1\)

\(x_1 = 1, x_2 = 1, x_3 = 0, x_4 = 0, x_5 = 1\) and \(n = 5\)

\(L(\pi) = \pi^{x_1}(1-\pi)^{1-x_1} \cdot \pi^{x_2}(1-\pi)^{1-x_2} \cdots \pi^{x_n}(1-\pi)^{1-x_n}\)

\(L(\pi) = \prod_{i=1}^{n} \pi^{x_i}(1-\pi)^{1-x_i}\)

\(L(\pi) = \pi^{\sum_{i=1}^nx_i}(1-\pi)^{\sum_{i=1}^n(1-x_i)}\)
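For the example data above, \(\sum_{i=1}^n x_i = 3\) and \(n = 5\), so the likelihood collapses to

\(L(\pi) = \pi^{3}(1-\pi)^{2}\)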

Steps

We want to find the value of \(\pi\) that maximizes the likelihood.

  1. Find \(L(\pi)\)

  2. Take the first derivative of the likelihood with respect to the parameter, in this case \(\pi\).

  3. Set the first derivative equal to 0 and solve for \(\pi\).

  4. Check whether the second derivative of the likelihood is negative.

Steps

We want to find the value of \(\pi\) that maximizes the likelihood.

  1. Find \(L(\pi)\)

  2. Find \(\ln L(\pi)\)

  3. Take the first derivative of the likelihood with respect to the parameter, in this case \(\pi\).

  4. Set the first derivative equal to 0 and solve for \(\pi\).

  5. Check whether the second derivative of the likelihood is negative.

Step 2: Find \(\ln L(\pi)\)

\(\ell(\pi) = \ln L(\pi) = \ln[\pi^{\sum_{i=1}^nx_i}(1-\pi)^{\sum_{i=1}^n(1-x_i)}]\)

\(\ell(\pi)= {\sum_{i=1}^nx_i}\ln(\pi) + (n-{\sum_{i=1}^nx_i})\ln(1-\pi)\)
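As a sanity check, this formula can be compared against the log-likelihood computed directly from the Bernoulli pmf, using the example data from earlier:

```r
# Check the log-likelihood formula against a direct computation
x <- c(1, 1, 0, 0, 1)       # example data
n <- length(x)
p <- 0.4                    # an arbitrary candidate value of pi
formula_ll <- sum(x) * log(p) + (n - sum(x)) * log(1 - p)
direct_ll  <- sum(dbinom(x, size = 1, prob = p, log = TRUE))
all.equal(formula_ll, direct_ll)
# [1] TRUE
```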

Step 3: Take the first derivative with respect to \(\pi\)

\(\frac{d}{d\pi}\ell(\pi) = \frac{d}{d\pi}[{\sum_{i=1}^nx_i}\ln(\pi) + (n-{\sum_{i=1}^nx_i})\ln(1-\pi)]\)

\(\frac{d}{d\pi}\ell(\pi) = \frac{\sum_{i=1}^nx_i}{\pi} + \frac{n-{\sum_{i=1}^nx_i}}{1-\pi} (-1)\)

\(\frac{d}{d\pi}\ell(\pi) = \frac{\sum_{i=1}^nx_i}{\pi} - \frac{n-{\sum_{i=1}^nx_i}}{1-\pi}\)

Step 4: Set the first derivative equal to 0 and solve for \(\pi\)

\(\frac{\sum_{i=1}^nx_i}{\pi} - \frac{n-{\sum_{i=1}^nx_i}}{1-\pi} = 0\)

\(\frac{\sum_{i=1}^nx_i}{\pi} = \frac{n-{\sum_{i=1}^nx_i}}{1-\pi}\)

\((1-\pi)\sum_{i=1}^nx_i = \pi(n-\sum_{i=1}^nx_i)\)

\(\sum_{i=1}^nx_i -\pi \sum_{i=1}^nx_i= \pi n-\pi\sum_{i=1}^nx_i\)

\(\sum_{i=1}^nx_i= \pi n\)

\(\hat \pi = \frac{\sum_{i=1}^nx_i}{n}\)
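In other words, the MLE is the sample proportion of successes. For the example data:

```r
# The MLE of pi is simply the sample proportion of successes
x <- c(1, 1, 0, 0, 1)
mean(x)
# [1] 0.6
```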

Step 5: Check that the second derivative is always negative

\(\frac{d^2}{d\pi^2}\ell(\pi) = \frac{d}{d\pi}\left[\frac{\sum_{i=1}^nx_i}{\pi} - \frac{n-{\sum_{i=1}^nx_i}}{1-\pi}\right]\)

\(\frac{d^2}{d\pi^2}\ell(\pi) = -\frac{\sum_{i=1}^nx_i}{\pi^2} - \frac{n-{\sum_{i=1}^nx_i}}{(1-\pi)^2}\)

\(\sum_{i=1}^nx_i \geq 0\)

and \(n-\sum_{i=1}^nx_i \geq 0\) since \(n \geq \sum_{i=1}^nx_i\)

For \(0 < \pi < 1\), \(\pi^2 > 0\) and \((1-\pi)^2 > 0\)

Therefore \(-\frac{\sum_{i=1}^nx_i}{\pi^2} - \frac{n-{\sum_{i=1}^nx_i}}{(1-\pi)^2} < 0\)

Conclusion

\(\hat \pi = \frac{\sum_{i=1}^nx_i}{n}\) is in fact the maximum likelihood estimator of \(\pi\).

Let’s use this estimator!

Let \(X\) be a binomial random variable. We have observed 3 successes in 5 trials. What is the maximum likelihood estimate of \(\pi\)?

5 Bernoulli trials so \(n = 5\)

3 successes so \(\sum_{i=1}^nx_i = 3\)

\(\hat \pi = \frac{\sum_{i=1}^nx_i}{n} = \frac{3}{5} = 0.6\)
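The closed-form answer can be confirmed numerically by maximizing the binomial likelihood over \(\pi\) (note that `optimize()` returns an approximate maximizer):

```r
# Numerically maximize L(pi | x = 3, n = 5); the maximizer should be near 3/5
opt <- optimize(function(p) dbinom(x = 3, size = 5, prob = p),
                interval = c(0, 1), maximum = TRUE)
opt$maximum
# approximately 0.6
```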

So \(\pi\) is most likely \(\frac{3}{5} = 0.6\)

Let \(X\) represent the number of failures before the first success, following a Geometric distribution. We have observed the first success on the 10th trial. What is the maximum likelihood estimate of \(\pi\)?

10 Bernoulli trials so \(n = 10\)

The Geometric distribution always has failures first, so \(x_1 = 0, x_2 = 0, \ldots, x_9 = 0\) but \(x_{10} = 1\).

\(\sum_{i=1}^nx_i =1\)

\(\hat \pi = \frac{\sum_{i=1}^nx_i}{n} = \frac{1}{10} = 0.1\)
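The same numerical check works here. With 9 failures before the first success, maximizing the geometric likelihood over \(\pi\) gives a value near 0.1 (R's `dgeom()` parameterizes `x` as the number of failures):

```r
# Numerically maximize the geometric likelihood:
# 9 failures observed before the first success
opt <- optimize(function(p) dgeom(x = 9, prob = p),
                interval = c(0.001, 0.999), maximum = TRUE)
opt$maximum
# approximately 0.1
```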