class: center, middle, inverse, title-slide

# Teaching Bayesian Modeling at the Undergraduate Level

## bit.ly/teach-bayes

Talk at University of California Berkeley, Department of Statistics

### Mine Dogucu, Ph.D.

### 2022-01-04

---
class: middle center

<img src="img/headshot.jpeg" alt="A headshot of a woman with curly, short, ear-length hair with green eyes and red lipstick." style="width:165px; margin-top:20px; border: 3px solid whitesmoke; padding: 10px;">

.large[
] <a href = "http://minedogucu.com">minedogucu.com</a> .large[
] <a href = "http://github.com/mdogucu">mdogucu</a> .large[
] <a href = "http://twitter.com/MineDogucu">MineDogucu</a>

---
class: middle

## Outline

- Course
- Lecture demo
- Reflections / Questions

---
class: middle center

.font75[Course]

---
class: center middle

<img src="img/stats-115-logo.png" title="Stats 115 course logo with prior, likelihood, and posterior plots!" alt="Stats 115 course logo with prior, likelihood, and posterior plots!" width="25%" style="display: block; margin: auto;" />

[Introduction to Bayesian Data Analysis](https://www.stats115.com)

---

<img src="img/bayes-courses-us.png" title="A table with a list of majors that include Bayesian courses as part of the major. There are two majors that require a Bayesian course." alt="A table with a list of majors that include Bayesian courses as part of the major. There are two majors that require a Bayesian course." width="68%" style="display: block; margin: auto;" />

<a href="https://arxiv.org/abs/2109.00848" style="font-size: 15px">Dogucu, M., & Hu, J. (2021). The Current State of Undergraduate Bayesian Education and Recommendations for the Future. arXiv preprint arXiv:2109.00848.</a>

---
class: center

<img src="img/bayes-rules-hex.png" title="a hex shaped logo with shiny green-pink disco ball and purple starry background. There is text that says Bayes Rules!" alt="a hex shaped logo with shiny green-pink disco ball and purple starry background. There is text that says Bayes Rules!" width="25%" style="display: block; margin: auto;" />

.pull-left[
<script src="https://use.fontawesome.com/releases/v5.15.1/js/all.js" data-auto-replace-svg="nest"></script>
<i class="fas fa-book fa-2x" aria-hidden="true" title="Book icon"></i>
[Bayes Rules! An Introduction to Applied Bayesian Modeling](https://bayesrulesbook.com)
]

.pull-right[
<i class="fab fa-r-project fa-2x" aria-hidden="true" title="R logo"></i>
[{bayesrules}](https://www.github.com/bayes-rules/bayesrules)
]

---
class: middle

## Southern California Data Science Program

HDR DSC awards: \#2123366 \#2123380 \#2123384

<img src="img/nsf-logo.png" title="NSF logo" alt="NSF logo" width="10%" style="display: block; margin: auto;" />

Through this collaborative grant between University of California Irvine, California State University Fullerton, and Cypress College, a Bayesian course will be adopted at California State University Fullerton.

---
class: middle

<style type="text/css">
.panelset {
  --panel-tab-foreground: whitesmoke;
  --panel-tab-active-foreground: whitesmoke;
  --panel-tabs-border-bottom: #00a1a1;
  --panel-tab-inactive-opacity: 0.5;}
</style>

.panelset.sideways[
.panel[.panel-name[Unit 1]

### Bayesian Foundations

.pull-right[
- Bayes' Rule
- <mark> The Beta-Binomial Bayesian Model </mark>
- Balance and Sequentiality in Bayesian Analysis
- Conjugate Families
]

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-7-1.png" title="three curves on a single plot with no axes labeled. Its coloring scheme indicates its similarity to the previous plot with prior, scaled likelihood and posterior" alt="three curves on a single plot with no axes labeled. Its coloring scheme indicates its similarity to the previous plot with prior, scaled likelihood and posterior" style="display: block; margin: auto;" />
]
]

.panel[.panel-name[Unit 2]

### Posterior Simulation & Analysis

.pull-right[
<img src="img/unit2.png" title="A traceplot with no axis labels. Traceplots have thin vertical lines with varying lengths." alt="A traceplot with no axis labels. Traceplots have thin vertical lines with varying lengths." width="100%" style="display: block; margin: auto;" />
]

.pull-left[
- Grid Approximation
- The Metropolis-Hastings Algorithm
- Posterior Estimation
- Posterior Hypothesis Testing
- Posterior Prediction
]
]

.panel[.panel-name[Unit 3]

### Regression and Classification

.pull-right[
<img src="img/unit3.png" title="A scatterplot with multiple regression lines passing through points. These regression lines are not all over the place; they are clustered with similar but varying intercepts and slopes." alt="A scatterplot with multiple regression lines passing through points. These regression lines are not all over the place; they are clustered with similar but varying intercepts and slopes." width="100%" style="display: block; margin: auto;" />
]

.pull-left[
- Normal Regression
- Poisson and Negative Binomial Regression
- Logistic Regression
- Naive Bayes Classification
]
]

.panel[.panel-name[Unit 4]

### Hierarchical Models

.pull-right[
<img src="img/unit4.png" title="a figure showing hierarchy with a rectangle on top, with a set of arrows pointing downwards leading to a set of rectangles below, which also have a set of arrows pointing downwards leading to a different set of rectangles." alt="a figure showing hierarchy with a rectangle on top, with a set of arrows pointing downwards leading to a set of rectangles below, which also have a set of arrows pointing downwards leading to a different set of rectangles." width="100%" style="display: block; margin: auto;" />
]

.pull-left[
- Normal hierarchical models without predictors
- Normal hierarchical models with predictors
- Non-Normal Hierarchical Regression & Classification
]
]
]

---
class: middle

## Background of students taking the course

- Prerequisite: STATS 120C. Introduction to Probability and Statistics III
- Recommended: STATS 110. Statistical Methods for Data Analysis I
- Students: Data Science major (required), Statistics minor (elective)

---
class: middle center

.font75[Lecture]

---
class: middle

## Zoom chat drop

- I will ask a question.
- You will write the answer in the chat.
- You will **not** send the answer until I say "drop".
- Where are you connecting from? (e.g. Irvine, CA)

---
class: middle

Prior to this lecture, students learn:

- The big Bayesian perspective
- Bayes' rule for events
- Bayes' rule for random variables

---
class: middle center

## Review - Big Picture

<img src="img/bayes-restaurant.png" width="55%" style="display: block; margin: auto;" />

---

## Review - Three Steps

1. Construct a __prior model__ for your variable of interest, `\(\pi\)`.

2. Upon observing data `\(Y = y\)`, define the __likelihood function__ `\(L(\pi|y)\)`.

3. Build the __posterior model__ of `\(\pi\)` via Bayes' Rule. The posterior model is constructed by balancing the prior and likelihood:

`$$\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{normalizing constant}} \propto \text{prior} \cdot \text{likelihood}$$`

More technically,

`$$f(\pi|y) = \frac{f(\pi)L(\pi|y)}{f(y)} \propto f(\pi)L(\pi|y)$$`

(A quick numeric sketch of this proportionality follows the Goals slide.)

---
class: middle

## Goals

- Utilize and tune continuous priors. You will learn how to interpret and tune a continuous Beta prior model to reflect your prior information about `\(\pi\)`.

- Interpret and communicate features of prior and posterior models using properties such as mean, mode, and variance.

- Construct the fundamental Beta-Binomial model for proportion `\(\pi\)`.
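---
class: middle

## Checking posterior ∝ prior · likelihood numerically

The proportionality from the review slide can be checked directly on a small grid of `\(\pi\)` values. This is only an illustrative sketch in base R; the grid, the discrete prior weights, and the data (3 successes in 10 trials) are made up for this check and are not part of the Bechdel analysis that follows.

```r
# A coarse grid of candidate values for pi
pi_grid <- seq(0, 1, by = 0.2)

# A made-up discrete prior over the grid (weights sum to 1)
prior <- c(0.05, 0.30, 0.30, 0.20, 0.10, 0.05)

# Binomial likelihood of observing 3 successes in 10 trials at each pi
likelihood <- dbinom(x = 3, size = 10, prob = pi_grid)

# posterior = prior * likelihood / normalizing constant
posterior <- prior * likelihood / sum(prior * likelihood)

round(data.frame(pi_grid, prior, likelihood, posterior), 3)
```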
---
class: middle

## Context

In Alison Bechdel’s 1985 comic strip The Rule, a character states that they only see a movie if it satisfies the following three rules ([Bechdel 1986](https://dykestowatchoutfor.com/the-essential-dtwof/)):

- the movie has to have at least two women in it;
- these two women talk to each other; and
- they talk about something besides a man.

--

Let `\(\pi\)`, a random value between 0 and 1, denote the unknown proportion of movies that pass the Bechdel test (i.e. `\(\pi \in [0,1]\)`).

---
class: middle

**Continuous probability models**

Let `\(\pi\)` be a continuous random variable with pdf `\(f(\pi)\)`. Then `\(f(\pi)\)` has the following properties:

- `\(\int_\pi f(\pi)d\pi = 1\)`, i.e. the area under `\(f(\pi)\)` is 1
- `\(f(\pi) \ge 0\)`
- `\(P(a < \pi < b) = \int_a^b f(\pi) d\pi\)` when `\(a \le b\)`

---

.pull-left[
![](index_files/figure-html/ch4-bechdel-priors-1.png)<!-- -->
]

--

.pull-right2[
- **Feminist** thinks that women are not represented in movies often.
- **Clueless** is unsure.
- **Optimist** thinks that the Bechdel test is a low bar for representation of women in movies and thinks almost all movies pass the test.
- Can you match the plots with the personas?
]

---
class: middle

## Beta Prior Model

Let `\(\pi\)` be a random variable which can take any value between 0 and 1, i.e. `\(\pi \in [0,1]\)`. Then the variability in `\(\pi\)` might be well modeled by a Beta model with __shape parameters__ `\(\alpha > 0\)` and `\(\beta > 0\)`:

`$$\pi \sim \text{Beta}(\alpha, \beta)$$`

The Beta model is specified by the continuous pdf

`\begin{equation}
f(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1} (1-\pi)^{\beta-1} \;\; \text{ for } \pi \in [0,1]
\end{equation}`

where `\(\Gamma(z) = \int_0^\infty y^{z-1}e^{-y}dy\)` and `\(\Gamma(z + 1) = z \Gamma(z)\)`. Fun fact: when `\(z\)` is a positive integer, `\(\Gamma(z)\)` simplifies to `\(\Gamma(z) = (z-1)!\)`.

---
class: middle

## Plotting Beta Prior with the `bayesrules` package

Use the `plot_beta()` function in the `bayesrules` package to try different shape parameters. Example:

```r
library(bayesrules)
plot_beta(alpha = 2, beta = 10)
```

<img src="index_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

## Plotting Beta Prior

<img src="index_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---
class: middle

In Breakout Rooms, as a group, discuss:

1. How would you describe the typical behavior of a Beta( `\(\alpha,\beta\)` ) variable `\(\pi\)` when `\(\alpha = \beta\)`?

    a) Right-skewed with `\(\pi\)` tending to be less than 0.5.
    b) Symmetric with `\(\pi\)` tending to be around 0.5.
    c) Left-skewed with `\(\pi\)` tending to be greater than 0.5.

2. Using the same options as above, how would you describe the typical behavior of a Beta( `\(\alpha,\beta\)` ) variable `\(\pi\)` when `\(\alpha > \beta\)`?

3. For which model is there greater variability in the plausible values of `\(\pi\)`, Beta(20,20) or Beta(5,5)?

---
class: middle

By yourself

Tune your own Beta prior. What Beta prior reflects your current belief about `\(\pi\)` - the proportion of movies that pass the Bechdel test?

Students would normally use R and the functions in the `bayesrules` package. To save time and avoid installation issues, we will use this [Shiny app](https://mdogucu.shinyapps.io/berkeley-bayes-ed-shiny/) for now.
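For reference, a minimal sketch of what that tuning might look like in R with the `bayesrules` package; the shape parameters below are just example guesses, not suggested priors.

```r
library(bayesrules)

# Compare a few candidate priors for pi, the Bechdel pass rate
plot_beta(alpha = 1, beta = 1)   # flat: no prior information
plot_beta(alpha = 5, beta = 11)  # centered below 0.5, moderately certain
plot_beta(alpha = 2, beta = 2)   # vague and symmetric around 0.5
```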
---
class: middle

## Beta Descriptive Measurements

`$$E(\pi) = \frac{\alpha}{\alpha + \beta}$$`

`$$\text{Mode}(\pi) = \frac{\alpha - 1}{\alpha + \beta - 2} \text{ when } \alpha, \beta > 1$$`

`$$\text{Var}(\pi) = \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$`

---
class: middle

## Beta Descriptives with the `bayesrules` package

Use the `summarize_beta()` function in the `bayesrules` package to find the mean, mode, and variance of various Beta distributions. Example:

```r
summarize_beta(alpha = 2, beta = 10)
```

```
       mean mode        var        sd
1 0.1666667  0.1 0.01068376 0.1033623
```

---

```r
plot_beta(alpha = 2, beta = 10, mean = TRUE, mode = TRUE)
```

<img src="index_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---
class: middle

Check the mean, mode, and variance of the Beta model you picked as your prior. Do these values reflect your prior belief? If not, tune it again and finalize your prior.

Get it ready for chat drop (e.g. Beta(2, 10)).

---

## The Binomial data model and likelihood function

Let random variable `\(Y\)` be the _number of successes_ in `\(n\)` _trials_. Assume that the number of trials is _fixed_, the trials are _independent_, and the _probability of success_ in each trial is `\(\pi\)`. Then the _dependence_ of `\(Y\)` on `\(\pi\)` can be modeled by the Binomial model with __parameters__ `\(n\)` and `\(\pi\)`. In mathematical notation:

`$$Y | \pi \sim \text{Bin}(n,\pi)$$`

Then the Binomial model is specified by a conditional pmf:

`$$f(y|\pi) = {n \choose y} \pi^y (1-\pi)^{n-y} \;\; \text{ for } y \in \{0,1,2,\ldots,n\}$$`

---
class: middle

`\(n = 20\)` movies, `\(y = ?\)`, `\(\pi = ?\)`

<img src="index_files/figure-html/binoms-3-1.png" style="display: block; margin: auto;" />

---

```r
data(bechdel, package = "bayesrules")

# Take a sample of 20 movies
set.seed(84735)
bechdel_20 <- bechdel %>%
  dplyr::sample_n(20)

bechdel_20 %>%
  janitor::tabyl(binary) %>%
  janitor::adorn_totals("row")
```

```
 binary  n percent
   FAIL 11    0.55
   PASS  9    0.45
  Total 20    1.00
```

---
class: middle

<img src="index_files/figure-html/binoms-32-1.png" style="display: block; margin: auto;" />

---
class: middle

<img src="index_files/figure-html/likelihood-election-ch3-1.png" style="display: block; margin: auto;" />

---
class: middle

<img src="index_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

## Posterior for the Beta-Binomial Model

Let `\(\pi \sim \text{Beta}(\alpha, \beta)\)` and `\(Y|\pi \sim \text{Bin}(n,\pi)\)`.
--

`\(f(\pi|y) \propto \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\pi^{\alpha-1} (1-\pi)^{\beta-1} {n \choose y}\pi^y(1-\pi)^{n-y}\)`

--

`\(f(\pi|y) \propto \pi^{\alpha-1} (1-\pi)^{\beta-1} \pi^y(1-\pi)^{n-y}\)`

--

`\(f(\pi|y) \propto \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1}\)`

--

`\(\pi|y \sim \text{Beta}(\alpha +y, \beta+n-y)\)`

--

`\(f(\pi|y) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+y)\Gamma(\beta+n-y)} \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1}\)`

---
class: middle

```r
plot_beta_binomial(alpha = 2, beta = 10, y = 9, n = 20)
```

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
Prior: `\(\pi \sim \text{Beta(2, 10)}\)`

Data: `\(y = 9, n = 20\)`

Posterior: `\(\pi|y \sim \text{Beta}(\alpha +y, \beta+n-y)\)`

`\(\pi|y \sim \text{Beta(2 + 9, 10 + (20-9))}\)`

`\(\pi|y \sim \text{Beta(11, 21)}\)`
]

---
class: middle

```r
summarize_beta_binomial(alpha = 2, beta = 10, y = 9, n = 20)
```

```
      model alpha beta      mean      mode         var         sd
1     prior     2   10 0.1666667 0.1000000 0.010683761 0.10336228
2 posterior    11   21 0.3437500 0.3333333 0.006835938 0.08267973
```

---
class: middle

## Posterior Descriptives

`\(\pi|(Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y)\)`

`$$E(\pi | (Y=y)) = \frac{\alpha + y}{\alpha + \beta + n}$$`

`$$\text{Mode}(\pi | (Y=y)) = \frac{\alpha + y - 1}{\alpha + \beta + n - 2} \text{ when } \alpha, \beta > 1$$`

`$$\text{Var}(\pi | (Y=y)) = \frac{(\alpha + y)(\beta + n - y)}{(\alpha + \beta + n)^2(\alpha + \beta + n + 1)}$$`

(A quick numeric check of these formulas appears in the appendix slide at the end.)

---
class: middle

## Conjugate prior

We say that `\(f(\pi)\)` is a **conjugate prior** for `\(L(\pi|y)\)` if the posterior, `\(f(\pi|y) \propto f(\pi)L(\pi|y)\)`, is from the same model family as the prior.

Thus, the Beta model is a conjugate prior for the corresponding Binomial data model.

---
class: middle

In Breakout Rooms:

- Plot and summarize your own posterior using the [Shiny app](https://mdogucu.shinyapps.io/berkeley-bayes-ed-shiny).
- Take turns discussing how, in your own analysis, the descriptives have been updated after observing the data.
- Be ready to chat drop your own posterior model (e.g. Beta(11, 21)).

---
class: center middle

.font75[Reflections / Questions]
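---
class: middle

## Appendix: checking the posterior descriptives by hand

A minimal sketch in base R verifying that the closed-form posterior descriptives match the `summarize_beta_binomial()` output shown earlier (Beta(2, 10) prior, `\(y = 9\)` passes out of `\(n = 20\)` movies).

```r
alpha <- 2; beta <- 10   # prior shape parameters
y <- 9; n <- 20          # observed data: 9 of 20 sampled movies pass

# Posterior shape parameters: Beta(alpha + y, beta + n - y)
alpha_post <- alpha + y        # 11
beta_post  <- beta + n - y     # 21

# Closed-form posterior mean, mode, and variance
post_mean <- alpha_post / (alpha_post + beta_post)
post_mode <- (alpha_post - 1) / (alpha_post + beta_post - 2)
post_var  <- (alpha_post * beta_post) /
  ((alpha_post + beta_post)^2 * (alpha_post + beta_post + 1))

c(mean = post_mean, mode = post_mode, var = post_var)
# mean = 0.34375, mode = 0.33333, var = 0.00684 -- matching summarize_beta_binomial()
```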