class: center, middle, inverse, title-slide

# Teaching Bayesian Modeling at the Undergraduate Level

## bit.ly/teach-bayes

Talk at University of California Berkeley, Department of Statistics

### Mine Dogucu, Ph.D.

### 2022-01-04

---
class: middle center

<img src="img/headshot.jpeg" alt="A headshot of a woman with curly, short, ear-length hair with green eyes and red lipstick." style="width:165px; margin-top:20px; border: 3px solid whitesmoke; padding: 10px;">

.large[
] <a href = "http://minedogucu.com">minedogucu.com</a> .large[
] <a href = "http://github.com/mdogucu">mdogucu</a> .large[
] <a href = "http://twitter.com/MineDogucu">MineDogucu</a>

---
class: middle

## Outline

- Course
- Lecture demo
- Reflections / Questions

---
class: middle center

.font75[Course]

---
class: center middle

<img src="img/stats-115-logo.png" title="Stats 115 course logo with prior, likelihood, and posterior plots!" alt="Stats 115 course logo with prior, likelihood, and posterior plots!" width="25%" style="display: block; margin: auto;" />

[Introduction to Bayesian Data Analysis](https://www.stats115.com)

---

<img src="img/bayes-courses-us.png" title="A table with a list of majors that include Bayesian courses as part of the major. There are two majors that require a Bayesian course." alt="A table with a list of majors that include Bayesian courses as part of the major. There are two majors that require a Bayesian course." width="68%" style="display: block; margin: auto;" />

<a href="https://arxiv.org/abs/2109.00848" style="font-size: 15px">Dogucu, M., & Hu, J. (2021). The Current State of Undergraduate Bayesian Education and Recommendations for the Future. arXiv preprint arXiv:2109.00848.</a>

---
class: center

<img src="img/bayes-rules-hex.png" title="a hex shaped logo with shiny green-pink disco ball and purple starry background. There is text that says Bayes Rules!" alt="a hex shaped logo with shiny green-pink disco ball and purple starry background. There is text that says Bayes Rules!" width="25%" style="display: block; margin: auto;" />

.pull-left[
<script src="https://use.fontawesome.com/releases/v5.15.1/js/all.js" data-auto-replace-svg="nest"></script>
<i class="fas fa-book fa-2x" aria-hidden="true" title="Book icon"></i>
[Bayes Rules! An Introduction to Applied Bayesian Modeling](https://bayesrulesbook.com)
]

.pull-right[
<i class="fab fa-r-project fa-2x" aria-hidden="true" title="R logo"></i>
[{bayesrules}](https://www.github.com/bayes-rules/bayesrules)
]

---
class: middle

## Southern California Data Science Program

HDR DSC awards: \#2123366 \#2123380 \#2123384

<img src="img/nsf-logo.png" title="NSF logo" alt="NSF logo" width="10%" style="display: block; margin: auto;" />

Through this collaborative grant between University of California Irvine, California State University Fullerton, and Cypress College, a Bayesian course will be adopted at California State University Fullerton.

---
class: middle

<style type="text/css">
.panelset {
  --panel-tab-foreground: whitesmoke;
  --panel-tab-active-foreground: whitesmoke;
  --panel-tabs-border-bottom: #00a1a1;
  --panel-tab-inactive-opacity: 0.5;}
</style>

.panelset.sideways[
.panel[.panel-name[Unit 1]

### Bayesian Foundations

.pull-right[
- Bayes' Rule
- <mark> The Beta-Binomial Bayesian Model </mark>
- Balance and Sequentiality in Bayesian Analysis
- Conjugate Families
]

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-7-1.png" title="three curves on a single plot with no axes labeled. Its coloring scheme indicates its similarity to the previous plot with prior, scaled likelihood and posterior" alt="three curves on a single plot with no axes labeled. Its coloring scheme indicates its similarity to the previous plot with prior, scaled likelihood and posterior" style="display: block; margin: auto;" />
]
]

.panel[.panel-name[Unit 2]

### Posterior Simulation & Analysis

.pull-right[
<img src="img/unit2.png" title="A traceplot with no axis labels. Traceplots have thin vertical lines with varying lengths." alt="A traceplot with no axis labels. Traceplots have thin vertical lines with varying lengths." width="100%" style="display: block; margin: auto;" />
]

.pull-left[
- Grid Approximation
- The Metropolis-Hastings Algorithm
- Posterior Estimation
- Posterior Hypothesis Testing
- Posterior Prediction
]
]

.panel[.panel-name[Unit 3]

### Regression and Classification

.pull-right[
<img src="img/unit3.png" title="A scatterplot with multiple regression lines passing through points. These regression lines are not all over the place; they are clustered with similar but varying intercepts and slopes." alt="A scatterplot with multiple regression lines passing through points. These regression lines are not all over the place; they are clustered with similar but varying intercepts and slopes." width="100%" style="display: block; margin: auto;" />
]

.pull-left[
- Normal Regression
- Poisson and Negative Binomial Regression
- Logistic Regression
- Naive Bayes Classification
]
]

.panel[.panel-name[Unit 4]

### Hierarchical Models

.pull-right[
<img src="img/unit4.png" title="a figure showing hierarchy with a rectangle on top, with a set of arrows pointing downwards leading to a set of rectangles below, which also have a set of arrows pointing downwards leading to a different set of rectangles." alt="a figure showing hierarchy with a rectangle on top, with a set of arrows pointing downwards leading to a set of rectangles below, which also have a set of arrows pointing downwards leading to a different set of rectangles." width="100%" style="display: block; margin: auto;" />
]

.pull-left[
- Normal hierarchical models without predictors
- Normal hierarchical models with predictors
- Non-Normal Hierarchical Regression & Classification
]
]
]

---
class: middle

## Background of students taking the course

- Prerequisite: STATS 120C. Introduction to Probability and Statistics III
- Recommended: STATS 110. Statistical Methods for Data Analysis I
- Students: Data Science major (required), Statistics minor (elective)

---
class: middle center

.font75[Lecture]

---
class: middle

## Zoom chat drop

- I will ask a question.
- You will write the answer in the chat.
- You will **not** send the answer until I say "drop".
- Where are you connecting from? (e.g. Irvine, CA)

---
class: middle

Prior to this lecture, students learn:

- The big Bayesian perspective
- Bayes' rule for events
- Bayes' rule for random variables

---
class: middle center

## Review - Big Picture

<img src="img/bayes-restaurant.png" width="55%" style="display: block; margin: auto;" />

---

## Review - Three Steps

1. Construct a __prior model__ for your variable of interest, `\(\pi\)`.

2. Upon observing data `\(Y = y\)`, define the __likelihood function__ `\(L(\pi|y)\)`.

3. Build the __posterior model__ of `\(\pi\)` via Bayes' Rule. The posterior model is constructed by balancing the prior and likelihood:

`$$\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{normalizing constant}} \propto \text{prior} \cdot \text{likelihood}$$`

More technically,

`$$f(\pi|y) = \frac{f(\pi)L(\pi|y)}{f(y)} \propto f(\pi)L(\pi|y)$$`

(A quick numeric sketch of this proportionality follows the Goals slide.)

---
class: middle

## Goals

- Utilize and tune continuous priors. You will learn how to interpret and tune a continuous Beta prior model to reflect your prior information about `\(\pi\)`.

- Interpret and communicate features of prior and posterior models using properties such as mean, mode, and variance.

- Construct the fundamental Beta-Binomial model for proportion `\(\pi\)`.
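---
class: middle

## Checking posterior ∝ prior · likelihood numerically

The proportionality from the review slide can be checked directly on a small grid of `\(\pi\)` values. This is only an illustrative sketch in base R; the grid, the discrete prior weights, and the data (3 successes in 10 trials) are made up for this check and are not part of the Bechdel analysis that follows.

```r
# A coarse grid of candidate values for pi
pi_grid <- seq(0, 1, by = 0.2)

# A made-up discrete prior over the grid (weights sum to 1)
prior <- c(0.05, 0.30, 0.30, 0.20, 0.10, 0.05)

# Binomial likelihood of observing 3 successes in 10 trials at each pi
likelihood <- dbinom(x = 3, size = 10, prob = pi_grid)

# posterior = prior * likelihood / normalizing constant
posterior <- prior * likelihood / sum(prior * likelihood)

round(data.frame(pi_grid, prior, likelihood, posterior), 3)
```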
---
class: middle

## Context

In Alison Bechdel’s 1985 comic strip The Rule, a character states that they only see a movie if it satisfies the following three rules ([Bechdel 1986](https://dykestowatchoutfor.com/the-essential-dtwof/)):

- the movie has to have at least two women in it;
- these two women talk to each other; and
- they talk about something besides a man.

--

Let `\(\pi\)`, a random value between 0 and 1, denote the unknown proportion of movies that pass the Bechdel test (i.e. `\(\pi \in [0,1]\)`).

---
class: middle

**Continuous probability models**

Let `\(\pi\)` be a continuous random variable with pdf `\(f(\pi)\)`. Then `\(f(\pi)\)` has the following properties:

- `\(\int_\pi f(\pi)d\pi = 1\)`, i.e. the area under `\(f(\pi)\)` is 1
- `\(f(\pi) \ge 0\)`
- `\(P(a < \pi < b) = \int_a^b f(\pi) d\pi\)` when `\(a \le b\)`

---

.pull-left[
![](index_files/figure-html/ch4-bechdel-priors-1.png)<!-- -->
]

--

.pull-right2[
- **Feminist** thinks that women are not represented in movies often.
- **Clueless** is unsure.
- **Optimist** thinks that the Bechdel test is a low bar for representation of women in movies and thinks almost all movies pass the test.
- Can you match the plots with the personas?
]

---
class: middle

## Beta Prior Model

Let `\(\pi\)` be a random variable which can take any value between 0 and 1, i.e. `\(\pi \in [0,1]\)`. Then the variability in `\(\pi\)` might be well modeled by a Beta model with __shape parameters__ `\(\alpha > 0\)` and `\(\beta > 0\)`:

`$$\pi \sim \text{Beta}(\alpha, \beta)$$`

The Beta model is specified by the continuous pdf

`\begin{equation}
f(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \pi^{\alpha-1} (1-\pi)^{\beta-1} \;\; \text{ for } \pi \in [0,1]
\end{equation}`

where `\(\Gamma(z) = \int_0^\infty y^{z-1}e^{-y}dy\)` and `\(\Gamma(z + 1) = z \Gamma(z)\)`. Fun fact: when `\(z\)` is a positive integer, `\(\Gamma(z)\)` simplifies to `\(\Gamma(z) = (z-1)!\)`.

---
class: middle

## Plotting Beta Prior with the `bayesrules` package

Use the `plot_beta()` function in the `bayesrules` package to try different shape parameters. Example:

```r
library(bayesrules)
plot_beta(alpha = 2, beta = 10)
```

<img src="index_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

## Plotting Beta Prior

<img src="index_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---
class: middle

In Breakout Rooms, as a group, discuss:

1. How would you describe the typical behavior of a Beta( `\(\alpha,\beta\)` ) variable `\(\pi\)` when `\(\alpha = \beta\)`?

    a) Right-skewed with `\(\pi\)` tending to be less than 0.5.
    b) Symmetric with `\(\pi\)` tending to be around 0.5.
    c) Left-skewed with `\(\pi\)` tending to be greater than 0.5.

2. Using the same options as above, how would you describe the typical behavior of a Beta( `\(\alpha,\beta\)` ) variable `\(\pi\)` when `\(\alpha > \beta\)`?

3. For which model is there greater variability in the plausible values of `\(\pi\)`, Beta(20,20) or Beta(5,5)?

---
class: middle

By yourself

Tune your own Beta prior. What Beta prior reflects your current belief about `\(\pi\)` - the proportion of movies that pass the Bechdel test?

Students would normally use R and the functions in the `bayesrules` package. To save time and avoid installation issues, we will use this [Shiny app](https://mdogucu.shinyapps.io/berkeley-bayes-ed-shiny/) for now.
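For reference, a minimal sketch of what that tuning might look like in R with the `bayesrules` package; the shape parameters below are just example guesses, not suggested priors.

```r
library(bayesrules)

# Compare a few candidate priors for pi, the Bechdel pass rate
plot_beta(alpha = 1, beta = 1)   # flat: no prior information
plot_beta(alpha = 5, beta = 11)  # centered below 0.5, moderately certain
plot_beta(alpha = 2, beta = 2)   # vague and symmetric around 0.5
```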
---
class: middle

## Beta Descriptive Measurements

`$$E(\pi) = \frac{\alpha}{\alpha + \beta}$$`

`$$\text{Mode}(\pi) = \frac{\alpha - 1}{\alpha + \beta - 2} \text{ when } \alpha, \beta > 1$$`

`$$\text{Var}(\pi) = \frac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$`

---
class: middle

## Beta Descriptives with the `bayesrules` package

Use the `summarize_beta()` function in the `bayesrules` package to find the mean, mode, and variance of various Beta distributions. Example:

```r
summarize_beta(alpha = 2, beta = 10)
```

```
       mean mode        var        sd
1 0.1666667  0.1 0.01068376 0.1033623
```

---

```r
plot_beta(alpha = 2, beta = 10, mean = TRUE, mode = TRUE)
```

<img src="index_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---
class: middle

Check the mean, mode, and variance of the Beta model you picked as your prior. Do these values reflect your prior belief? If not, tune it again and finalize your prior.

Get it ready for chat drop (e.g. Beta(2, 10)).

---

## The Binomial data model and likelihood function

Let random variable `\(Y\)` be the _number of successes_ in `\(n\)` _trials_. Assume that the number of trials is _fixed_, the trials are _independent_, and the _probability of success_ in each trial is `\(\pi\)`. Then the _dependence_ of `\(Y\)` on `\(\pi\)` can be modeled by the Binomial model with __parameters__ `\(n\)` and `\(\pi\)`. In mathematical notation:

`$$Y | \pi \sim \text{Bin}(n,\pi)$$`

Then the Binomial model is specified by a conditional pmf:

`$$f(y|\pi) = {n \choose y} \pi^y (1-\pi)^{n-y} \;\; \text{ for } y \in \{0,1,2,\ldots,n\}$$`

---
class: middle

`\(n = 20\)` movies, `\(y = ?\)`, `\(\pi = ?\)`

<img src="index_files/figure-html/binoms-3-1.png" style="display: block; margin: auto;" />

---

```r
data(bechdel, package = "bayesrules")

# Take a sample of 20 movies
set.seed(84735)
bechdel_20 <- bechdel %>%
  dplyr::sample_n(20)

bechdel_20 %>%
  janitor::tabyl(binary) %>%
  janitor::adorn_totals("row")
```

```
 binary  n percent
   FAIL 11    0.55
   PASS  9    0.45
  Total 20    1.00
```

---
class: middle

<img src="index_files/figure-html/binoms-32-1.png" style="display: block; margin: auto;" />

---
class: middle

<img src="index_files/figure-html/likelihood-election-ch3-1.png" style="display: block; margin: auto;" />

---
class: middle

<img src="index_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

## Posterior for the Beta-Binomial Model

Let `\(\pi \sim \text{Beta}(\alpha, \beta)\)` and `\(Y|\pi \sim \text{Bin}(n,\pi)\)`.
--

`\(f(\pi|y) \propto \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\pi^{\alpha-1} (1-\pi)^{\beta-1} {n \choose y}\pi^y(1-\pi)^{n-y}\)`

--

`\(f(\pi|y) \propto \pi^{\alpha-1} (1-\pi)^{\beta-1} \pi^y(1-\pi)^{n-y}\)`

--

`\(f(\pi|y) \propto \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1}\)`

--

`\(\pi|y \sim \text{Beta}(\alpha +y, \beta+n-y)\)`

--

`\(f(\pi|y) = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+y)\Gamma(\beta+n-y)} \pi^{(\alpha+y)-1} (1-\pi)^{(\beta+n-y)-1}\)`

---
class: middle

```r
plot_beta_binomial(alpha = 2, beta = 10, y = 9, n = 20)
```

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
Prior: `\(\pi \sim \text{Beta(2, 10)}\)`

Data: `\(y = 9, n = 20\)`

Posterior: `\(\pi|y \sim \text{Beta}(\alpha +y, \beta+n-y)\)`

`\(\pi|y \sim \text{Beta(2 + 9, 10 + (20-9))}\)`

`\(\pi|y \sim \text{Beta(11, 21)}\)`
]

---
class: middle

```r
summarize_beta_binomial(alpha = 2, beta = 10, y = 9, n = 20)
```

```
      model alpha beta      mean      mode         var         sd
1     prior     2   10 0.1666667 0.1000000 0.010683761 0.10336228
2 posterior    11   21 0.3437500 0.3333333 0.006835938 0.08267973
```

---
class: middle

## Posterior Descriptives

`\(\pi|(Y=y) \sim \text{Beta}(\alpha+y, \beta+n-y)\)`

`$$E(\pi | (Y=y)) = \frac{\alpha + y}{\alpha + \beta + n}$$`

`$$\text{Mode}(\pi | (Y=y)) = \frac{\alpha + y - 1}{\alpha + \beta + n - 2} \text{ when } \alpha, \beta > 1$$`

`$$\text{Var}(\pi | (Y=y)) = \frac{(\alpha + y)(\beta + n - y)}{(\alpha + \beta + n)^2(\alpha + \beta + n + 1)}$$`

(A quick numeric check of these formulas appears in the appendix slide at the end.)

---
class: middle

## Conjugate prior

We say that `\(f(\pi)\)` is a **conjugate prior** for `\(L(\pi|y)\)` if the posterior, `\(f(\pi|y) \propto f(\pi)L(\pi|y)\)`, is from the same model family as the prior.

Thus, the Beta model is a conjugate prior for the corresponding Binomial data model.

---
class: middle

In Breakout Rooms:

- Plot and summarize your own posterior using the [Shiny app](https://mdogucu.shinyapps.io/berkeley-bayes-ed-shiny).
- Take turns discussing how, in your own analysis, the descriptives have been updated after observing the data.
- Be ready to chat drop your own posterior model (e.g. Beta(11, 21)).

---
class: center middle

.font75[Reflections / Questions]
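---
class: middle

## Appendix: checking the posterior descriptives by hand

A minimal sketch in base R verifying that the closed-form posterior descriptives match the `summarize_beta_binomial()` output shown earlier (Beta(2, 10) prior, `\(y = 9\)` passes out of `\(n = 20\)` movies).

```r
alpha <- 2; beta <- 10   # prior shape parameters
y <- 9; n <- 20          # observed data: 9 of 20 sampled movies pass

# Posterior shape parameters: Beta(alpha + y, beta + n - y)
alpha_post <- alpha + y        # 11
beta_post  <- beta + n - y     # 21

# Closed-form posterior mean, mode, and variance
post_mean <- alpha_post / (alpha_post + beta_post)
post_mode <- (alpha_post - 1) / (alpha_post + beta_post - 2)
post_var  <- (alpha_post * beta_post) /
  ((alpha_post + beta_post)^2 * (alpha_post + beta_post + 1))

c(mean = post_mean, mode = post_mode, var = post_var)
# mean = 0.34375, mode = 0.33333, var = 0.00684 -- matching summarize_beta_binomial()
```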