Teaching and Learning Bayesian Statistics with {bayesrules}

Talk at the Department of Statistics, University of Auckland

Mine Dogucu, PhD
University of California Irvine

2024-12-11

About Me

  • Faculty member in the Department of Statistics at University of California Irvine

  • UC Irvine campus is located on the homelands of the Acjachemen and Tongva peoples

About You

???

About Bayesian Methods

Frequentist

Bayesian

An Overview of Undergraduate Bayesian Education

Dogucu, M. & Hu, J. (2022) The Current State of Undergraduate Bayesian Education and Recommendations for the Future. The American Statistician, 74(2), 405-413.

Majors

Prerequisites

Why Teach Bayesian Methods?

  1. Intuition

Hypothesis Testing

Suppose that during a recent doctor’s visit, you tested positive for a very rare disease.

\(H_0\): no disease
\(H_A\): disease

If you only get to ask the doctor one question, which would it be?

  1. What’s the chance that I actually have the disease?
  2. If in fact I don’t have the disease, what’s the chance that I would’ve gotten this positive test result?
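The gap between these two questions is exactly Bayes' rule. A quick sketch in R, using hypothetical numbers chosen for illustration (prevalence of 1 in 1,000, test sensitivity 0.99, false positive rate 0.05):

```r
# Hypothetical numbers for a rare disease (assumed for illustration)
prior <- 0.001   # P(disease)
sens  <- 0.99    # P(positive | disease)
fpr   <- 0.05    # P(positive | no disease)

# P(positive) by the law of total probability
p_pos <- sens * prior + fpr * (1 - prior)

# Bayes' rule: P(disease | positive)
sens * prior / p_pos
#> roughly 0.019
```

Even with an accurate test, the answer to question 1 is only about 2%, while the p-value-style answer to question 2 is 5%.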

Confidence Intervals

Why Teach Bayesian Methods?

  2. Perfect blend of statistics and computing

Metropolis-Hastings Algorithm
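To illustrate that blend, here is a minimal random-walk Metropolis-Hastings sampler in base R (illustrative only; the Beta(1, 10) prior, the Binomial data of 3 successes in 10 trials, and the proposal scale are all assumed for the example):

```r
# Minimal random-walk Metropolis-Hastings sketch (not from the book):
# sample from the posterior of pi under a Beta(1, 10) prior with
# Binomial data y = 3 successes in n = 10 trials.
set.seed(84735)

log_post <- function(pi) {
  if (pi <= 0 || pi >= 1) return(-Inf)
  dbeta(pi, 1, 10, log = TRUE) + dbinom(3, 10, pi, log = TRUE)
}

pi_chain <- numeric(5000)
pi_chain[1] <- 0.5
for (i in 2:5000) {
  proposal  <- pi_chain[i - 1] + rnorm(1, 0, 0.1)  # random-walk proposal
  log_ratio <- log_post(proposal) - log_post(pi_chain[i - 1])
  pi_chain[i] <- if (log(runif(1)) < log_ratio) proposal else pi_chain[i - 1]
}

mean(pi_chain)  # close to the exact posterior mean 4/21, about 0.19
```

Students see the statistical idea (accept or reject proposals in proportion to posterior plausibility) and the computational idea (a loop, a random seed, a chain to diagnose) in the same dozen lines.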

Why Teach Bayesian Methods?

  3. Bayesian methods are becoming more common

Andrews, 2024

Bayes Rules!

A headshot of a woman with long blonde hair wearing a brownish-yellow t-shirt and a red and pink floral silk scarf wrapped around her neck.

Alicia Johnson
Macalester College

A headshot of a man with short dark hair, and a dark moustache. He is wearing a blue button up shirt and dark gray jacket

Miles Ott
Tubi

a hex shaped logo with shiny green-pink disco ball and purple starry background. There is text that says Bayes Rules!

A quick example

Let \(\pi\) be the proportion of spam emails where \(\pi \in [0, 1]\).

What do you think \(\pi\) is? How certain are you?

Prior Model

plot_beta(alpha = 4, beta = 4)

X axis reads pi with values from 0 to 1. Y axis reads f of pi. The curve has a high peak when pi equals 0.5. The distribution is symmetric.

plot_beta(alpha = 1, beta = 10)

X axis reads pi with values from 0 to 1. Y axis reads f of pi. The curve has a high peak when pi equals 0. The curve is concave and y decreases as pi increases.

Binomial Likelihood

plot_binomial_likelihood(y = 3, n = 10)
X axis reads pi with values from 0 to 1. Y axis reads l of pi given capital Y equals y. The curve has a high peak when pi equals 0.3. The y-values are almost zero when pi is greater than 0.75.

Posterior Model

plot_beta_binomial(alpha = 1, beta = 10, y = 3, n = 10)
X axis reads pi with values from 0 to 1. Y axis reads density. Three curves are shown, labeled prior, (scaled) likelihood, and posterior. The prior curve has a high peak when pi equals 0. The prior curve is concave and y decreases as pi increases. The likelihood curve has a high peak when pi equals 0.3, and its y-values are almost zero when pi is greater than 0.75. The posterior sits between the prior and likelihood curves.
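Because the Beta prior is conjugate to the Binomial likelihood, the posterior here is available in closed form: Beta(1 + 3, 10 + 10 − 3) = Beta(4, 17). The {bayesrules} summary function reports the corresponding numerical summaries:

```r
library(bayesrules)

# Conjugacy: Beta(1, 10) prior + 3 successes in 10 trials
# -> Beta(4, 17) posterior, with mean 4 / (4 + 17), about 0.19
summarize_beta_binomial(alpha = 1, beta = 10, y = 3, n = 10)
```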

Target Audience of the Book

  • Advanced Undergraduate Students in Statistics / Data Science Programs

  • Equally trained learners

  • Prior course/training in statistics is required

  • Familiarity with probability, calculus, and tidyverse is recommended.

Bayes’ Rule

The Beta-Binomial Bayesian Model

Balance and Sequentiality in Bayesian Analysis

Conjugate Families

Three curves on a single plot with no axes labeled. Its coloring scheme indicates similarity to the previous plot, with prior, scaled likelihood, and posterior curves.

A traceplot with no axis labels, showing thin vertical lines of varying lengths.

Grid Approximation

The Metropolis-Hastings Algorithm

Posterior Estimation

Posterior Hypothesis Testing

Posterior Prediction

A scatterplot with multiple regression lines passing through the points. The regression lines are not scattered; they cluster with similar but varying intercepts and slopes.

Normal Regression

Poisson and Negative Binomial Regression

Logistic Regression

Naive Bayes Classification

a figure showing hierarchy with a rectangle on top. With a set of arrows pointing downwards leading to a set of rectangles below which also have a set of arrows pointing downwards leading to a different set of rectangles.

Normal hierarchical models without predictors

Normal hierarchical models with predictors

Non-Normal Hierarchical Regression & Classification

Pedagogical Approach

Checking Intuition

There are two ellipses at the top of the image. The first ellipse reads 'Prior: Only 40% of articles are fake'. The second ellipse reads 'Data: Exclamation points are more common among fake news'. There are two arrows each from the upper two ellipses leading to a third ellipse in the lower part of the image. The third ellipse reads 'Posterior: Is the article fake or not?'

Active learning with quizzes

Hands-on programming

Computing and Math Together

Compute for a single case, then use built-in functions

Accessibility and Inclusion

Accessibility and Inclusion Criteria Questions

Accessibility
  • Is the cost affordable for learners from diverse socioeconomic backgrounds?
  • Are plots distinguishable to color-blind learners?
  • Is alt text provided for images?

Inclusivity of scholars
  • Do the cited scholars represent diversity across identities, experiences, and expertise?
  • Are scholars cited using the correct names and pronouns?

Inclusivity of students
  • Do examples avoid the necessity of specialized knowledge?
  • Do names and pronouns reflect diverse cultural and personal identities?
  • Are there examples that could potentially speak to younger as well as older students?
  • Does the delivery embrace mistakes and critical thinking?
  • Are efforts made to accommodate different academic experiences and create a shared foundation?

Dogucu, M., Johnson, A. A., & Ott, M. (2023). Framework for Accessible and Inclusive Teaching Materials for Statistics and Data Science Courses. Journal of Statistics and Data Science Education, 31(2), 144–150. https://doi.org/10.1080/26939169.2023.2165988

R packages

devtools::install_github("bayes-rules/bayesrules")

plot_beta(alpha = 3, beta = 8)

X axis reads pi with values from 0 to 1. Y axis reads f of pi. The curve has a high peak when pi equals 0.25. The y-values are almost zero when pi is greater than 0.70.

plot_beta(alpha = 10, beta = 2)

X axis reads pi with values from 0 to 1. Y axis reads f of pi. The curve has a high peak when pi equals 0.9. The y-values are almost zero when pi is less than 0.50.

plot_beta_binomial(alpha = 3, beta = 8, y = 19, n = 20)
X axis reads pi with values from 0 to 1. Y axis reads density. Three curves are shown, labeled prior, (scaled) likelihood, and posterior. The prior curve has a high peak when pi equals 0.25, and its y-values are close to zero when pi is greater than 0.7. The likelihood curve has a high peak when pi equals 0.95 and is quite peaked with low variance. The posterior sits between the prior and likelihood curves.

Plotting Functions

plot_beta()
plot_binomial_likelihood()
plot_beta_binomial()

plot_gamma()
plot_poisson_likelihood()
plot_gamma_poisson()

plot_normal()
plot_normal_likelihood()
plot_normal_normal()

Summary Functions

summarize_beta() summarize_beta_binomial()


summarize_gamma() summarize_gamma_poisson()


summarize_normal_normal()
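The summary functions mirror the plotting functions. A usage sketch for the Gamma-Poisson pair (the prior parameters and data below are invented for illustration):

```r
library(bayesrules)

# Gamma(3, 1) prior with Poisson counts summing to 12 across n = 5
# observations; conjugacy gives a Gamma(3 + 12, 1 + 5) posterior
summarize_gamma_poisson(shape = 3, rate = 1, sum_y = 12, n = 5)
plot_gamma_poisson(shape = 3, rate = 1, sum_y = 12, n = 5)
```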

Model Evaluation Functions

Functions                                                           Response      Model Type
prediction_summary(), prediction_summary_cv()                       Quantitative  rstanreg
classification_summary(), classification_summary_cv()               Binary        rstanreg
naive_classification_summary(), naive_classification_summary_cv()  Categorical   naiveBayes

Prediction Summary

prediction_summary(model, data, 
                   prob_inner = 0.6, 
                   prob_outer = 0.80)
       mae mae_scaled within_60 within_80
1 3.939371  0.6974147      0.65       0.9
prediction_summary_cv(model = model, 
                      data = data, 
                      k = 2, 
                      prob_inner = 0.6, 
                      prob_outer = 0.80)
$folds
  fold      mae mae_scaled within_60 within_80
1    1 4.640325  0.7003211       0.5       0.9
2    2 3.934895  0.5546164       0.6       0.9

$cv
      mae mae_scaled within_60 within_80
1 4.28761  0.6274688      0.55       0.9

library(rstan)

# STEP 1: DEFINE the model
stan_bike_model <- "
  data {
    int<lower=0> n;
    vector[n] Y;
    vector[n] X;
  }
  parameters {
    real beta0;
    real beta1;
    real<lower=0> sigma;
  }
  model {
    Y ~ normal(beta0 + beta1 * X, sigma);
  }
"
# STEP 2: SIMULATE the posterior
stan_bike_sim <- 
  stan(model_code = stan_bike_model, 
  data = list(n = nrow(bikes), 
              Y = bikes$rides, X = bikes$temp_feel), 
  chains = 4, iter = 5000*2, seed = 84735)

library(rstanarm)

normal_model_sim <- stan_glm(rides ~ temp_feel, 
                             data = bikes, 
                             family = gaussian, 
                             chains = 4, iter = 5000*2,
                             seed = 84735)

library(bayesplot)

mcmc_trace(normal_model_sim, size = 0.1)

mcmc_dens_overlay(normal_model_sim)

Resources

QUESTIONS?

minedogucu.com

mdogucu


mastodon.social/@MineDogucu

minedogucu