Posterior Prediction

class: title-slide

<br>
<br>
.right-panel[

# Posterior Prediction
## Dr. Mine Dogucu
Examples from [bayesrulesbook.com](https://bayesrulesbook.com)

]

---

class: middle

---

```r
head(normal_model_df, 1)
```

```
##   (Intercept) temp_feel    sigma
## 1   -2339.928  84.37713 1289.935
```

```r
`(Intercept)` <- round(head(normal_model_df, 1)$`(Intercept)`)
temp_feel     <- round(head(normal_model_df, 1)$temp_feel, 2)
sigma         <- round(head(normal_model_df, 1)$sigma)
```

---

```r
`(Intercept)`
```

```
## [1] -2340
```

```r
temp_feel
```

```
## [1] 84.38
```

```r
sigma
```

```
## [1] 1290
```

---

Under this particular scenario, `$\left(\beta_0^{(1)}, \beta_1^{(1)}, \sigma^{(1)}\right) = (-2340, 84.38, 1290)$`, thus the regression trend is defined by

`$$\beta_0^{(1)} + \beta_1^{(1)} X = -2340 + 84.38X \; .$$`

```r
y_trend <- `(Intercept)` + temp_feel*75
y_trend
```

```
## [1] 3988.5
```

---

To capture the __sampling variability__ around this plausible trend, ie. the fact that not all 75 degree days have this exact ridership, we can simulate our first official prediction `$Y_{\text{new}}^{(1)}$` by taking a random draw from the Normal likelihood specified by this first parameter set:

`$$Y_{\text{new}}^{(1)} | \beta_0, \beta_1, \sigma  \; \sim \; N\left(3988.5, 1290^2\right) \; .$$`

```r
set.seed(84735)
y_new <- rnorm(1, y_trend, sigma)
y_new
```

```
## [1] 4849.23
```

---

---

```r
# Predict rides for each parameter set in the chain
set.seed(84735)
predict_75 <- normal_model_df %>% 
  mutate(y_trend = `(Intercept)` + temp_feel*75) %>% 
  mutate(y_new = rnorm(20000, y_trend, sigma))
```

```r
head(predict_75, 3)
```

```
##   (Intercept) temp_feel    sigma  y_trend    y_new
## 1   -2339.928  84.37713 1289.935 3988.357 4849.044
## 2   -2153.331  81.67121 1250.547 3972.010 3817.142
## 3   -2199.164  81.99670 1313.540 3950.589 4976.290
```

---

```r
ggplot(predict_75, aes(x = y_trend)) + 
  geom_density()
ggplot(predict_75, aes(x = y_new)) + 
  geom_density()
```

---

```r
# Construct 80% posterior credible intervals
predict_75 %>% 
  summarize(lower_trend = quantile(y_trend, 0.1),
    upper_trend = quantile(y_trend, 0.9),
    lower_new = quantile(y_new, 0.1),
    upper_new = quantile(y_new, 0.9))
```

```
##   lower_trend upper_trend lower_new upper_new
## 1    3879.099    4044.275  2339.071  5606.899
```

---

```r
# Simulate a set of predictions
set.seed(84735)
shortcut_prediction <- posterior_predict(
  normal_model_sim, 
  newdata = data.frame(temp_feel = 75))
head(shortcut_prediction, 3)
```

```
##             1
## [1,] 4849.044
## [2,] 3817.142
## [3,] 4976.290
```

---

```r
# Plot the approximate predictive model
mcmc_dens(shortcut_prediction) + 
  labs(x = "predicted ridership on a 75 degree day")
```

---

```r
# Construct an 80% posterior credible interval
predictions <- data.frame(y_new = shortcut_prediction[,1]) 
predictions %>% 
  summarize(lower_80 = quantile(y_new, 0.1),
    upper_80 = quantile(y_new, 0.9))
```

```
##   lower_80 upper_80
## 1 2339.071 5606.899
```