class: title-slide

<br>
<br>
.right-panel[

# Evaluating Regression Models

## Dr. Mine Dogucu

Examples from [bayesrulesbook.com](https://bayesrulesbook.com)

]

---

1. __How fair is the model?__

2. __How wrong is the model?__

3. __How accurate are the posterior predictive models?__

---

__Posterior predictive check__

Consider a regression model with response variable `\(Y\)`, predictor `\(X\)`, and a set of regression parameters `\(\theta\)`. For example, in the model above `\(\theta = (\beta_0,\beta_1,\sigma)\)`. Further, let `\(\left\lbrace \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(N)}\right\rbrace\)` be an `\(N\)`-length Markov chain for the posterior model of `\(\theta\)`. Then a "good" Bayesian model will produce _predictions_ of `\(Y\)` with features similar to the _original_ `\(Y\)` data. To evaluate whether your model satisfies this goal:

1. At each set of posterior plausible parameters `\(\theta^{(i)}\)`, simulate a sample of `\(Y\)` values from the likelihood model, one corresponding to each `\(X\)` in the original sample of size `\(n\)`. This produces `\(N\)` separate samples of size `\(n\)`.

2. Compare the features of the `\(N\)` simulated `\(Y\)` samples, or a subset of these samples, to those of the original `\(Y\)` data.

---

```r
head(normal_model_df, 1)
```

```
##   (Intercept) temp_feel    sigma
## 1   -2339.928  84.37713 1289.935
```

```r
`(Intercept)` <- round(head(normal_model_df, 1)$`(Intercept)`)
temp_feel <- round(head(normal_model_df, 1)$temp_feel, 2)
sigma <- round(head(normal_model_df, 1)$sigma)
```

---

```r
`(Intercept)`
```

```
## [1] -2340
```

```r
temp_feel
```

```
## [1] 84.38
```

```r
sigma
```

```
## [1] 1290
```

---

```r
set.seed(84735)
one_simulation <- bikes %>% 
  mutate(simulated_rides = rnorm(500, 
                                 mean = -2340 + 84.38 * temp_feel, 
                                 sd = 1290)) %>% 
  select(temp_feel, rides, simulated_rides)
```

--

```r
head(one_simulation, 3)
```

```
##   temp_feel rides simulated_rides
## 1  64.72625   654        3982.331
## 2  49.04645  1229        1638.786
## 3  51.09098  1454        2978.376
```

---

```r
ggplot(one_simulation, aes(x = simulated_rides)) + 
  geom_density(color = "lightblue") + 
  geom_density(aes(x = rides), color = "darkblue")
```

<img src="slide-08c-reg-eval_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

```r
# Examine 50 of the 20000 simulated samples
set.seed(84735)
bayesplot::pp_check(normal_model_sim, nreps = 50)
```

<img src="slide-08c-reg-eval_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

```r
bikes %>% 
  filter(date == "2012-10-22") %>% 
  select(temp_feel, rides)
```

```
##   temp_feel rides
## 1  75.46478  6228
```

---

class: middle

__observed value__: `\(Y\)`

__posterior predictive median__: `\(Y'\)`

__predictive error__: `\(Y - Y'\)`

---

```r
predict_75 %>% 
  summarize(median = median(y_new),
            error = 6228 - median(y_new))
```

```
##     median    error
## 1 3946.558 2281.442
```
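---

The summary above relies on `predict_75`, a data frame of 20,000 posterior predictive draws `y_new` of ridership on a 75-degree day. Its construction is not shown on these slides; a minimal sketch, assuming the 20,000 posterior parameter sets sit in `normal_model_df` as above:

```r
# Hypothetical reconstruction of predict_75: one posterior predictive
# draw of rides per posterior parameter set, all at temp_feel = 75
set.seed(84735)
predict_75 <- normal_model_df %>% 
  mutate(mu = `(Intercept)` + temp_feel * 75,
         y_new = rnorm(n(), mean = mu, sd = sigma))
```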
---

class: middle

__median absolute deviation (mad)__: the typical distance between a posterior prediction and the posterior predictive median.

We estimate the _mad_ by calculating the absolute deviations of our 20,000 posterior predictions of `\(Y\)`, `\(\left\lbrace Y_{\text{new}}^{(1)}, Y_{\text{new}}^{(2)}, \ldots, Y_{\text{new}}^{(20000)}\right\rbrace\)`, from the posterior predictive median `\(Y'\)` and taking the median of these deviations:

`$$\text{mad} = \text{median}_{i \in \{1,2,\ldots,20000\}} \vert Y_{\text{new}}^{(i)} - Y' \vert \; .$$`

---

```r
predict_75 %>% 
  summarize(error = 6228 - median(y_new),
            predictive_mad = mad(y_new, constant = 1),
            error_scaled = error / predictive_mad)
```

```
##      error predictive_mad error_scaled
## 1 2281.442       867.8032     2.628986
```

---

## Posterior Prediction Interval

```r
predict_75 %>% 
  summarize(lower_95 = quantile(y_new, 0.025),
            lower_50 = quantile(y_new, 0.25),
            upper_50 = quantile(y_new, 0.75),
            upper_95 = quantile(y_new, 0.975))
```

```
##   lower_95 lower_50 upper_50 upper_95
## 1 1494.581  3086.11 4822.178 6500.741
```

---

```r
set.seed(84735)
predictions <- posterior_predict(normal_model_sim, newdata = bikes)
dim(predictions)
```

```
## [1] 20000   500
```

---

```r
ppc_intervals(bikes$rides, yrep = predictions, 
              x = bikes$temp_feel, 
              prob = 0.5, prob_outer = 0.95)
```

<img src="slide-08c-reg-eval_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

---

Let `\(Y_1, Y_2, \ldots, Y_n\)` denote `\(n\)` _observed_ outcomes. Then each `\(Y_i\)` has a corresponding posterior predictive model with _median_ `\(Y_i'\)` and _median absolute deviation_ `\(\text{mad}_i\)`. We can evaluate the overall posterior predictive model quality by the following measures:

- The __median absolute error__ `mae`

`$$\text{mae} = \text{median}_{i \in \{1,2,\ldots,n\}} |Y_i - Y_i'|$$`

- The __scaled median absolute error__ `mae_scaled`

`$$\text{mae scaled} = \text{median}_{i \in \{1,2,\ldots,n\}} \frac{|Y_i - Y_i'|}{\text{mad}_i}$$`

- `within_50` and `within_95` measure the proportion of observed values `\(Y_i\)` that fall within their 50% and 95% posterior prediction intervals, respectively.

---

```r
# Posterior predictive summaries
prediction_summary(model = normal_model_sim, data = bikes)
```

```
##        mae mae_scaled within_50 within_95
## 1 995.1727   0.770901      0.44      0.97
```

---

__The k-fold cross validation algorithm__

1. __Create folds.__ Let `\(k\)` be some integer from 2 to our original sample size `\(n\)`. Split the data into `\(k\)` __folds__, or subsets, of roughly equal size.

2. __Train and test the model.__
    - _Train_ the model using the first `\(k - 1\)` data folds combined.
    - _Test_ this model on the `\(k\)`th data fold.
    - Measure the prediction quality (e.g., by MAE).

3. __Repeat.__ Repeat step 2 `\(k - 1\)` times, each time leaving out a different fold for testing.

4. __Calculate cross-validation estimates.__ Steps 2 and 3 produce `\(k\)` different training models and `\(k\)` corresponding measures of prediction quality. _Average_ these `\(k\)` measures to obtain a single cross-validation estimate of prediction quality. A hand-coded sketch of the whole procedure follows on the next slide.
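---

A minimal sketch of this algorithm for the MAE measure, assuming `normal_model_sim` was fit as an rstanarm `stan_glm()` model of `rides` by `temp_feel`. In practice, `prediction_summary_cv()` on the next slide handles all of this for us:

```r
set.seed(84735)
k <- 10

# Step 1: randomly assign each of the n rows to one of k folds
fold <- sample(rep(1:k, length.out = nrow(bikes)))

# Steps 2-3: for each fold, train on the other k - 1 folds, test on it
fold_mae <- sapply(1:k, function(j) {
  train <- bikes[fold != j, ]
  test  <- bikes[fold == j, ]
  model_j <- rstanarm::stan_glm(rides ~ temp_feel, data = train,
                                family = gaussian, refresh = 0)
  # Posterior predictive medians for the held-out fold
  y_med <- apply(rstanarm::posterior_predict(model_j, newdata = test),
                 2, median)
  median(abs(test$rides - y_med))  # fold-specific MAE
})

# Step 4: average the k fold-specific measures
mean(fold_mae)
```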
---

class: middle

```r
set.seed(84735)
cv_procedure <- prediction_summary_cv(
  data = bikes, model = normal_model_sim, k = 10)
```

---

```r
cv_procedure$folds
```

```
##    fold       mae mae_scaled within_50 within_95
## 1     1  990.8046  0.7707640      0.46      0.98
## 2     2  966.2503  0.7442039      0.42      1.00
## 3     3  953.5580  0.7314050      0.42      0.98
## 4     4 1015.8608  0.7914139      0.46      0.98
## 5     5 1161.5925  0.9089318      0.36      0.96
## 6     6  935.2628  0.7290865      0.46      0.94
## 7     7 1267.1400  1.0025401      0.32      0.96
## 8     8 1108.0451  0.8591984      0.36      1.00
## 9     9 1108.9709  0.8748064      0.40      0.92
## 10   10  788.3889  0.6075533      0.56      0.96
```

---

```r
cv_procedure$cv
```

```
##        mae mae_scaled within_50 within_95
## 1 1029.587  0.8019903     0.422     0.968
```
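---

As a quick check of step 4 of the algorithm, these cross-validation estimates are simply the averages of the ten fold-specific measures:

```r
# Average the per-fold measures, dropping the fold index column;
# this should reproduce cv_procedure$cv
colMeans(cv_procedure$folds[ , -1])
```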