class: title-slide <br> <br> .right-panel[ # Simple Normal Regression ## Dr. Mine Dogucu Examples from [bayesrulesbook.com](https://bayesrulesbook.com) ] --- ```r glimpse(bikes) ``` ``` ## Rows: 500 ## Columns: 13 ## $ date <date> 2011-01-01, 2011-01-03, 2011-01-04, 2011-01-05, 2011-01-0… ## $ season <fct> winter, winter, winter, winter, winter, winter, winter, wi… ## $ year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011… ## $ month <fct> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan… ## $ day_of_week <fct> Sat, Mon, Tue, Wed, Fri, Sat, Mon, Tue, Wed, Thu, Fri, Sat… ## $ weekend <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALS… ## $ holiday <fct> no, no, no, no, no, no, no, no, no, no, no, no, no, yes, n… ## $ temp_actual <dbl> 57.39952, 46.49166, 46.76000, 48.74943, 46.50332, 44.17700… ## $ temp_feel <dbl> 64.72625, 49.04645, 51.09098, 52.63430, 50.79551, 46.60286… ## $ humidity <dbl> 80.5833, 43.7273, 59.0435, 43.6957, 49.8696, 53.5833, 48.2… ## $ windspeed <dbl> 10.749882, 16.636703, 10.739832, 12.522300, 11.304642, 17.… ## $ weather_cat <fct> categ2, categ1, categ1, categ1, categ2, categ2, categ1, ca… ## $ rides <int> 654, 1229, 1454, 1518, 1362, 891, 1280, 1220, 1137, 1368, … ``` --- ## Rides `$$Y_i | \mu, \sigma \stackrel{ind}{\sim} N(\mu, \sigma^2)$$` <img src="slide-08a-snr_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## Regression Model `\(Y_i\)` the number of rides `\(X_i\)` temperature (in Fahrenheit) on day `\(i\)`. -- `\(\mu_i = \beta_0 + \beta_1X_i\)` -- `\(\beta_0:\)` the typical ridership on days in which the temperature was 0 degrees ( `\(X_i\)`=0). It is not interpretable in this case. `\(\beta_1:\)` the typical change in ridership for every one unit increase in temperature. --- ## Normal likelihood model `\begin{split} Y_i | \beta_0, \beta_1, \sigma & \stackrel{ind}{\sim} N\left(\mu_i, \sigma^2\right) \;\; \text{ with } \;\; \mu_i = \beta_0 + \beta_1X_i \; .\\ \end{split}` <img src="slide-08a-snr_files/figure-html/ch-9-normal-assumptions-1.png" style="display: block; margin: auto;" /> --- ## Prior Models `\(\text{likelihood model:} \; \; \; Y_i | \beta_0, \beta_1, \sigma \;\;\;\stackrel{ind}{\sim} N\left(\mu_i, \sigma^2\right)\text{ with } \mu_i = \beta_0 + \beta_1X_i\)` `\(\text{prior models:}\)` `\(\beta_0\sim N(m_0, s_0^2 )\)` `\(\beta_1\sim N(m_1, s_1^2 )\)` `\(\sigma \sim \text{Exp}(l)\)` -- Recall: `\(\text{Exp}(l) = \text{Gamma}(1, l)\)` --- ## Prior Models `\begin{split} Y_i | \beta_0, \beta_1, \sigma & \stackrel{ind}{\sim} N\left(\mu_i, \sigma^2\right) \;\; \text{ with } \;\; \mu_i = \beta_0 + \beta_1X_i \\ \beta_0 & \sim N\left(3482, 3937^2 \right) \\ \beta_1 & \sim N\left(0, 351^2 \right) \\ \sigma & \sim \text{Exp}(0.00064) \; .\\ \end{split}` --- <img src="slide-08a-snr_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- ## Simulation via `rstanarm` ```r normal_model_sim <- stan_glm( rides ~ temp_feel, data = bikes, family = gaussian, chains = 4, iter = 5000*2, seed = 84735) ``` --- ```r prior_summary(normal_model_sim) ``` ``` ## Priors for model 'normal_model_sim' ## ------ ## Intercept (after predictors centered) ## Specified prior: ## ~ normal(location = 3482, scale = 2.5) ## Adjusted prior: ## ~ normal(location = 3482, scale = 3937) ## ## Coefficients ## Specified prior: ## ~ normal(location = 0, scale = 2.5) ## Adjusted prior: ## ~ normal(location = 0, scale = 351) ## ## Auxiliary (sigma) ## Specified prior: ## ~ exponential(rate = 1) ## Adjusted prior: ## ~ exponential(rate = 0.00064) ## ------ ## See help('prior_summary.stanreg') for more details ``` --- ```r mcmc_trace(normal_model_sim, size = 0.1) ``` <img src="slide-08a-snr_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> --- ```r mcmc_dens_overlay(normal_model_sim) ``` <img src="slide-08a-snr_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> --- ```r normal_model_df <- as.data.frame(normal_model_sim) head(normal_model_df, 3) ``` ``` ## (Intercept) temp_feel sigma ## 1 -2339.928 84.37713 1289.935 ## 2 -2153.331 81.67121 1250.547 ## 3 -2199.164 81.99670 1313.540 ``` --- ```r # STEP 1: DEFINE the model stan_bike_model <- " data { int<lower=0> n; vector[n] Y; vector[n] X; } parameters { real beta0; real beta1; real<lower=0> sigma; } model { Y ~ normal(beta0 + beta1 * X, sigma); } " ``` --- ```r # STEP 2: SIMULATE the posterior stan_bike_sim <- stan( model_code = stan_bike_model, data = list(n = nrow(bikes), Y = bikes$rides, X = bikes$temp_feel), chains = 4, iter = 5000*2, seed = 84735) ``` --- ## Interpreting the posterior ```r # Posterior summary statistics model_summary <- as.data.frame(summary(normal_model_sim)) head(model_summary, -2) ``` ``` ## mean mcse sd 10% 50% ## (Intercept) -2177.96495 2.51708710 360.669636 -2636.05876 -2174.34098 ## temp_feel 81.86012 0.03605003 5.161012 75.31814 81.83427 ## sigma 1283.12357 0.29332218 40.547168 1231.74044 1282.43943 ## 90% n_eff Rhat ## (Intercept) -1721.62254 20532 1.000031 ## temp_feel 88.42322 20496 1.000028 ## sigma 1335.63969 19109 1.000236 ``` --- ```r # Posterior credible intervals posterior_interval(normal_model_sim, prob = 0.80) ``` ``` ## 10% 90% ## (Intercept) -2636.05876 -1721.62254 ## temp_feel 75.31814 88.42322 ## sigma 1231.74044 1335.63969 ``` --- ```r # Shade in the 80% CI. For example: mcmc_areas( normal_model_sim, pars = c("(Intercept)"), prob = 0.80, point_est = "mean" ) ``` --- <img src="slide-08a-snr_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> --- ```r head(normal_model_df, 3) ``` ``` ## (Intercept) temp_feel sigma ## 1 -2339.928 84.37713 1289.935 ## 2 -2153.331 81.67121 1250.547 ## 3 -2199.164 81.99670 1313.540 ``` ```r nrow(normal_model_df) ``` ``` ## [1] 20000 ``` --- <img src="slide-08a-snr_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> --- ```r # Tabulate the beta_1 values that exceed 0 normal_model_df %>% mutate(exceeds_0 = temp_feel > 0) %>% tabyl(exceeds_0) ``` ``` ## exceeds_0 n percent ## TRUE 20000 1 ``` --- ```r sigma_lower <- quantile(normal_model_df$sigma, 0.1) sigma_lower ``` ``` ## 10% ## 1231.74 ``` ```r sigma_upper <- quantile(normal_model_df$sigma, 0.9) sigma_upper ``` ``` ## 90% ## 1335.64 ``` ```r exp_slope <- mean(normal_model_df$temp_feel) exp_intercept <- mean(normal_model_df$`(Intercept)`) ``` --- ### Simulate data ```r set.seed(1) sim_data <- data.frame(x = rep(bikes$temp_feel, 2)) %>% mutate(y = c(rnorm(500, exp_intercept + exp_slope * x, sigma_lower), rnorm(500, exp_intercept + exp_slope * x, sigma_upper)), sigma = rep(c("small", "large"), each = 500)) ``` --- ```r head(sim_data, 3) ``` ``` ## x y sigma ## 1 64.72625 2348.9052 small ## 2 49.04645 2063.1843 small ## 3 51.09098 975.0713 small ``` ```r tail(sim_data, 3) ``` ``` ## x y sigma ## 998 53.81600 1667.4031 large ## 999 52.85300 168.1661 large ## 1000 52.11383 1156.7136 large ``` --- ```r # Plot the simulated data ggplot(sim_data, aes(x = x, y = y)) + geom_point(size = 0.5) + facet_grid(~ sigma) ``` --- <img src="slide-08a-snr_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />