```{r}
#| echo: false
library(knitr)
```
## Bayes Rules! Dataset from Exercise 10.21 (Open-ended: more bikes)
## Overview
- For this discussion session you will use the **bikes** data in the **bayesrules** package to explore the Normal regression model of **rides** by **humidity**.
## Exercise 10.13:
(Getting started with the bikes data) Before doing any modeling, let’s get to know the bikes data by loading the dataset and selecting the `rides` and `humidity` variables. The bikes data in the bayesrules package is a subset of the Bike Sharing dataset made available on the UCI Machine Learning Repository (2017) by Fanaee-T and Gama (2014). For each of 500 days in the study, bikes contains, among other variables, the number of rides taken and the humidity level (as a percentage).
```{r echo=FALSE, warning=FALSE, message=FALSE}
# Load packages
library(bayesrules)
library(tidyverse)
library(rstan)
library(rstanarm)
library(bayesplot)
library(tidybayes)
library(janitor)
library(broom.mixed)
```
```{r}
# Load the data and keep only the two variables used in this session
data(bikes)
bikes <- bikes %>%
  select(rides, humidity)
```
## Exercise 10.14:
In this exercise you will build a Bayesian Normal regression model of the number of Capital Bikeshare rides ($Y$) by the corresponding humidity level ($X$), with $\mu = \beta_{0} + \beta_{1}X$. In doing so, assume that on a day with average humidity in D.C., there are typically around 5000 rides, though this average could be somewhere between 3000 and 7000.
For every one-unit increase in humidity level, ridership typically increases by 100 rides, though this average increase could be as low as 20 or as high as 180.
At any given humidity level, daily ridership will tend to vary with a moderate standard deviation of 1250 rides.
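
The prior statements above can be turned into prior parameters with a little arithmetic. The sketch below assumes the common convention that a stated plausible range covers roughly the prior mean plus or minus 2 standard deviations, and that the "moderate standard deviation" is used as the mean of an Exponential prior on $\sigma$. The helper `range_to_sd()` is a hypothetical name introduced only for illustration.

```{r}
# Translate a "typical value, plausible range" statement into a Normal prior,
# ASSUMING the range covers roughly mean +/- 2 standard deviations.
range_to_sd <- function(lower, upper) (upper - lower) / 4

intercept_sd <- range_to_sd(3000, 7000)  # centered intercept, typical value 5000
slope_sd     <- range_to_sd(20, 180)     # humidity coefficient, typical value 100
sigma_rate   <- 1 / 1250                 # Exponential rate so that E(sigma) = 1250

c(intercept_sd = intercept_sd, slope_sd = slope_sd, sigma_rate = sigma_rate)
```

Under these assumptions the priors would be $\beta_{0c} \sim N(5000, 1000^2)$, $\beta_{1} \sim N(100, 40^2)$, and $\sigma \sim \text{Exp}(0.0008)$. A different reading of "could be between" would give different scales.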
a) Plot and discuss the relationship between the number of Capital Bikeshare rides (Y) and its corresponding humidity level.
b) Use stan_glm() to simulate the Normal regression posterior model.
c) Provide visual and numerical posterior summaries for the humidity coefficient $\beta_{1}$.
d) Interpret the posterior median of $\beta_{1}$.
e) Do you have significant posterior evidence that, the higher the humidity level, the lower the number of Capital Bikeshare rides tends to be? Explain.
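
One possible way to approach parts a through e is sketched below. It is not the only valid solution. The prior scales (1000, 40, and rate 0.0008) assume that each stated range covers about the prior mean plus or minus 2 standard deviations; the object names `bike_model` and `bike_draws`, the seed, and the chain settings are illustrative choices.

```{r}
library(bayesrules)
library(rstanarm)
library(bayesplot)
library(ggplot2)

data(bikes)

# (a) Scatterplot of rides against humidity with a least-squares trend line
ggplot(bikes, aes(x = humidity, y = rides)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", se = FALSE)

# (b) Posterior simulation; prior scales ASSUME range = mean +/- 2 SD
bike_model <- stan_glm(
  rides ~ humidity, data = bikes, family = gaussian,
  prior_intercept = normal(5000, 1000),
  prior = normal(100, 40),
  prior_aux = exponential(0.0008),
  chains = 4, iter = 5000 * 2, seed = 84735, refresh = 0
)

# (c) Visual and numerical summaries of the humidity coefficient
mcmc_dens(bike_model, pars = "humidity")
posterior_interval(bike_model, pars = "humidity", prob = 0.80)

# (d) Posterior median of the humidity coefficient
bike_draws <- as.data.frame(bike_model)
median(bike_draws$humidity)

# (e) Share of posterior draws with a negative humidity coefficient
mean(bike_draws$humidity < 0)
```

The proportion computed in part e approximates the posterior probability that $\beta_{1} < 0$; how large it must be to count as "significant" evidence is a judgment to discuss.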
## Exercise 10.15:
(Bike data analysis: Is it wrong?) Before putting too much stock into your regression analysis, step back and consider whether it’s wrong.
a) Your posterior simulation contains multiple posterior plausible parameter sets ($\beta_{0}$, $\beta_{1}$, $\sigma$). Use the first of these to simulate a sample of 500 new ride counts from the observed humidity levels.
b) Construct a density plot of your simulated sample and superimpose this with a density plot of the actual observed rides data. Discuss.
c) Think bigger. Use pp_check() to implement a more complete posterior predictive check.
d) Putting this together, do you think that assumptions 2 and 3 of the Normal regression model are reasonable? Explain.
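
The simulation in part a can be sketched in base R as follows. The parameter values and humidity values below are hypothetical stand-ins so that the chunk runs on its own; in practice the humidity values would come from `bikes$humidity` and the parameters from the first row of the posterior draws (for example, `head(as.data.frame(bike_model), 1)`, where `bike_model` is an assumed name for the fitted model). The function name `simulate_rides()` is also hypothetical.

```{r}
set.seed(84735)

# Simulate one new ride count per observed humidity value from a single
# parameter set (beta_0, beta_1, sigma) under the Normal regression model.
simulate_rides <- function(humidity, beta_0, beta_1, sigma) {
  mu <- beta_0 + beta_1 * humidity
  rnorm(length(humidity), mean = mu, sd = sigma)
}

# HYPOTHETICAL stand-ins, not estimates from the bikes data
humidity_obs <- runif(500, min = 30, max = 100)
first_set <- list(beta_0 = 4000, beta_1 = -8, sigma = 1500)

rides_sim <- simulate_rides(
  humidity_obs, first_set$beta_0, first_set$beta_1, first_set$sigma
)

# Density of the simulated sample; observed rides could be overlaid with lines()
plot(density(rides_sim), main = "Simulated rides from one parameter set")
```

For part c, `pp_check(bike_model, nreps = 50)` repeats this idea across 50 posterior parameter sets and overlays the observed data automatically.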
## Exercise 10.16:
(Bike rides: Are the posterior predictions accurate? (Part 1))
Next, let’s explore how well our posterior model predicts the number of bike rides.
a) Your posterior simulation contains multiple posterior plausible parameter sets ($\beta_{0}$, $\beta_{1}$, $\sigma$). Use the first of these to simulate a sample of 500 new ride counts from the observed humidity levels. Then simulate and plot a posterior predictive model for the number of rides on a single day.
b) In reality, this day had 4121 rides. Without using prediction_summary(), calculate and interpret two measures of the posterior predictive error for this day: both the raw and standardized error.
c) To get a sense of the posterior predictive accuracy for all days in the bikes data, construct and discuss a ppc_intervals() plot.
d) How many days have ride counts that are within their 50% posterior prediction interval? (Answer this using R code; don’t try to visually count it up!)
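
The calculations in parts b and d can be sketched with two small helper functions. This sketch assumes the raw error is the observed value minus the posterior predictive mean and the standardized error divides that by the posterior predictive standard deviation; other summaries (such as the median and MAD) are also reasonable. The function names and the simulated draws are hypothetical stand-ins for the output of `posterior_predict()`.

```{r}
set.seed(84735)

# Raw and standardized posterior predictive error for one observation,
# given a vector of posterior predictive draws for that observation.
predictive_error <- function(y_obs, y_draws) {
  raw <- y_obs - mean(y_draws)
  c(raw = raw, standardized = raw / sd(y_draws))
}

# Count observations inside their central posterior prediction interval.
# `draws` has one row per posterior draw and one column per observation,
# the shape returned by rstanarm::posterior_predict().
count_within <- function(y_obs, draws, prob = 0.50) {
  tail_prob <- (1 - prob) / 2
  lower <- apply(draws, 2, quantile, probs = tail_prob)
  upper <- apply(draws, 2, quantile, probs = 1 - tail_prob)
  sum(y_obs >= lower & y_obs <= upper)
}

# HYPOTHETICAL draws standing in for posterior_predict() output
y_obs <- c(4121, 3500, 2900)
draws <- matrix(rnorm(3 * 2000, mean = 3500, sd = 1500), ncol = 3)

predictive_error(y_obs[1], draws[, 1])
count_within(y_obs, draws, prob = 0.50)
```

A standardized error of, say, 0.5 would mean the observed count lies half a posterior predictive standard deviation above the predicted mean.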
## Exercise 10.17:
(Bike rides: Are the posterior predictions accurate? (Part 2))
a) Use prediction_summary_cv() to obtain 10-fold cross-validated measurements of our model’s posterior predictive quality.
b) Interpret each of the four cross-validated metrics reported in part a.
c) Verify the reported cross-validated MAE using information from the 10 folds.
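
For part c, the cross-validated MAE is the average of the fold-level MAEs. The sketch below illustrates the check with made-up fold values so that it runs on its own; the numbers are purely illustrative and are not results from the bikes data. With a real result, for example `cv_procedure <- prediction_summary_cv(model = bike_model, data = bikes, k = 10)` (where `cv_procedure` and `bike_model` are assumed names), the same comparison would use `cv_procedure$folds` and `cv_procedure$cv`.

```{r}
# HYPOTHETICAL fold-level MAEs standing in for the `folds` element of a
# prediction_summary_cv() result; the numbers are made up for illustration.
folds <- data.frame(
  fold = 1:10,
  mae  = c(1000, 950, 1100, 1050, 980, 1020, 1150, 900, 1010, 1040)
)

# The cross-validated MAE is the mean of the 10 fold-level MAEs
cv_mae <- mean(folds$mae)
cv_mae
```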
## Exercise 10.18:
(Bike Rides: Is it fair?) Is our bike rides analysis fair?