class: center, middle, inverse, title-slide

# Teaching an Introductory Data Science course with tidyverse workshop
## Data Visualization
### Dr. Mine Dogucu
### 2021-12-15

---

layout: true

<!-- This file by Mine Dogucu is licensed under an Attribution-ShareAlike 2.5 Generic License (CC BY-SA 2.5) More information about the license can be found at -->

<div class="my-header"></div>

<div class="my-footer"> CC BY-NC-ND 4.0 <a href="">Mine Dogucu</a></div>

---

class: middle

[How LGBTQ+ hate crime is committed by young people against young people](

[Why Time Flies](

[Mandatory Paid Vacation](

[Why are K-pop groups so big?](

---

class: middle

Data Visualizations

- are graphical representations of data

--

- use different colors, shapes, and the coordinate system to summarize data

--

- tell a story

--

- are useful for exploring data

---

class: middle

First, I go over different types of visualizations so that students can interpret what they see in the visuals. The following is a useful resource for understanding histograms.

[Exploring Histograms Interactively](

---

class: middle

__gg__plot is based on the __g__rammar of __g__raphics.

<img src="img/grammar_graphics.jpeg" width="237" />

---

## Data

```r
glimpse(titanic)
```

```
## Rows: 891
## Columns: 6
## $ survived <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,…
## $ pclass   <chr> "Third", "First", "Third", "First", "Third", …
## $ sex      <fct> sex, sex, sex, sex, sex, sex, sex, sex, sex, …
## $ age      <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58,…
## $ fare     <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4…
## $ embarked <fct> Southampton, Cherbourg, Southampton, Southamp…
```

.footnote[The data frame has been cleaned for you.]

---

class: inverse middle

.font75[Visualizing a Single Categorical Variable]

---

class: middle

.left-panel[
<br>
<br>
If you could speak to R in English, how would you tell R to make this plot for you?

OR

If you had the data and had to draw this bar plot by hand, what would you do?

]

.right-panel[

![](2-data-viz_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

]

---

class: middle

**3 Steps of Making a Basic ggplot**

1. Pick data

2. Map data onto aesthetics

3. Add the geometric layer

---

class: middle

### Step 1 - Pick Data

.pull-left[

```r
ggplot(data = titanic)
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

]

---

class: middle

### Step 2 - Map Data to Aesthetics

.pull-left[

```r
ggplot(data = titanic,
*      aes(x = pclass))
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

]

---

class: middle

### Step 3 - Add the Geometric Layer

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass)) +
* geom_bar()
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

]

---

class: middle

.panelset[

.panel[
.panel-name[Plot]

<img src="2-data-viz_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[English]

- Create a ggplot using the `titanic` data frame.
- Map `pclass` to the x-axis.
- Add a layer of a bar plot.

]

.panel[
.panel-name[R]

```r
ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()
```

]

]

---

class: inverse middle

.font75[Visualizing a Single Numeric Variable]

---

class: middle

.panelset[

.panel[
.panel-name[Plot]

```
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[English]

- Create a ggplot using the `titanic` data frame.
- Map `fare` to the x-axis.
- Add a layer of a histogram.

]

.panel[
.panel-name[R]

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
```

]

]

---

class: middle

### Step 1 - Pick Data

.pull-left[

```r
ggplot(data = titanic)
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

]

---

class: middle

### Step 2 - Map Data to Aesthetics

.pull-left[

```r
ggplot(data = titanic,
*      aes(x = fare))
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

]

---

class: middle

### Step 3 - Add the Geometric Layer

.pull-left[

```r
ggplot(data = titanic,
       aes(x = fare)) +
* geom_histogram()
```

]

.pull-right[

```
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
```

![](2-data-viz_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

]

---

## What is this warning?

```
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
* geom_histogram(binwidth = 15)
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />

---

class: middle

.panelset[

.panel[.panel-name[binwidth = 15]

.left-panel[
]

<img src="2-data-viz_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

]

.panel[.panel-name[binwidth = 50]

<img src="2-data-viz_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

]

.panel[.panel-name[binwidth = 100]

<img src="2-data-viz_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

]

]

---

class: middle center

.font150[
🌈
]

Pick your favorite color(s) from the list at:

[](

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
*                color = "white")
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
*                fill = "darkred")
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
*                color = "white",
*                fill = "darkred")
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />

---

class: inverse middle center

.font75[Visualizing Two Categorical Variables]

---

## Stacked Bar Plot

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass,
*          fill = survived)) +
  geom_bar()
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

]

---

## Standardized Bar Plot

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
* geom_bar(position = "fill")
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-32-1.png)<!-- -->

]

.footnote[Note that the y-axis now shows proportions, not counts, even though it is still labeled `count`. We will learn how to change the label later.]

---

## Dodged Bar Plot

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
* geom_bar(position = "dodge")
```

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-34-1.png)<!-- -->

]

.footnote[Note that with `position = "dodge"` the y-axis shows counts again.]
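---

class: middle

The footnote on the standardized bar plot slide mentions changing the y-axis later. One way this might look is sketched below, using `labs()` from ggplot2. The data frame `toy_titanic` is a hypothetical stand-in with made-up rows (an assumption for illustration, not the workshop's `titanic` data).

```r
library(ggplot2)

# Hypothetical stand-in for the cleaned `titanic` data frame (made-up rows)
toy_titanic <- data.frame(
  pclass   = c("First", "First", "Second", "Second",
               "Third", "Third", "Third", "Third"),
  survived = c(TRUE, TRUE, TRUE, FALSE,
               TRUE, FALSE, FALSE, FALSE)
)

p <- ggplot(data = toy_titanic,
            aes(x = pclass,
                fill = survived)) +
  geom_bar(position = "fill") +  # rescale every bar to a total height of 1
  labs(y = "Proportion")         # replace the default "count" axis label

p
```

With `position = "fill"` every bar has the same total height, so "Proportion" describes the axis more accurately than the default label.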
---

## New Data

<img src="img/penguins.png" width="667" style="display: block; margin: auto;" />

.footnote[Artwork by [@allison_horst]( ]

---

## New Data

```r
glimpse(penguins)
```

```
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Tor…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 19…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 36…
## $ sex               <fct> male, female, female, NA, female, ma…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, …
```

---

<img src="img/penguin_bill.png" width="1036" style="display: block; margin: auto;" />

.footnote[Artwork by [@allison_horst]( ]

---

class: middle inverse

.font75[Visualizing a Single Numerical and a Single Categorical Variable]

---

class: middle

.panelset[

.panel[
.panel-name[Plot]

```
## Warning: Removed 2 rows containing non-finite values
## (stat_ydensity).
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[English]

- Create a ggplot using the `penguins` data frame.
- Map `species` to the x-axis and `bill_length_mm` to the y-axis.
- Add a layer of a violin plot.

]

.panel[
.panel-name[R]

```r
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_violin()
```

]

]

---

class: middle

.panelset[

.panel[
.panel-name[Plot]

```
## Warning: Removed 2 rows containing non-finite values
## (stat_boxplot).
```

<img src="2-data-viz_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[English]

- Create a ggplot using the `penguins` data frame.
- Map `species` to the x-axis and `bill_length_mm` to the y-axis.
- Add a layer of a box plot.

]

.panel[
.panel-name[R]

```r
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_boxplot()
```

]

]

---

class: middle

.pull-left[

![](2-data-viz_files/figure-html/unnamed-chunk-42-1.png)<!-- -->

]

.pull-right[

![](2-data-viz_files/figure-html/unnamed-chunk-43-1.png)<!-- -->

]

.footnote[Note: Violin plots display densities, not counts!]

---

class: inverse middle

.font75[Visualizing Two Numerical Variables]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point()
```

]

.right-panel[

```
## Warning: Removed 2 rows containing missing values (geom_point).
```

![](2-data-viz_files/figure-html/unnamed-chunk-45-1.png)<!-- -->

]

---

class: middle inverse

.font75[Considering More Than Two Variables]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point()
```

]

.right-panel[

```
## Warning: Removed 2 rows containing missing values (geom_point).
```

![](2-data-viz_files/figure-html/unnamed-chunk-47-1.png)<!-- -->

]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
```

]

.right-panel[

```
## Warning: Removed 2 rows containing missing values (geom_point).
```

![](2-data-viz_files/figure-html/unnamed-chunk-49-1.png)<!-- -->

]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
```

]

.right-panel[

```
## Warning: Removed 2 rows containing missing values (geom_point).
```

![](2-data-viz_files/figure-html/unnamed-chunk-51-1.png)<!-- -->

]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point()
```

]

.right-panel[

```
## Warning: Removed 2 rows containing missing values (geom_point).
```

![](2-data-viz_files/figure-html/unnamed-chunk-53-1.png)<!-- -->

]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species,
           size = body_mass_g)) +
  geom_point()
```

]

.right-panel[

```
## Warning: Removed 2 rows containing missing values (geom_point).
```

![](2-data-viz_files/figure-html/unnamed-chunk-55-1.png)<!-- -->

]

---

<img src="img/ggplot-summary.jpeg" width="95%" />

---

class: middle

## Practice

Using the `babies`, `titanic`, or `penguins` data frame, ask a question that you are interested in answering. Visualize the data to get a visual answer to the question. What is the visual telling you? Note all of this down in your lecture notes.

---

class: middle

.panelset[

.panel[
.panel-name[Plot]

<img src="2-data-viz_files/figure-html/unnamed-chunk-57-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[English]

- Create a ggplot using the `penguins` data frame.
- Map `bill_depth_mm` to the x-axis, `bill_length_mm` to the y-axis, and `species` to shape and color.
- Add a layer of points and set the size of the points to 4.

]

.panel[
.panel-name[R]

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4)
```

]

]

---

class: middle

## Labs

.panelset[

.panel[
.panel-name[Plot]

<img src="2-data-viz_files/figure-html/unnamed-chunk-59-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[English]

- Create a ggplot using the `penguins` data frame.
- Map `bill_depth_mm` to the x-axis, `bill_length_mm` to the y-axis, and `species` to shape and color.
- Add a layer of points and set the size of the points to 4.
- .highlight-text[Add labels to the x-axis (Bill Depth (mm)), the y-axis (Bill Length (mm)), and the title of the plot (Palmer Penguins).]

]

.panel[
.panel-name[R]

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4) +
* labs(x = "Bill Depth (mm)",
*      y = "Bill Length (mm)",
*      title = "Palmer Penguins")
```

]

]

---

class: middle

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point() +
  labs(x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       title = "Palmer Penguins") +
* theme_bw()
```

]

.right-panel[

![](2-data-viz_files/figure-html/unnamed-chunk-62-1.png)<!-- -->

]

---

class: middle

## Themes

.panelset[

.panel[
.panel-name[`theme_gray()`]

<img src="2-data-viz_files/figure-html/unnamed-chunk-63-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[`theme_bw()`]

<img src="2-data-viz_files/figure-html/unnamed-chunk-64-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[`theme_classic()`]

<img src="2-data-viz_files/figure-html/unnamed-chunk-65-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[`theme_dark()`]

<img src="2-data-viz_files/figure-html/unnamed-chunk-66-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[`theme_minimal()`]

<img src="2-data-viz_files/figure-html/unnamed-chunk-67-1.png" style="display: block; margin: auto;" />

]

.panel[
.panel-name[`theme_void()`]

<img src="2-data-viz_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" />

]

]

---

class: middle

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point() +
  labs(x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       title = "Palmer Penguins") +
  theme_bw() +
* theme(text = element_text(size = 20))
```

]

.right-panel[

![](2-data-viz_files/figure-html/unnamed-chunk-70-1.png)<!-- -->

]

---

class: middle

```r
?theme
```

---

class: middle

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4) +
* facet_grid(. ~ species)
```

]

.right-panel[

<img src="2-data-viz_files/figure-html/unnamed-chunk-73-1.png" style="display: block; margin: auto;" />

]

---

class: middle

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4) +
* facet_grid(species ~ .)
```

]

.right-panel[

<img src="2-data-viz_files/figure-html/unnamed-chunk-75-1.png" style="display: block; margin: auto;" />

]

---

class: middle

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point() +
* xlim(0, 30) +
* ylim(0, 70)
```

]

.right-panel[

![](2-data-viz_files/figure-html/unnamed-chunk-77-1.png)<!-- -->

]

---

class: middle

## Code Style

[The tidyverse style guide]( has the following convention for writing ggplot2 code.

The plus sign `+` for adding layers always has a space before it and is followed by a new line.

The new line is indented by two spaces. RStudio does this automatically for you.

---

class: middle

[Top Ten Dos and Don'ts for Charts and Graphs](

---

class: middle

Check out [the ggplot flipbook]( for some inspiration. Find your favorite new function/feature. Share it with your neighbor.
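---

class: middle

As a sketch of the `+` convention from the code style slide: each `+` has a space before it and ends its line, and the layer that follows starts on a new line indented by two spaces. The data frame `toy_penguins` is a hypothetical stand-in with made-up values that borrows three column names from `penguins` (an assumption for illustration).

```r
library(ggplot2)

# Hypothetical stand-in for `penguins` (made-up values, same column names)
toy_penguins <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, 15.0, 14.3, 18.9, 19.5),
  bill_length_mm = c(39.1, 39.5, 46.1, 50.0, 46.5, 50.5),
  species        = c("Adelie", "Adelie", "Gentoo",
                     "Gentoo", "Chinstrap", "Chinstrap")
)

# One layer per line; every line that continues the plot ends with " +"
styled_plot <- ggplot(toy_penguins,
                      aes(x = bill_depth_mm,
                          y = bill_length_mm,
                          color = species)) +
  geom_point(size = 4) +
  labs(x = "Bill Depth (mm)",
       y = "Bill Length (mm)") +
  theme_bw()

styled_plot
```

Keeping one layer per line makes it easy to comment out or reorder a single layer while exploring.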
---

class: middle

## ggplot extensions

- [`patchwork`]( combining plots into a single plot
- [`gganimate`]( animated graphics
- [`ggthemes`]( additional set of themes
- [`ggtext`]( improved text rendering support for ggplot2

There are [more extensions](

---

class: middle

## Detour: R Markdown chunk options

```{r, echo=TRUE, message=FALSE}

```

---

class: center middle

## (some) Chunk Options in R Markdown

<table align = "center">
  <tr>
    <td>echo = FALSE</td>
    <td>hides the code</td>
  </tr>
  <tr>
    <td>message = FALSE</td>
    <td>hides messages</td>
  </tr>
  <tr>
    <td>warning = FALSE</td>
    <td>hides warnings</td>
  </tr>
  <tr>
    <td>error = TRUE</td>
    <td>renders despite errors and displays the error</td>
  </tr>
  <tr>
    <td>fig.cap = "Some figure caption"</td>
    <td>creates a figure caption</td>
  </tr>
  <tr>
    <td>fig.alt = "Some alternate text for figure"</td>
    <td>creates alternate text for figures</td>
  </tr>
</table>

---

class: middle

## Schedule for the Day

__10:00 - 10:15 Introduction and Setup__

__10:15 - 11:15 Introduction to Toolkit and Data Basics__

__11:20 - 12:30 Data Visualization__

1:00 - 1:45 Data Wrangling

1:45 - 2:15 Packages and External Datasets

2:15 - 2:30 Wrap Up