class: center, middle, inverse, title-slide

# Visualizing Data
### Dr. Dogucu
### 2019-10-09

---

layout: true

<div class="my-header"></div>

<div class="my-footer"> Copyright © <a href="https://mdogucu.ics.uci.edu">Dr. Mine Dogucu</a>. All Rights Reserved.</div>

---

## Recap

- Always make a copy of today's project when you start the project for the first time.

<img src = "img/cloud-copy.png" class="center-image"> </img>

---

## Recap

You can "run" or "process" an .Rmd file by knitting. To knit your document, either

- click the Knit button, or
- use the shortcut Ctrl (Cmd on Mac) + Shift + K.

Every time you knit, your .Rmd file is automatically saved.

---

## Recap

We have learned that we can insert an R code chunk into an R Markdown file by

- clicking Insert > R, or
- using the shortcut Ctrl + Alt + I (Windows) or Cmd + Option + I (Mac).

---

## One More Shortcut

For `%>%` (aka the pipe operator) you can use the shortcut Ctrl + Shift + M (Windows) or Cmd + Shift + M (Mac).

When you read out your code, you can read `%>%` as "and then".

---

## Today

Data visualization using the ggplot2 package.

## Examples

[BBC](https://bbc.github.io/rcookbook/)

[FiveThirtyEight](https://fivethirtyeight.com/features/the-rise-of-religiously-inspired-terrorism-in-france/)

[Master List](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Histogram)

---

## Three steps of basic visualization with ggplot2 package

1. Make some space (this will soon make sense) using the `ggplot()` function.
2. Draw your axes using the `aes()` function.
3. Add your plot, whether it is a histogram, a bar plot, or something else. We will call these geom objects.

---

## Flow for the Day

For every plot,

1) Decide what variables to use.
2) Decide which variable is on which axis.
3) Decide what kind of plot it is.
4) Watch the demo for plotting.

---

<br>
<br>
<br>

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

???
Make sure to do this step by step, please. Show that `titanic %>% ggplot()` makes space, that `titanic %>% ggplot(aes(x = Survived))` puts the axes, and so on.

Note that they will only see `+` in R code while using the ggplot function, because the visualization works in layers and we are adding layers with `+`.

---

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

Why are we getting this warning?

???

They should know what bins are from the reading. Remind them. ggplot will set up a default binwidth, but it will also warn you about it. Make sure to set your own binwidth.

Remind them we had three steps for plots: 1) making space, 2) axes, 3) geom object. Which of these steps is binwidth related to?

---

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

What step of the code is binwidth related to?

???

Since binwidth is related to the geom object, we can put the binwidth argument inside the `geom_histogram()` function.

---

## Cheatsheet

Look at the ggplot cheatsheet. Moving on, you will attempt to make each plot by finding the geom object on your cheatsheet.

---

<br>
<br>
<br>

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

## You Attempt This First

Make a plot that shows the relationship between the age of the passenger and the fare that they paid. Comment on whether older passengers bought more expensive tickets.

---

```
## Warning: Removed 177 rows containing missing values (geom_point).
```

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Color

```r
titanic_train %>%
  ggplot(aes(x = Fare)) +
  # color sets the outline of the bars
  geom_histogram(color = "salmon")
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

![](slide-2l-intro-viz_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

## Fill

```r
titanic_train %>%
  ggplot(aes(x = Fare)) +
  # fill sets the interior color of the bars
  geom_histogram(fill = "salmon")
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

![](slide-2l-intro-viz_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

## Shape

```r
titanic_train %>%
  ggplot(aes(x = Age, y = Fare)) +
  # shape = 4 draws each point as a cross
  geom_point(shape = 4)
```

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

class: center, inverse

## More colors

<br>
<br>
<br>

[bit.ly/colors-r](bit.ly/colors-r)

---

class: center, inverse

## More shapes

<br>
<br>
<br>

[bit.ly/shapes-r](bit.ly/shapes-r)

---

## Stacked bar plot

Can you guess where the fill argument will go for this stacked bar plot?

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

???

Note that since it is a bar plot, the y axis is reserved for the count.

Note that this is the default color scheme of ggplot.

They should focus on adding a second variable rather than changing colors.

---

## Your Turn

Make this plot with any color other than the default one and submit it. Make sure to set binwidth. `bechdel` is loaded already.

<img src="slide-2l-intro-viz_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---

## Getting Ready for Submission

At the top of the .Rmd document you will see a part where my name is written. Replace it with the full names of your group members, separated by commas. Be careful to keep the quotation marks.

Knit your document as a .pdf one last time.

On the Files panel, in the lower right, you should see your pdf file. Check the pdf file.

Click on More > Export > Download. You have now downloaded your pdf file to your computer.

---

## Submission

- Log onto Gradescope and upload it.
- Gradescope may ask you which page has which questions; provide this information.
- Make sure to add your group members' names
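
---

## Three steps in code

The three steps from earlier in the deck (make space, draw axes, add a geom object) can be sketched in code as follows. This is a minimal sketch, not the demo shown in class. It assumes the `ggplot2`, `magrittr`, and `titanic` packages are installed and that `titanic_train` comes from the `titanic` package. The object names `step1`, `step2`, and `step3` and the binwidth of 10 are illustrative choices only.

```r
# Assumption: ggplot2, magrittr, and titanic are installed.
library(ggplot2)
library(magrittr)  # provides %>%
library(titanic)   # assumed source of titanic_train

# Step 1: make some space (an empty plotting area, no axes yet).
step1 <- titanic_train %>%
  ggplot()

# Step 2: draw the axes by mapping a variable with aes().
step2 <- titanic_train %>%
  ggplot(aes(x = Fare))

# Step 3: add a geom object as a new layer with +.
# binwidth belongs to the geom, so it goes inside geom_histogram().
# The value 10 is an arbitrary choice for illustration.
step3 <- titanic_train %>%
  ggplot(aes(x = Fare)) +
  geom_histogram(binwidth = 10, fill = "salmon")

# Printing the object draws the plot.
step3
```

Because binwidth is set explicitly, the `stat_bin()` message about `bins = 30` should no longer appear. Try a few binwidth values to see how the shape of the histogram changes.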