
Teaching an Introductory Data Science course with tidyverse workshop

Data Visualization

Dr. Mine Dogucu

2021-12-15

1 / 62

Data Visualizations

  • are graphical representations of data

  • use different colors, shapes, and the coordinate system to summarize data

  • tell a story

  • are useful for exploring data

3 / 62

First I go over different types of visualizations so that students can interpret what they see in the visuals.

The following is a useful resource for understanding histograms.

Exploring Histograms Interactively

4 / 62

ggplot2 is based on the grammar of graphics.
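One way to see the grammar at work is to build a plot piece by piece and inspect the pieces. The sketch below assumes ggplot2 is installed and uses a small made-up data frame (`toy`, a hypothetical stand-in, not the workshop data).

```r
library(ggplot2)

# Hypothetical toy data, standing in for a real data frame
toy <- data.frame(
  pclass = c("First", "Second", "Third", "Third", "First", "Third")
)

# Data only: an empty canvas with no mappings and no layers
p_data <- ggplot(data = toy)

# Data + aesthetic mapping: an axis appears, but nothing is drawn yet
p_mapped <- ggplot(data = toy, aes(x = pclass))

# Data + mapping + geometric layer: a bar plot
p_bar <- p_mapped + geom_bar()

names(p_mapped$mapping)  # aesthetics that have been mapped
length(p_data$layers)    # number of layers before adding a geom
length(p_bar$layers)     # number of layers after adding geom_bar()
```

Each component is stored separately on the plot object, which is why layers can be added one at a time with `+`.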

5 / 62

Data

glimpse(titanic)
## Rows: 891
## Columns: 6
## $ survived <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,…
## $ pclass <chr> "Third", "First", "Third", "First", "Third", …
## $ sex <fct> sex, sex, sex, sex, sex, sex, sex, sex, sex, …
## $ age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58,…
## $ fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4…
## $ embarked <fct> Southampton, Cherbourg, Southampton, Southamp…

The data frame has been cleaned for you.

6 / 62

Visualizing a Single Categorical Variable

7 / 62



If you could speak to R in English, how would you tell R to make this plot for you?

OR

If you had the data and had to draw this bar plot by hand, what would you do?
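Drawing the bar plot by hand comes down to two steps: count each category, then draw a bar as tall as each count. A minimal base R sketch of that idea, using a made-up vector of passenger classes (hypothetical values, not the titanic data):

```r
# Hypothetical passenger classes, standing in for titanic$pclass
pclass <- c("Third", "First", "Third", "First", "Third", "Second")

# Step 1: count how many passengers fall in each class
class_counts <- table(pclass)
class_counts

# Step 2: draw one bar per class, with height equal to the count
barplot(class_counts, xlab = "pclass", ylab = "count")
```

This counting is what a bar layer does behind the scenes when only an x variable is mapped.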

8 / 62

3 Steps of Making a Basic ggplot

1. Pick data

2. Map data onto aesthetics

3. Add the geometric layer

9 / 62

Step 1 - Pick Data

ggplot(data = titanic)

10 / 62

Step 2 - Map Data to Aesthetics

ggplot(data = titanic,
       aes(x = pclass))

11 / 62

Step 3 - Add the Geometric Layer

ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()

12 / 62

  • Create a ggplot using the titanic data frame.
  • Map the pclass to the x-axis.
  • Add a layer of a bar plot.
ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()
13 / 62

Visualizing a Single Numeric Variable

14 / 62
  • Create a ggplot using the titanic data frame.
  • Map the fare to the x-axis.
  • Add a layer of a histogram.
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
15 / 62

Step 1 - Pick Data

ggplot(data = titanic)

16 / 62

Step 2 - Map Data to Aesthetics

ggplot(data = titanic,
       aes(x = fare))

17 / 62

Step 3 - Add the Geometric Layer

ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.

18 / 62

What is this message?

## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
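ggplot2 prints this as a message (not a warning) whenever a histogram is drawn without a choice of bins. The sketch below, on made-up fares (hypothetical values), compares the default with an explicit `binwidth` by looking at the bars ggplot2 computes. It assumes ggplot2 is installed and uses `layer_data()` to pull out the computed bins.

```r
library(ggplot2)

# Hypothetical fares, standing in for titanic$fare
toy <- data.frame(
  fare = c(7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07)
)

# No bins or binwidth given: ggplot2 falls back to 30 bins and says so
p_default <- ggplot(data = toy, aes(x = fare)) +
  geom_histogram()

# binwidth given: every bar covers 15 units of fare, and no message appears
p_width <- ggplot(data = toy, aes(x = fare)) +
  geom_histogram(binwidth = 15)

# layer_data() returns the computed bars, one row per bar
bins_default <- layer_data(p_default)
bins_width <- layer_data(p_width)

nrow(bins_default)                                   # number of bars with the default
unique(round(bins_width$xmax - bins_width$xmin, 6))  # width of each bar
```

Choosing `binwidth` in the units of the variable (here, fare) tends to be easier to reason about than choosing a number of bins.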

19 / 62
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15)

20 / 62

🌈

Pick your favorite color(s) from the list at:

bit.ly/colors-r

22 / 62
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white")

23 / 62
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 fill = "darkred")

24 / 62
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white",
                 fill = "darkred")

25 / 62

Visualizing Two Categorical Variables

26 / 62

Stacked Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar()

27 / 62

Standardized Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar(position = "fill")

Note that the y-axis no longer shows counts but proportions, even though it is still labeled count. We will learn how to change the label later.
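The proportions in a standardized bar plot can be worked out by hand, which may help students read the y-axis. A base R sketch on a made-up data frame (hypothetical values, not the titanic data):

```r
# Hypothetical passengers, standing in for the titanic data frame
toy <- data.frame(
  pclass   = c("First", "First", "First", "Third", "Third", "Third", "Third"),
  survived = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Counts per class and survival status: what the stacked bar plot shows
counts <- table(toy$pclass, toy$survived)
counts

# Proportions within each class: what position = "fill" shows
props <- prop.table(counts, margin = 1)
props

# Each class (row) now sums to 1, so every bar has the same height
rowSums(props)
```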

28 / 62

Dodged Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar(position = "dodge")

Note that the y-axis shows counts again; the bars for each class now sit side by side instead of being stacked.

29 / 62

New Data

Artwork by @allison_horst

30 / 62

New Data

glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Tor…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38…
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 19…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 36…
## $ sex <fct> male, female, female, NA, female, ma…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, …
31 / 62

Artwork by @allison_horst

32 / 62

Visualizing a Single Numerical and a Single Categorical Variable

33 / 62
  • Create a ggplot using the penguins data frame.
  • Map the species to the x-axis and bill_length_mm to the y-axis.
  • Add a layer of a violin plot.
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_violin()
## Warning: Removed 2 rows containing non-finite values
## (stat_ydensity).
34 / 62
  • Create a ggplot using the penguins data frame.
  • Map the species to the x-axis and bill_length_mm to the y-axis.
  • Add a layer of a box plot.
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_boxplot()
## Warning: Removed 2 rows containing non-finite values
## (stat_boxplot).
35 / 62

Note: Violin plots display densities, not counts!
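To illustrate why a violin's shape says nothing about group size, the base R sketch below estimates densities for two simulated groups of very different sizes (made-up data, with `area()` as a hypothetical helper). The area under each curve comes out close to 1 in both cases.

```r
set.seed(1)

# Two hypothetical groups of very different sizes
small_group <- rnorm(20, mean = 40, sd = 3)
large_group <- rnorm(2000, mean = 48, sd = 3)

# Kernel density estimates, the kind of curve a violin plot mirrors
d_small <- density(small_group)
d_large <- density(large_group)

# Approximate the area under a curve on its evenly spaced grid
area <- function(d) sum(d$y) * diff(d$x[1:2])

area(d_small)  # close to 1 despite only 20 observations
area(d_large)  # close to 1 with 2000 observations
```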

36 / 62

Visualizing Two Numerical Variables

37 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
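The warning means that some rows have a missing value in at least one plotted variable, so they cannot be drawn. A base R sketch of how to find such rows before plotting, using made-up measurements (hypothetical values, not the penguins data):

```r
# Hypothetical measurements with gaps, standing in for penguins
toy <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, NA, 19.3, 20.6),
  bill_length_mm = c(39.1, 39.5, NA, 36.7, NA)
)

# Rows where at least one of the two plotted variables is missing
incomplete <- !complete.cases(toy[, c("bill_depth_mm", "bill_length_mm")])
sum(incomplete)

# Keep only complete rows; a scatter plot of these would raise no warning
toy_complete <- toy[!incomplete, ]
nrow(toy_complete)
```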

38 / 62

Considering More Than Two Variables

39 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

40 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

41 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

43 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species,
           size = body_mass_g)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

44 / 62

45 / 62

Practice

Using the babies, titanic, or penguins data frame, ask a question that you are interested in answering. Visualize the data to get a visual answer to the question. What is the visual telling you? Note all of this down in your lecture notes.

46 / 62

  • Using the penguins data,
  • Map bill depth to x-axis, bill length to y-axis, species to shape and color.
  • Add a layer of points and set the size of the points to 4.
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4)
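The code above mixes two ideas: aesthetics mapped to a variable inside aes() and an aesthetic set to a constant inside the geom. A sketch of how the plot object keeps them apart, assuming ggplot2 is installed and using a made-up data frame (hypothetical values):

```r
library(ggplot2)

# Hypothetical measurements, standing in for penguins
toy <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, 14.1, 15.0, 18.5, 19.0),
  bill_length_mm = c(39.1, 39.5, 47.3, 49.0, 50.2, 46.8),
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo",
                     "Chinstrap", "Chinstrap")
)

p <- ggplot(toy,
            aes(x = bill_depth_mm,
                y = bill_length_mm,
                shape = species,
                color = species)) +
  geom_point(size = 4)

# Mapped aesthetics vary with the data and get a legend
names(p$mapping)

# Set aesthetics are constants stored on the layer and get no legend
p$layers[[1]]$aes_params
```

Note that ggplot2 stores `color` under its British spelling, `colour`.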
47 / 62

Labs

  • Using the penguins data,
  • Map bill depth to x-axis, bill length to y-axis, species to shape and color.
  • Add a layer of points and set the size of the points to 4.
  • Add labels to x-axis (Bill Depth (mm)), y-axis (Bill Length (mm)), and the title of the plot (Palmer Penguins).
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4) +
  labs(x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       title = "Palmer Penguins")
48 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point() +
  labs(x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       title = "Palmer Penguins") +
  theme_bw()

49 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point() +
  labs(x = "Bill Depth (mm)",
       y = "Bill Length (mm)",
       title = "Palmer Penguins") +
  theme_bw() +
  theme(text = element_text(size = 20))

51 / 62
?theme
52 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4) +
  facet_grid(. ~ species)

53 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point(size = 4) +
  facet_grid(species ~ .)
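The two facet_grid() calls differ only in which side of the formula holds the variable: the formula reads rows ~ columns, and a dot means no variable on that side. A sketch on a made-up data frame (hypothetical values), assuming ggplot2 is installed and using `layer_data()` to count the panels:

```r
library(ggplot2)

# Hypothetical measurements, standing in for penguins
toy <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, 14.1, 15.0, 18.5, 19.0),
  bill_length_mm = c(39.1, 39.5, 47.3, 49.0, 50.2, 46.8),
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo",
                     "Chinstrap", "Chinstrap")
)

base_plot <- ggplot(toy,
                    aes(x = bill_depth_mm,
                        y = bill_length_mm)) +
  geom_point()

# Variable on the right of ~ : one column of panels per species
side_by_side <- base_plot + facet_grid(. ~ species)

# Variable on the left of ~ : one row of panels per species
stacked <- base_plot + facet_grid(species ~ .)

# Either way there is one panel per species
length(unique(layer_data(side_by_side)$PANEL))
length(unique(layer_data(stacked)$PANEL))
```

Side-by-side panels share a y-axis, which makes bill lengths easy to compare; stacked panels share an x-axis, which makes bill depths easy to compare.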

54 / 62
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point() +
  xlim(0, 30) +
  ylim(0, 70)

55 / 62

Code Style

The tidyverse style guide has the following convention for writing ggplot2 code.

The plus sign (+) for adding layers always has a space before it and is followed by a new line.

The new line is indented by two spaces. RStudio does this automatically for you.

56 / 62

Check out the ggplot flipbook for some inspiration. Find your favorite new function/feature. Share it with your neighbor.

58 / 62

ggplot extensions

There are more extensions

59 / 62

Detour: R Markdown chunk options

{r, echo=TRUE, message=FALSE}

60 / 62

(some) Chunk Options in R Markdown

echo = FALSE hides the code
message = FALSE hides messages
warning = FALSE hides warnings
error = TRUE renders despite errors and displays the error
fig.cap = "Some figure caption" creates a figure caption
fig.alt = "Some alternate text for figure" creates alternate text for figures
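The options above are written in the header of a single chunk. As a possible alternative that the slides do not cover, the same options can be set once as defaults for every chunk in a document. The sketch below assumes the knitr package is installed and would typically sit in a setup chunk near the top of an R Markdown file.

```r
# Set document-wide defaults; individual chunks can still override them
knitr::opts_chunk$set(
  echo = FALSE,     # hide the code
  message = FALSE,  # hide messages such as the stat_bin() note
  warning = FALSE   # hide warnings such as the removed-rows note
)

# Inspect the defaults that later chunks will inherit
knitr::opts_chunk$get(c("echo", "message", "warning"))
```

Hiding messages and warnings suits a polished report, but while teaching it can be worth leaving them visible so that students see what R is telling them.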
61 / 62

Schedule for the Day

10:00 - 10:15 Introduction and Setup
10:15 - 11:15 Introduction to Toolkit and Data Basics
11:20 - 12:30 Data Visualization
1:00 - 1:45 Data Wrangling
1:45 - 2:15 Packages and External Datasets
2:15 - 2:30 Wrap Up

62 / 62