Data Visualizations
are graphical representations of data
use different colors, shapes, and the coordinate system to summarize data
tell a story
are useful for exploring data
First, I go over the different types of visualizations so that students can interpret what they see in the visuals.
The following is a useful resource for understanding histograms.
Exploring Histograms Interactively
glimpse(titanic)
## Rows: 891
## Columns: 6
## $ survived <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,…
## $ pclass   <chr> "Third", "First", "Third", "First", "Third", …
## $ sex      <fct> sex, sex, sex, sex, sex, sex, sex, sex, sex, …
## $ age      <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58,…
## $ fare     <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4…
## $ embarked <fct> Southampton, Cherbourg, Southampton, Southamp…
The data frame has been cleaned for you.
If you could speak to R in English, how would you tell R to make this plot for you?
OR
If you had the data and had to draw this bar plot by hand, what would you do?
3 Steps of Making a Basic ggplot
1. Pick data
2. Map data onto aesthetics
3. Add the geometric layer
ggplot(data = titanic, aes(x = pclass))
ggplot(data = titanic, aes(x = pclass)) +
  geom_bar()
Pick the titanic data frame.
Map pclass to the x-axis.
ggplot(data = titanic, aes(x = pclass)) +
  geom_bar()
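The three steps can be tried on a tiny data frame before moving to the real data. The sketch below assumes ggplot2 is installed; toy_titanic is a made-up stand-in with invented rows, not the workshop's titanic data frame.

```r
library(ggplot2)

# Hypothetical stand-in for titanic: invented rows, same column name.
toy_titanic <- data.frame(
  pclass = c("First", "Second", "Third", "Third", "First", "Third")
)

# Steps 1 and 2: pick the data and map pclass to the x-axis.
# Printing canvas shows axes only, because no geometric layer exists yet.
canvas <- ggplot(data = toy_titanic, aes(x = pclass))

# Step 3: add the geometric layer; geom_bar() counts the rows in each class.
bar_plot <- canvas +
  geom_bar()

# Inspect the counts that geom_bar() computed for each bar.
layer_data(bar_plot)[, c("x", "count")]
```

Printing bar_plot draws the bars; layer_data() is only used here to look at the numbers behind them.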
Pick the titanic data frame.
Map fare to the x-axis.
ggplot(data = titanic, aes(x = fare)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(data = titanic, aes(x = fare))
ggplot(data = titanic, aes(x = fare)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(data = titanic, aes(x = fare)) +
  geom_histogram(binwidth = 15)
ggplot(data = titanic, aes(x = fare)) +
  geom_histogram(binwidth = 15, color = "white")
ggplot(data = titanic, aes(x = fare)) +
  geom_histogram(binwidth = 15, fill = "darkred")
ggplot(data = titanic, aes(x = fare)) +
  geom_histogram(binwidth = 15, color = "white", fill = "darkred")
ggplot(data = titanic, aes(x = pclass, fill = survived)) +
  geom_bar()
ggplot(data = titanic, aes(x = pclass, fill = survived)) +
  geom_bar(position = "fill")
Note that the y-axis now shows proportions rather than counts, although it is still labeled "count". We will learn how to change the label later.
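As a preview, one way to relabel the y-axis is labs(), which is introduced later in these slides. The sketch below assumes ggplot2 is installed and uses toy_titanic, a made-up stand-in with invented rows.

```r
library(ggplot2)

# Hypothetical stand-in for titanic: invented rows, same column names.
toy_titanic <- data.frame(
  pclass   = c("First", "First", "Third", "Third", "Third", "Third"),
  survived = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# position = "fill" stretches every bar to the same height,
# so the y-axis runs from 0 to 1 and shows proportions.
prop_plot <- ggplot(data = toy_titanic, aes(x = pclass, fill = survived)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")  # replace the default "count" label

# Inspect the stacked bar segments that will be drawn.
bars <- layer_data(prop_plot)
range(bars$ymin, bars$ymax)
```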
ggplot(data = titanic, aes(x = pclass, fill = survived)) +
  geom_bar(position = "dodge")
Note that with position = "dodge" the y-axis shows counts again; the bars for each class sit side by side instead of being stacked.
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Tor…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38…
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17…
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 19…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 36…
## $ sex               <fct> male, female, female, NA, female, ma…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, …
Pick the penguins data frame.
Map species to the x-axis and bill_length_mm to the y-axis.
ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  geom_violin()
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
Pick the penguins data frame.
Map species to the x-axis and bill_length_mm to the y-axis.
ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  geom_boxplot()
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
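The warning appears because rows with a missing x or y value cannot be drawn. A minimal sketch, assuming ggplot2 is installed, of how one might count those rows and silence the warning; toy_penguins is a made-up stand-in with invented values.

```r
library(ggplot2)

# Hypothetical stand-in for penguins: two rows have missing bill measurements.
toy_penguins <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, NA, 19.3, NA),
  bill_length_mm = c(39.1, 39.5, NA, 36.7, NA)
)

# Count the rows that cannot be plotted because x or y is missing.
plot_columns <- toy_penguins[, c("bill_depth_mm", "bill_length_mm")]
n_missing <- sum(!complete.cases(plot_columns))
n_missing

# na.rm = TRUE drops those rows silently instead of issuing a warning.
quiet_plot <- ggplot(toy_penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(na.rm = TRUE)
```

Silencing the warning does not change the plot; it is usually worth checking how many rows are missing before doing so.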
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, color = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species, size = body_mass_g)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).
Using either the babies, titanic, or penguins data frame, ask a question that you are interested in answering. Visualize the data to get a visual answer to the question. What is the visual telling you? Note all of this down in your lecture notes.
Pick the penguins data.
Map bill depth to the x-axis, bill length to the y-axis, and species to shape and color.
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point(size = 4)
Pick the penguins data.
Map bill depth to the x-axis, bill length to the y-axis, and species to shape and color.
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point(size = 4) +
  labs(x = "Bill Depth (mm)", y = "Bill Length (mm)", title = "Palmer Penguins")
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point() +
  labs(x = "Bill Depth (mm)", y = "Bill Length (mm)", title = "Palmer Penguins") +
  theme_bw()
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point() +
  labs(x = "Bill Depth (mm)", y = "Bill Length (mm)", title = "Palmer Penguins") +
  theme_bw() +
  theme(text = element_text(size = 20))
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point(size = 4) +
  facet_grid(. ~ species)
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm, shape = species, color = species)) +
  geom_point(size = 4) +
  facet_grid(species ~ .)
ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  xlim(0, 30) +
  ylim(0, 70)
The tidyverse style guide has the following convention for writing ggplot2 code.
The plus sign for adding layers, +, always has a space before it and is followed by a new line.
The new line is indented by two spaces. RStudio does this automatically for you.
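Applied to a plot with several layers, the convention looks like the sketch below. It assumes ggplot2 is installed; toy_penguins is a made-up stand-in with invented values.

```r
library(ggplot2)

# Hypothetical stand-in for penguins with invented values.
toy_penguins <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, 19.3),
  bill_length_mm = c(39.1, 39.5, 36.7)
)

# Each + has a space before it and ends its line;
# every continuation line is indented by two spaces.
styled_plot <- ggplot(toy_penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  labs(x = "Bill Depth (mm)", y = "Bill Length (mm)") +
  theme_bw()
```

Putting each layer on its own line also makes it easy to comment out or reorder a single layer.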
Check out the ggplot flipbook for some inspiration. Find your favorite new function/feature. Share it with your neighbor.
patchwork: combining plots into a single plot
gganimate: animated graphics
ggthemes: additional set of themes
ggtext: improved text rendering support for ggplot2
There are more extensions.
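As a taste of what an extension adds, the sketch below combines two plots with patchwork. It assumes both ggplot2 and patchwork are installed; toy_penguins is a made-up stand-in with invented values.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for penguins with invented values.
toy_penguins <- data.frame(
  species        = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_depth_mm  = c(18.7, 17.4, 13.2, 16.3),
  bill_length_mm = c(39.1, 39.5, 46.1, 50.0)
)

scatter <- ggplot(toy_penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()

boxes <- ggplot(toy_penguins, aes(x = species, y = bill_length_mm)) +
  geom_boxplot()

# With patchwork loaded, + between two complete plots places them side by side.
combined <- scatter + boxes
```

Printing combined draws both panels in one figure.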
echo = FALSE | hides the code |
message = FALSE | hides messages |
warning = FALSE | hides warnings |
error = TRUE | renders despite errors and displays the error |
fig.cap = "Some figure caption" | creates a figure caption |
fig.alt = "Some alternate text for figure" | creates alternate text for figures |
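These options can be set chunk by chunk in the chunk header, or once for the whole document. A minimal sketch of the second approach, assuming the knitr package is installed; the particular defaults chosen here are only an illustration.

```r
library(knitr)

# Set defaults for every later chunk, typically inside a setup chunk
# at the top of the R Markdown document.
opts_chunk$set(
  echo    = FALSE,  # hide the code
  message = FALSE,  # hide messages
  warning = FALSE   # hide warnings
)

# Check the current default for one option.
opts_chunk$get("echo")
```

An individual chunk can still override a default by setting the option in its own header, for example echo = TRUE.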
10:00 - 10:15 Introduction and Setup
10:15 - 11:15 Introduction to Toolkit and Data Basics
11:20 - 12:30 Data Visualization
1:00 - 1:45 Data Wrangling
1:45 - 2:15 Packages and External Datasets
2:15 - 2:30 Wrap Up