Tools and recommendations for reproducible teaching

Talk at TALMO

Mine Dogucu
University of California Irvine
Mine Çetinkaya-Rundel
Duke University

2025-03-12

Teacher-Scholar’s Toolkit

Purpose    Tool
Research   R
Teaching   PowerPoint
Learning   Graphing Calculator

Teacher-Scholar’s Toolkit

Purpose    Tool
Research   R
Teaching   R
Learning   R

Why?

  • Course management
  • Role modeling
  • Sharing course materials with others

Framework for Reproducible Teaching

All teaching materials should be

  • computationally reproducible,

  • well-documented, and

  • open.

Computational Reproducibility

Literate programming

Two fuzzy round monsters dressed as wizards, working together to brew different things together from a pantry (code, text, figures, etc.) in a cauldron labeled “R Markdown”. The monster wizard at the cauldron is reading a recipe that includes steps “1. Add text. 2. Add code. 3. Knit. 4. (magic) 5. Celebrate perceived wizardry.” The R Markdown potion then travels through a tube, and is converted to markdown by a monster on a broom with a magic wand, and eventually converted to an output by pandoc. Stylized text (in a font similar to Harry Potter) reads 'R Markdown. Text. Code. Output. Get it together, people.'

Literate programming - Websites

Screenshot of course website for Stats 6 - Introduction to Data Science that shows the course logo with the URL introdata.science and a brief description of the course.

Literate programming - Slides

Screenshot of Lecture 1 slides on a deep dive into ggplot2 layers from STA 313 - Spring 2024 by Mine Çetinkaya-Rundel at Duke University.

Literate programming - Exams

Screenshot of a PDF containing midterm exam question example for Stats 67 at UCI.

Raw data - Real data


A diagram for going from raw data to cleaning script to clean data.

Keep data in the raw form you find or collect it, and record any steps to process it to prepare it for teaching.
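A minimal sketch of the raw data, cleaning script, clean data flow in base R. The folder names (data-raw, data), the file names, and the columns are hypothetical, and the raw file is created inside a temporary directory only so that the example runs on its own.

```r
# Hypothetical project layout inside a temporary directory
project_dir <- file.path(tempdir(), "week-11-simple-linreg")
dir.create(file.path(project_dir, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "data"), showWarnings = FALSE)

# Stand-in for the raw file as it was found or collected (made-up values)
raw_path <- file.path(project_dir, "data-raw", "bike-rental-raw.csv")
writeLines(
  c(
    "Date,Rental Count,Temp (C)",
    "2024-01-01,120,3.5",
    "2024-01-02,NA,4.1",
    "2024-01-03,98,-1.0"
  ),
  raw_path
)

# Cleaning script: read the raw file without modifying it
raw <- read.csv(raw_path, check.names = FALSE)

# Record every processing step in code
clean <- raw
names(clean) <- c("date", "rental_count", "temp_c")  # machine-friendly names
clean$date <- as.Date(clean$date)                    # parse dates
clean <- clean[!is.na(clean$rental_count), ]         # drop rows with missing counts

# Write the teaching-ready version to a separate file
clean_path <- file.path(project_dir, "data", "bike-rental.csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Because the raw file is never overwritten, rerunning the script regenerates the clean file from scratch.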

Raw data - Simulated data

set.seed(123)
sample(1:100, size = 3, replace = FALSE)
[1] 31 79 51
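The same idea extends to a whole simulated dataset. The sketch below wraps the seed and the simulation in one function; the function name, the columns, and the distribution parameters are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: simulate exam scores for n students from a fixed seed
simulate_scores <- function(seed, n = 5) {
  set.seed(seed)
  data.frame(
    student = seq_len(n),
    score = round(rnorm(n, mean = 75, sd = 10))
  )
}

first_run <- simulate_scores(seed = 123)
second_run <- simulate_scores(seed = 123)

# Same seed, same data: the two runs match exactly
identical(first_run, second_run)
#> [1] TRUE
```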

File organization


week-11-simple-linreg
|-- week-11-simple-linreg.Rproj
|-- README.md
|-- data
    |-- README.md
    |-- bike-rental.csv
    |-- birth-weight.csv
|-- lectures
    |-- lec-11a-simple-linreg.qmd
    |-- lec-11b-indicator-var.qmd
    |-- lec-11c-assumptions.qmd
|-- quizzes
    |-- quiz-11a-simple-linreg.qmd
    |-- quiz-11b-indicator-var.qmd
    |-- quiz-11c-assumptions.qmd


Adopt a project-based workflow to avoid changing file paths.

The file names should

  • be machine-readable

  • be human-readable

  • play well with default-ordering
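A rough illustration of what machine-readable names buy you, using file names from the tree above. The type-id-topic pattern is an assumption read off those names, and the zero-padding comparison at the end uses made-up file names.

```r
file_names <- c(
  "lec-11a-simple-linreg.qmd",
  "lec-11b-indicator-var.qmd",
  "lec-11c-assumptions.qmd",
  "quiz-11a-simple-linreg.qmd"
)

# Drop the extension, then split each name into type, id, and topic
stems <- tools::file_path_sans_ext(file_names)
pattern <- "^([a-z]+)-([0-9]+[a-z])-(.+)$"
parts <- data.frame(
  type  = sub(pattern, "\\1", stems),
  id    = sub(pattern, "\\2", stems),
  topic = sub(pattern, "\\3", stems)
)

# Select all lecture files by type rather than by hand
lecture_files <- file_names[parts$type == "lec"]

# Default ordering: zero-padded numbers sort as intended, unpadded ones do not
unpadded <- sort(c("lec-2.qmd", "lec-10.qmd"), method = "radix")
padded   <- sort(c("lec-02.qmd", "lec-10.qmd"), method = "radix")
```

Here unpadded places lec-10.qmd before lec-2.qmd, while padded keeps the intended order.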

File organization


read.csv(here::here("data","bike-rental.csv"))


Use here::here() to build file paths relative to the root of the project.

Version control

A quote by Jenny Bryan that reads 'Collaboration is the most compelling reason to manage a project with Git and GitHub. My definition of collaboration includes hands-on participation by multiple people, including your past and future self, as well as an asymmetric model, in which some people are active makers and others only read or review.' and illustrations of six colorful monsters by Allison Horst.

Version control

Diff due to upgrading R from 3.5.3 to 4.1.0

Version control

A blackboard with the GitHub octocat logo and GitHub Classroom logo with the text 'Your course assignments on GitHub'.

Documentation

Data documentation

Document contents and provenance of data files:

Option 1:

A single README.md file with variable descriptions and provenance.

Option 2:

  • A CSV file with variable descriptions with columns name and description.
  • A README.md file with data provenance.

Use plain-text formats, not spreadsheets like Excel or Google Sheets, for version control compatibility.
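Option 2 could be set up along these lines. The variable names, descriptions, and file names are hypothetical, and the provenance fields are left as placeholders to fill in for a real dataset.

```r
# Hypothetical data folder inside a temporary directory
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# Data dictionary as a plain-text CSV with columns name and description
dictionary <- data.frame(
  name = c("date", "rental_count", "temp_c"),
  description = c(
    "Calendar date of the observation (YYYY-MM-DD)",
    "Number of bikes rented on that date",
    "Mean daily temperature in degrees Celsius"
  )
)
dictionary_path <- file.path(data_dir, "bike-rental-dictionary.csv")
write.csv(dictionary, dictionary_path, row.names = FALSE)

# README.md with data provenance (placeholders, not real provenance)
provenance_path <- file.path(data_dir, "README.md")
writeLines(
  c(
    "# bike-rental.csv",
    "",
    "Source: <where the data came from>",
    "Retrieved: <date retrieved>",
    "Processing: <script that produced this file>"
  ),
  provenance_path
)
```

Both files are plain text, so changes to a description or to the provenance notes show up as readable diffs under version control.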

Folder documentation

Include a plain-text README file in the course folder and each top-level subfolder explaining the folder’s contents and outlining steps to reproduce materials, e.g., quarto render.

README.md
|-- slides
    |-- README.md
    |-- deck-1.qmd
    |-- deck-2.qmd
    |-- ...
|-- labs
    |-- README.md
    |-- lab-1.qmd
    |-- lab-2.qmd
    |-- ...
|-- data
    |-- bike-rentals
        |-- README.md
        |-- bike-rentals.csv
    |-- sales-taxes
        |-- README.md
        |-- sales-taxes.csv
|-- ...
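One way to check the README recommendation is a short script that lists top-level subfolders without a README.md. This is only a sketch: the course folder is a hypothetical stand-in built in a temporary directory, with the data folder deliberately left undocumented.

```r
# Hypothetical course folder with three top-level subfolders
course_dir <- file.path(tempdir(), "course")
for (folder in c("slides", "labs", "data")) {
  dir.create(file.path(course_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# READMEs for the course folder, slides, and labs; data is left without one
file.create(file.path(course_dir, "README.md"))
file.create(file.path(course_dir, "slides", "README.md"))
file.create(file.path(course_dir, "labs", "README.md"))

# Report top-level subfolders that lack a README.md
top_level <- list.dirs(course_dir, recursive = FALSE)
has_readme <- file.exists(file.path(top_level, "README.md"))
missing_readme <- basename(top_level[!has_readme])
missing_readme
#> [1] "data"
```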

Software documentation

Minimum:

Document versions of R, Quarto, packages, etc. in README files, either updated manually or generated programmatically with session_info().

Medium:

Use renv (R) or venv (Python) to preserve the computational environment.

Advanced:

Use Docker containers for full environment snapshots.
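For the minimum option, the version notes can be generated rather than typed. The sketch below uses only base R (R.version.string and utils::packageVersion()) in place of session_info(); the package list and the README location are hypothetical.

```r
# Hypothetical README location inside a temporary directory
readme_path <- file.path(tempdir(), "README.md")

# Packages the course materials depend on (hypothetical list; base packages
# are used here so that the sketch runs on any R installation)
course_packages <- c("stats", "utils")
package_versions <- vapply(
  course_packages,
  function(pkg) as.character(utils::packageVersion(pkg)),
  character(1)
)

# Assemble a "Software versions" section and write it to the README
software_section <- c(
  "## Software versions",
  "",
  paste0("- ", R.version.string),
  paste0("- ", course_packages, " ", package_versions),
  "",
  paste0("Last updated: ", format(Sys.Date()))
)
writeLines(software_section, readme_path)
```

Rerunning this at the start of each term keeps the recorded versions in step with what was actually used.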

Reproducibility in communication

Use the reprex package to generate reproducible code snippets and share session details when answering student questions.

  1. Copy code:
x <- 1:2
y <- 1:4
x + y
  2. Run reprex::reprex().

  3. Paste from clipboard:

x <- 1:2
y <- 1:4
x + y
#> [1] 2 4 4 6

What’s your style?

Option 1:

modelweight<-lm(mpg~wt,data=subset(mtcars,am==1),na.action=na.exclude)


or Option 2:

model_weight <- lm(
  mpg ~ wt,
  data = subset(mtcars, am == 1),
  na.action = na.exclude
)


or Option 3: Who cares? It doesn’t matter!

Style guide

Openness

Gold standard

Openly share materials on publicly hosted websites to increase accessibility and reusability.

Challenge: University policies may require hosting course materials in a password-protected LMS, which gives students consistency across courses but limits broader access.

Consider posting materials both on university platforms (for students) and publicly (for wider access). Open access benefits learners worldwide, while open-source sharing allows other educators to adapt and reuse materials.

Licensing

Always release materials with a license, whether open-access or open-source.

Non-software:

Consider whether to allow derivative works and re-sharing (e.g., share-alike options):

  • Creative Commons (CC) licenses are recommended.
  • CC BY-NC 4.0: Allows reuse with attribution but restricts commercial use (useful for future textbooks).
  • CC BY 4.0: Allows both commercial and non-commercial use (e.g., corporate training, MOOCs).

Software:

  • MIT License
  • GNU General Public License (GPL), etc.

Educators should familiarize themselves with licensing options before making a choice. Recommended reading: The four R’s of openness and ALMS analysis: frameworks for open educational resources by Hilton III et al. (2010) on licensing and open educational resources.

Coda

Example - Frontend

Course website for STA 199 - Fall 2024 Introduction to Data Science and Statistical Thinking that shows a navigation bar on the left with the course hex logo and the schedule of the course on the right.

Example - Backend

GitHub repo for the course website for STA 199 - Fall 2024. README in the GitHub repo for the course website for STA 199 - Fall 2024 that shows the color choices, instructions for rendering and publishing.

Challenges and benefits

  • An investment in reproducibility rarely, if ever, fails to pay off!

  • Adopting the full framework likely requires learning new tools and keeping your (and your co-instructors’ + TAs’) knowledge of tools and ecosystems up to date, which is no small feat!

  • Openness

    • brings free learning materials to your discipline and community,
    • invites kudos from colleagues using your materials,
    • generates opportunities for collaboration and recognition, and
    • lets you serve as a role model for your students.

See also

QUESTIONS?


Slides at mdogucu.github.io/talmo-25.

Source code for slides at github.com/mdogucu/talmo-25.