Introduction to Git and GitHub for RStudio Users

Workshop at Cypress College

Mine Dogucu
University of California Irvine

2024-04-29

Preparation

GitHub usernames

Check to see if you already have git
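
One way to check, sketched for a Mac or Linux terminal (the Terminal tab in RStudio works too), is to ask the shell where git lives and which version it is. The `git_status` variable is only an illustrative name.

```shell
# Look for git on the PATH; print its location and version if it is there.
if command -v git >/dev/null 2>&1; then
  git_status="found"
  command -v git      # location of the git program
  git --version       # installed version
else
  git_status="missing"
  echo "git is not installed (or not on the PATH)"
fi
```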

On Windows, we will check in person

Make sure RStudio sees git

Download and install git on Windows

Git for Windows

Download and install git on Mac

Introduce yourself to Git
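
A minimal sketch of the introduction, run once per computer from a terminal. The name and email below are placeholders (assumptions for illustration); use your own name and the email tied to your GitHub account.

```shell
# Placeholder identity: replace both values with your own before running.
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.com"

# List the global settings to confirm they were saved.
git config --global --list
```

Git attaches this name and email to every commit you make.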

Set up SSH key

The fun part

Does this look familiar?

  • hw1

  • hw1_final

  • hw1_final2

  • hw1_final3

  • hw1_finalwithfinalimages

  • hw1_finalestfinal

What if we tracked our file with a better name for each version and had only one file, hw1?

  • hw1 added questions 1 through 5

  • hw1 changed question 1 image

  • hw1 fixed typos

We will call the descriptions that follow the file name (e.g. added questions 1 through 5) commit messages.
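
The same idea can be sketched on the command line. The scratch folder, the file name hw1.qmd, and the identity are assumptions made so the example runs on its own; in the workshop the same steps happen through RStudio's Git pane.

```shell
# Throwaway identity so the sketch runs even if Git is not configured yet.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Work in a scratch folder (hypothetical) that is tracked by Git.
hw_dir="$(mktemp -d)"
cd "$hw_dir"
git init -q

# One file, three versions, each saved with a descriptive commit message.
echo "Questions 1 through 5" > hw1.qmd
git add hw1.qmd
git commit -q -m "added questions 1 through 5"

echo "Question 1 now uses a new image" >> hw1.qmd
git commit -q -a -m "changed question 1 image"

echo "Typos fixed" >> hw1.qmd
git commit -q -a -m "fixed typos"

# Still a single file, with its full history one command away.
ls
git log --oneline
```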

git vs. GitHub

  • git allows us to keep track of different versions of one or more files.

  • GitHub is a website where we can store (and share) different versions of the files.

Demo

Tip

Always use the .Rproj file to open a project. Then open the appropriate .qmd / .R file from the Files pane. If you don’t open the .Rproj file, you will not be able to see the Git pane.

Cloning a repo

repo is a short form of repository. Repositories contain all of your project’s files as well as each file’s revision history.

To clone a GitHub repo to our computer, we first copy the cloning link as shown in the screencast, then start a new RStudio project using that link.

Cloning a repo pulls (downloads) all the elements of a repo available at that specific time.
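
For those curious about what happens behind the RStudio dialog, here is a rough command-line sketch. To keep it runnable offline, a small local repo called course-notes stands in for the GitHub repo; that repo, the folder names, and the identity are all assumptions.

```shell
# Throwaway identity so the sketch runs even if Git is not configured yet.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work="$(mktemp -d)"
cd "$work"

# Stand-in for a repo on GitHub: a small local repo with one commit.
git init -q course-notes
echo "# Course notes" > course-notes/README.md
git -C course-notes add README.md
git -C course-notes commit -q -m "add README"

# Clone it. With GitHub you would paste the copied link instead,
# e.g. git clone git@github.com:USER/REPO.git (placeholder link).
git clone -q course-notes my-copy

# The clone holds the files and their revision history.
ls my-copy
git -C my-copy log --oneline
```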

Commits

Once you make changes to your repo (e.g. take notes during lecture, answer an activity question) you can take a snapshot of your changes with a commit.

This way, if you ever need to go back in the version history, your older commits are there to return to.

This is especially useful, for instance, if you want to go back to an earlier solution you have committed.
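
As a sketch of what going back can look like on the command line: the file activity.R, its contents, and the identity are hypothetical, and this is only one of several ways to retrieve an older version.

```shell
# Throwaway identity so the sketch runs even if Git is not configured yet.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q

# Commit a first solution and remember which commit it was.
echo "mean(x)" > activity.R
git add activity.R
git commit -q -m "first solution to activity 1"
first_commit="$(git rev-parse HEAD)"

# Commit a second attempt on top of it.
echo "median(x)" > activity.R
git commit -q -a -m "try a different solution"

# Peek at the file as it was in the earlier commit without changing anything.
git show "$first_commit:activity.R"

# Bring the earlier version back into the working folder and commit it.
git checkout -q "$first_commit" -- activity.R
git commit -q -m "go back to first solution"
cat activity.R
```

Nothing is lost along the way: the second attempt is still in the history, and returning to the first solution is recorded as a commit of its own.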

Push

All the commits you make will initially be local (i.e. on your own computer).

In order for us to see your commits and your final submission of any file, you have to push your commits. In other words, you upload your files as they stand in your commits at that specific time.
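
The difference between a local commit and a pushed commit can be sketched as follows. A local bare repo named github.git stands in for GitHub so the example runs offline; that stand-in, the file names, and the identity are assumptions.

```shell
# Throwaway identity so the sketch runs even if Git is not configured yet.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work="$(mktemp -d)"
cd "$work"

# Stand-in for GitHub: a local bare repo created from a one-commit seed.
git init -q seed
echo "# hw1" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "add README"
git clone -q --bare seed github.git

# Your copy of the repo.
git clone -q github.git my-copy
cd my-copy

# A new commit exists only on your computer at first.
echo "answer to question 1" > hw1.qmd
git add hw1.qmd
git commit -q -m "answer question 1"
git status -sb       # the branch is reported as ahead of the remote by 1 commit

# Pushing uploads the committed state to the remote.
git push -q
git status -sb       # no longer ahead
```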

(An incomplete) Git/GitHub glossary

Git: Software for tracking changes in any set of files.

GitHub: An internet host for Git projects.

repo: Short for repository. Repositories contain all of your project’s files as well as each file’s revision history.

clone: Cloning a repo pulls (downloads) all the elements of a repo available at that specific time.

commit: A snapshot of your repo at a specific point in time. We distinguish each commit with a commit message.

push: Uploads the latest “committed” state of your repo to GitHub.

Do you git it?

README.md

  • A README file is the first file users read. In our case a user might be our future self, a teammate, or (if open source) anyone.

  • There can be multiple README files within a single project: e.g. one for the general project folder and another for a data subfolder. A data folder README can contain a codebook (data dictionary).

  • It should be brief but detailed enough to help the user navigate the project.

  • A README should be up to date (it can be updated throughout a project’s lifecycle as needed).

  • On GitHub we use markdown for the README file (README.md). Good news: emojis are supported.

README examples

Collaboration on GitHub

If collaborators had to take turns, with only one person making a change at a time, the workflow would not be efficient.

1 - commit

2 - pull (very important)

3 - push
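
The three steps can be sketched with two collaborators working at the same time. Everything here is hypothetical: a local bare repo (github.git) stands in for the shared GitHub repo, and alice and bob are two clones on the same machine. The sketch assumes they edit different files, so the pull merges cleanly.

```shell
# Throwaway identity so the sketch runs even if Git is not configured yet.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work="$(mktemp -d)"
cd "$work"

# Stand-in for the shared GitHub repo, plus one clone per collaborator.
git init -q seed
echo "# Team project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "add README"
git clone -q --bare seed github.git
git clone -q github.git alice
git clone -q github.git bob

# Alice finishes first: commit, pull (nothing new yet), push.
echo "intro" > alice/intro.qmd
git -C alice add intro.qmd
git -C alice commit -q -m "add introduction"
git -C alice pull -q --no-rebase --no-edit
git -C alice push -q

# Bob worked at the same time on a different file.
# 1 - commit
echo "methods" > bob/methods.qmd
git -C bob add methods.qmd
git -C bob commit -q -m "add methods"
# 2 - pull: brings in Alice's commit and merges it with Bob's
git -C bob pull -q --no-rebase --no-edit
# 3 - push: accepted because Bob's copy now includes everything on the remote
git -C bob push -q

# Bob's folder now has both contributions.
ls bob
```

If Bob skipped the pull, Git would reject his push because the remote contains a commit he does not have yet, which is why step 2 matters.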

Opening an issue

We can create issues to keep track of mistakes to be fixed, ideas to check with teammates, or to-do tasks. You can assign issues to yourself or teammates.

Closing an issue

If you are working on an issue, it makes sense to refer to the issue number in your commit message (e.g. “add first draft of alternate texts for #4”). If your commit resolves the issue, you can use keywords such as “fixes #4” or “closes #4” to close the issue. Issues can also be closed manually.
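
A small sketch of both kinds of commit message. The file alt-text.md, the second message, and the identity are assumptions, and issue #4 only exists in the example above. Nothing happens to the issue locally; GitHub acts on the keyword once the commit reaches the repo's default branch there.

```shell
# Throwaway identity so the sketch runs even if Git is not configured yet.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane.doe@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

cd "$(mktemp -d)"
git init -q

# Work in progress: mention the issue number so the commit is linked to the issue.
echo "alt text draft" > alt-text.md
git add alt-text.md
git commit -q -m "add first draft of alternate texts for #4"

# Final piece of work: a closing keyword plus the issue number.
echo "alt text final" > alt-text.md
git commit -q -a -m "finish alternate texts, closes #4"

# Show the commit messages, newest first.
git log --format=%s
```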

.gitignore

A .gitignore file contains the list of files which Git has been explicitly told to ignore.

For instance README.html can be git ignored.

You may consider git ignoring confidential files (e.g. some datasets) so that they would not be pushed by mistake to GitHub.

A file can be git ignored either by point-and-click using RStudio’s Git pane or by adding the file path to the .gitignore file. For instance, a weather.csv data file in a data folder needs to be added as data/weather.csv.

Files with certain extensions (e.g. all .log files) can also be ignored. See git ignore patterns.
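
Putting the three cases together, a .gitignore could look like the sketch below. The scratch repo and the file render.log are assumptions for illustration; README.html and data/weather.csv come from the examples above.

```shell
cd "$(mktemp -d)"
git init -q
mkdir data
touch README.md README.html data/weather.csv render.log

# One entry per line: a file, a file inside a folder, and a pattern.
cat > .gitignore <<'EOF'
README.html
data/weather.csv
*.log
EOF

# Ignored files no longer show up as untracked.
git status --short

# Ask Git which rule ignores a given file.
git check-ignore -v data/weather.csv render.log
```

Note that ignoring a file only keeps Git from starting to track it; a file that was already committed stays in the history.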

Acknowledgments

Supported by NSF HDR DSC award #2123366

Blog Posts for Students

https://www.datapedagogy.com/#category=for%20students

THANK YOU

minedogucu.com
mdogucu
MineDogucu
mastodon.social/@MineDogucu
minedogucu