Class Meeting 16 (2) Git workflows

16.1 Today’s Agenda

  • Announcements:
    • Reminder about Assignment 1 and milestone 1 due on Saturday at 6 PM
    • By now you should have already gotten in touch with your partner. If you are having trouble, please send me a private message on Canvas
    • If you would like someone from the teaching team to approve your dataset, please create an issue here
  • Part 1: Git and GitHub for collaborative work (20 mins)
    • Sensible workflow for local work
    • Setting up your project repositories
    • Dealing with merge conflicts
  • Part 2: More about fork and clone workflow

16.2 Part 1: Git and GitHub for collaborative work

  • Git class demo with Firas & Yulia
  • Class activity with your partner
    • Try to sit with your project partner
    • If you are auditing the course, try to find another auditor so you can do this part with someone else
  • Tips and tricks on how to handle merge conflicts.

16.2.1 Create your project repo and add your partner as admin

  1. Go to the STAT547 - Spring 2020 organization
  2. On the right of the page, click the green “New” button
  3. Set the repository name to “group_xx_yy”, where xx is your group number (from Canvas) and yy is an optional name for your repo
  4. Keep this repo private (for now)
  5. Initialize the repo with a README, add a .gitignore file (for R) and a license (MIT is fine).
  6. Once at the home page of your repo, click “Settings”, then “Manage Access”
  7. Add your team member, and give them Admin access.
  8. THERE IS NO NEED TO add any of the teaching team to your repo; we will have access as long as it’s created inside the STAT547 organization.

16.2.2 Set up your local directories and RStudio to work nicely with GitHub

  1. Locally on your machine, create a new folder called stat547 in a sensible place
  2. Create folders for each of your main areas: Discussion, assignments, project, participation
  3. On GitHub.com visit each of these repositories and copy the git repo URL into these spots
  4. Create a new RStudio project for each of the repos above
    • Open RStudio then click File >> New Project >> Version Control >> Git
    • In the “Repository URL” field, paste the URL for one of the repos above
    • In the “Project directory name” field, you should leave this as the name of the repo to avoid confusion
    • In the “Create project as subdirectory of:” field, use the stat547 folder from Step 1.
    • Click “Create Project”
    • You should repeat this step (Step 4) for every assignment that you accept in STAT547.
  5. Now you have correctly set up RStudio with Git so you can commit, pull, push, etc… right from RStudio!
  6. At the beginning of every lecture, open the “Discussion” RStudio project. Then click on the “Git” tab, and then pull the latest cm1xx_participation.Rmd file directly to your computer. Copy it to your participation repo, commit it, and then push.
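The folder layout from Steps 1 and 2 can also be created from a terminal. This is only a sketch: BASE_DIR is a hypothetical variable standing for your “sensible place”, and it falls back to a throwaway temporary directory so the commands are safe to try.

```shell
# BASE_DIR is a hypothetical stand-in for your "sensible place";
# if it is unset, fall back to a throwaway temporary directory.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

# Step 1: the course folder.
mkdir -p "$BASE_DIR/stat547"
cd "$BASE_DIR/stat547"

# Step 2: one folder per main area.
mkdir -p Discussion assignments project participation

ls   # lists the four folders
```

To use a real location, set BASE_DIR first, for example to a documents folder of your choosing.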

16.2.3 Activity 1: Fork and clone workflow

For your team project, you should be working in a “fork and clone” workflow (unless you consider yourself a git master). More details about this workflow are described here in happygitwithr.com

Here is what you need to do:

  1. Go to your project repo, called group_xx_yy
  2. Click the “fork” button on the repo home page
  3. Make sure you fork the repo within the STAT547 organization
  4. Copy the URL of YOUR fork of the main project repo
    • Git URL of YOUR fork: ## LINK TO THIS REPO
  5. Also grab the URL of the MAIN PROJECT REPO
    • Git URL of MAIN PROJECT REPO: ## LINK TO THIS REPO
  6. Create a new RStudio project and repeat Step 4 from the previous section for YOUR FORK of the main project repo
  7. Now, we need to “link” the main project repo to YOUR FORK so you can send your changes back to the main project repo, and also receive any changes your partner made. To do this:
    • In RStudio, find the “Terminal” button and click on it
    • Now you’re in a Terminal/Console (yes, within RStudio, weird - I know)
    • Make sure you’re in the correct directory (YOUR FORK cloned locally) by typing in pwd
    • Type this command to link the MAIN PROJECT repo to your fork:
      • git remote add upstream <MAIN_PROJECT_REPO_URL>
      • Done!
  8. To check if it worked, type this in the RStudio Terminal:
    • git fetch upstream
    • If there are changes it will let you know what they are
    • Once you’re ready to pull the changes into your local clone of YOUR FORK, use this command:
    • git merge upstream/master
    • This will merge all the changes from the upstream remote (the MAIN PROJECT repo) into the branch you currently have checked out
    • Do this often!
  9. It is best practice to create a new branch in a forked repo so your work isn’t committed to master:
    • git checkout -b name_of_your_new_branch
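Steps 7 to 9 can be rehearsed end to end without touching GitHub. The sketch below is an assumption-laden stand-in, not the class demo: local directories play the role of the GitHub repos, and the names main_project, fork.git and my_fork are made up for illustration. With real repos you would pass the GitHub URL of the MAIN PROJECT repo to git remote add instead of a local path.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-in for the MAIN PROJECT repo on GitHub.
git init -q main_project
git -C main_project symbolic-ref HEAD refs/heads/master
echo "# group_xx_yy" > main_project/README.md
git -C main_project add README.md
git -C main_project commit -q -m "Initial commit"

# Stand-in for YOUR FORK on GitHub, plus your local clone of it.
git clone -q --bare main_project fork.git
git clone -q fork.git my_fork
cd my_fork

# Step 7: link the MAIN PROJECT repo to your clone as "upstream".
git remote add upstream "$sandbox/main_project"

# Meanwhile, your partner's change lands in the MAIN PROJECT repo.
echo "Partner's change" >> "$sandbox/main_project/README.md"
git -C "$sandbox/main_project" commit -q -am "Partner's change"

# Step 8: fetch the changes, then merge them into your local master.
git fetch -q upstream
git merge -q upstream/master

# Step 9: do your own work on a new branch, not on master.
git checkout -q -b name_of_your_new_branch
git log --oneline -n 2
```

Because local master had no commits of its own, the merge here is a simple fast-forward.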

16.2.4 Activity 2: Send a PR from YOUR FORK back to the MAIN PROJECT

  1. Will be demo’d

16.2.5 Activity 3: “Catching up”

  1. Accept a PR from your partner in the MAIN PROJECT repo
  2. Merge those changes into YOUR FORK using git merge upstream/master
    • Deal with any merge conflicts that arise
  3. Send another PR with even more changes. If everything was done correctly, the new changes should merge easily into the main project repo.
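Dealing with a merge conflict is easier to follow once you have seen one on purpose. The sketch below manufactures a conflict in a throwaway repo and then resolves it. It rests on assumptions: a local branch named partner stands in for upstream/master, and the file report.md and its contents are invented for the demo.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# A shared starting point.
echo "Title: draft" > report.md
git add report.md
git commit -q -m "Initial commit"

# Your partner changes the title (the "partner" branch stands in for upstream/master).
git checkout -q -b partner
echo "Title: partner's title" > report.md
git commit -q -am "Partner edits the title"

# You change the same line on master.
git checkout -q master
echo "Title: my title" > report.md
git commit -q -am "I edit the title"

# The merge stops because both sides edited the same line.
if ! git merge partner; then
  git status --short   # the conflicted file is flagged as unmerged
  cat report.md        # both versions appear between <<<<<<<, ======= and >>>>>>> markers

  # Resolve: edit the file to the version you both want, then stage and commit.
  echo "Title: our agreed title" > report.md
  git add report.md
  git commit -q -m "Merge partner's changes, resolving the title conflict"
fi

git log --oneline --graph
```

In a real project you would open the conflicted file in RStudio and edit it by hand rather than overwrite it with echo.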

16.2.6 (Optional) Activity 4: Class wide

You can do this all on GitHub.com - no need to clone locally.

  1. Fork this git demos repo
  2. Create a branch called ‘my_change’
  3. Make a change in the introductions.md file
  4. Create a PR and send it to the main git_demos repo
  5. Let’s see if we can get all 30 PRs merged in!

16.3 Part 2: Fork and clone workflow

This section of the lecture notes has been adapted from Jenny Bryan’s HappyGitWithR

For the team projects in STAT547, we will be using the “Fork and clone” method to work collaboratively and send changes back to the main repository (that each partner forked) via Pull Requests. In general, use “fork and clone” to get a copy of someone else’s repo if there’s any chance you will want to propose a change to the owner, i.e. send a “pull request”. There are several other workflows for working collaboratively; another popular one is called “branch and PR”, but that is not what we’ll be doing in this course.

16.3.1 Initial workflow

Sign in to GitHub and navigate to the repo of interest. Think of this as OWNER/PROJECT, where OWNER is the user or organization who owns the repository named PROJECT.

In the upper right hand corner, click Fork. For each GitHub account, you can only fork the repo once. Subsequent clicking on the FORK button will tell you that you’ve already forked this repo.

This creates a copy of PROJECT in your GitHub account and takes you there in the browser. Now we are looking at YOU/PROJECT. One way to tell whether you are in your fork (YOU/PROJECT), or the original OWNER/PROJECT is to check the name of the repo on the repo home page. See below for markers:

Here are the next steps:

  1. Copy the “CLONE URL” of YOU/PROJECT (it should look something like https://github.com/YOU/PROJECT.git)
  2. Open RStudio on your computer
  3. Create a new “Version Control >> Git” project in the appropriate location
  4. For the “repository URL” specify the CLONE URL from step 1
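Under the hood, the RStudio steps above amount to a git clone. The sketch below shows the command-line equivalent under one loud assumption: a local bare repo (PROJECT.git, built from a made-up seed repo) stands in for your fork, so nothing here contacts GitHub. With a real fork, CLONE_URL would be the URL from step 1.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox="$(mktemp -d)"
cd "$sandbox"

# Build a stand-in for YOU/PROJECT on GitHub.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "# PROJECT" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed PROJECT.git

# With a real fork this would look like https://github.com/YOU/PROJECT.git
CLONE_URL="$sandbox/PROJECT.git"

# The command-line equivalent of "New Project >> Version Control >> Git".
git clone -q "$CLONE_URL" PROJECT
cd PROJECT

git remote -v   # the clone source is registered as the remote named origin
git branch      # you start out on master
```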

We’re doing this:

Don’t mess with master

If you make any commits in your local repository, you should work in a new branch, not master.

You shouldn’t make (code) commits to master of a repo you have forked. This will make your life much easier if you want to pull upstream work into your copy. The OWNER of PROJECT will also be happier to receive your pull request from a non-master branch.

16.3.2 The original repo as a remote

Remember we are here:

Here is the current situation in words:

  • You have a fork YOU/PROJECT, which is a repo on GitHub.
  • You have a local clone of your fork.
  • Your fork YOU/PROJECT is the remote known as origin for your local repo.
  • You are well positioned to make a pull request to OWNER/PROJECT.

But notice the lack of a direct connection between your local copy of this repo and the original OWNER/PROJECT. This is a problem - how will you ever get changes made to the original project?

As time goes on, the original repository OWNER/PROJECT will continue to evolve. You probably want the ability to keep your copy up-to-date. In Git lingo, you will need to get the “upstream changes”. The next section deals with that.

16.4 Part 3: Getting changes from “upstream”

This workflow is relevant if you have done fork and clone and now you need to pull subsequent changes from the original repo into your copy.

You should set this up right away after you fork and clone, even though you don’t need it yet.

Vocabulary: OWNER/PROJECT refers to the original GitHub repo, owned by OWNER, who is not you. YOU/PROJECT refers to your copy on GitHub, i.e. your fork.

16.4.1 No, you can’t do this via GitHub

You might hope that GitHub could automatically keep your fork YOU/PROJECT synced up with the original OWNER/PROJECT. Or that you could do this in the browser interface. Then you could pull those upstream changes into your local repo.

But you can’t.

There are some tantalizing, janky ways to sort of do parts of this. But they have fatal flaws that make them unsustainable. I believe you really do need to add OWNER/PROJECT as a second remote on your repo and pull from there.

16.4.2 Initial conditions

Get into the repo of interest, i.e. your local copy. For many of you, this means launching it as an RStudio Project. You’ll probably also want to open a terminal within RStudio for some Git work via Tools > Terminal > New Terminal.

Make sure you are on the master branch and your “working tree is clean”. git status should show something like:
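Here is a minimal sketch of that check, run in a throwaway repo so it is safe to try. The repo contents are made up, and the exact wording of the git status message can vary with your Git version and language settings.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo "# PROJECT" > README.md
git add README.md
git commit -q -m "Initial commit"

# Which branch am I on?
git rev-parse --abbrev-ref HEAD

# A clean working tree typically reports "nothing to commit, working tree clean".
git status

# Script-friendly variant: prints nothing at all when the tree is clean.
git status --porcelain
```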

BTW I recommend that you never make your own commits to the master branch of a fork. However, if you have already done so, your situation is addressed in section 16.4.8 (Um, what if I did touch master?).

16.4.3 List your remotes

Let’s inspect the current remotes for your local repo. In the shell:

Most of you will see output along these lines (let’s call this BEFORE):
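The command for listing remotes is git remote -v. As a sketch of the BEFORE state, the block below recreates it in an empty throwaway repo; the URL is a placeholder for your fork, and adding a remote does not contact GitHub.

```shell
cd "$(mktemp -d)"
git init -q .

# Placeholder URL standing in for your fork.
git remote add origin https://github.com/YOU/PROJECT.git

git remote -v
# Output along these lines:
# origin  https://github.com/YOU/PROJECT.git (fetch)
# origin  https://github.com/YOU/PROJECT.git (push)
```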

There is only one remote, named origin, corresponding to your fork on GitHub. This figure depicts a BEFORE scenario:

This is sad, because there is no direct connection between OWNER/PROJECT and your local copy of the repo.

The state we want to see is like this (let’s call this AFTER):

Notice the second remote, named upstream, corresponding to the original repo on GitHub. This figure depicts AFTER, the scenario we want to achieve:

16.4.4 Add the upstream remote

Let us add OWNER/PROJECT as the upstream remote.

On GitHub, make sure you are signed in and navigate to the original repo, OWNER/PROJECT. It is easy to get to from your fork, YOU/PROJECT, via “forked from” links near the top.

Use the big green “Clone or download” button to get the URL for OWNER/PROJECT on your clipboard. Be intentional about whether you copy the HTTPS or SSH URL.

16.4.4.1 Command line Git

Use a command like this, but make an intentional choice about using an HTTPS vs SSH URL.

The nickname upstream can technically be whatever you want, but there is a strong convention/tradition of using upstream.
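A sketch of the command, tried in an empty throwaway repo. Both URLs are placeholders; substitute the one you copied from GitHub. The SSH form is shown as a comment so you can see what the alternative looks like.

```shell
cd "$(mktemp -d)"
git init -q .
git remote add origin https://github.com/YOU/PROJECT.git

# HTTPS form (placeholder URL for the original repo):
git remote add upstream https://github.com/OWNER/PROJECT.git

# SSH form, if that is what you use for GitHub:
# git remote add upstream git@github.com:OWNER/PROJECT.git

git remote get-url upstream   # prints the URL you just registered
```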

16.4.4.2 RStudio

This feels a bit odd, but humor me. Click on “New Branch” in the Git pane.

This will reveal a button to “Add Remote”. Click it. Enter upstream as the remote name and paste the URL for OWNER/PROJECT that you got from GitHub. Click “Add”. Decline the opportunity to add a new branch by clicking “Cancel”.

16.4.5 Verify your upstream remote

Let’s inspect the current remotes for your local repo AGAIN. In the shell:

Now you should see something like
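As a sketch of the AFTER state, the block below recreates it in an empty throwaway repo with placeholder URLs and lists the remotes again.

```shell
cd "$(mktemp -d)"
git init -q .
git remote add origin https://github.com/YOU/PROJECT.git
git remote add upstream https://github.com/OWNER/PROJECT.git

git remote -v
# Output along these lines:
# origin    https://github.com/YOU/PROJECT.git (fetch)
# origin    https://github.com/YOU/PROJECT.git (push)
# upstream  https://github.com/OWNER/PROJECT.git (fetch)
# upstream  https://github.com/OWNER/PROJECT.git (push)
```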

Notice the second remote, named upstream, corresponding to the original repo on GitHub. We have gotten to this:

16.4.6 Pull changes from upstream

Before we pull changes down, let’s first do a quick peek to see what WOULD happen if we pulled upstream changes.

Okay, now that we’re satisfied there are some changes, let’s pull the changes that we don’t have from OWNER/PROJECT into our local copy.

This says: “pull the changes from the remote known as upstream into the master branch of my local repo”. We are being explicit about the remote and the branch in this case, because (as our git remote -v commands reveal), upstream/master is not the default tracking branch for local master.
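The peek and the pull can be sketched as follows. Treat this as one plausible way to do it rather than the exact commands from the lecture: the peek uses git fetch followed by git log, and the pull adds --ff-only so that it refuses to create a merge commit if local master has diverged. Local directories (main_project, fork.git, my_fork; all made-up names) stand in for OWNER/PROJECT, YOU/PROJECT and your local copy.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-ins for OWNER/PROJECT, YOU/PROJECT and your local clone.
git init -q main_project
git -C main_project symbolic-ref HEAD refs/heads/master
echo "# PROJECT" > main_project/README.md
git -C main_project add README.md
git -C main_project commit -q -m "Initial commit"
git clone -q --bare main_project fork.git
git clone -q fork.git my_fork
cd my_fork
git remote add upstream "$sandbox/main_project"

# The original repo moves on after you forked it.
echo "New upstream work" >> "$sandbox/main_project/README.md"
git -C "$sandbox/main_project" commit -q -am "New upstream work"

# Peek: download upstream's commits without touching your branches,
# then list the commits that upstream has and local master lacks.
git fetch -q upstream
git log --oneline master..upstream/master

# Pull: bring those commits into local master, fast-forward only.
git pull -q upstream master --ff-only
```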

If the pull fails because you have made your own commits to master, see section 16.4.8 (Um, what if I did touch master?) to get yourself back on the happy path.

16.4.7 Push these changes to origin/master

Feel free to push the newly updated state of local master to your fork YOU/PROJECT and enjoy the satisfaction of being “caught up” with OWNER/PROJECT.

In the shell:

Or use the green “Push” button in RStudio.
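The shell version is a plain git push of master to origin. The sketch below rehearses the whole “catch up, then push” cycle with local stand-ins (main_project for OWNER/PROJECT, fork.git for YOU/PROJECT, my_fork for your local copy; all made-up names), so nothing is sent to GitHub.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox="$(mktemp -d)"
cd "$sandbox"

# Stand-ins for OWNER/PROJECT, YOU/PROJECT and your local clone.
git init -q main_project
git -C main_project symbolic-ref HEAD refs/heads/master
echo "# PROJECT" > main_project/README.md
git -C main_project add README.md
git -C main_project commit -q -m "Initial commit"
git clone -q --bare main_project fork.git
git clone -q fork.git my_fork
cd my_fork
git remote add upstream "$sandbox/main_project"

# Upstream gains a commit, and you pull it into local master.
echo "New upstream work" >> "$sandbox/main_project/README.md"
git -C "$sandbox/main_project" commit -q -am "New upstream work"
git pull -q upstream master --ff-only

# Push the updated master to your fork so it is caught up too.
git push -q origin master
```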

16.4.8 Um, what if I did touch master?

I told you not to!

But OK here we are.

Let’s imagine this is the state of the original repo OWNER/PROJECT:

and this is the state of the master branch in your local copy:

The two histories agree, up to commit or state C, then they diverge.

If you want to preserve the work in commits X, Y, and Z, create a new branch right now, with tip at Z, via git checkout -b my-great-innovations (pick your own branch name!). Then checkout master via git checkout master.

I now assume you have either preserved the work in X, Y, and Z (with a branch) or have decided to let it go.

Do a hard reset of the master branch to C.

You will have to figure out how to convey C in Git-speak. Specify it relative to HEAD or provide the SHA. See future link about resets for more support.
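A sketch of the rescue in a throwaway repo. The history is invented for the demo: six commits whose messages are A, B, C, X, Y and Z, so C can be written as HEAD~3 (three commits before the tip). In your own repo, count your commits or use the SHA of C instead.

```shell
# Throwaway identity and sandbox so the sketch runs anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Build the local history A-B-C-X-Y-Z on master.
for name in A B C X Y Z; do
  echo "$name" >> history.txt
  git add history.txt
  git commit -q -m "$name"
done

# Preserve X, Y and Z on a new branch with its tip at Z, then return to master.
git checkout -q -b my-great-innovations
git checkout -q master

# Hard reset master to C, discarding X, Y and Z from master only.
git reset -q --hard HEAD~3

git log --oneline master                 # ends at C
git log --oneline my-great-innovations   # still ends at Z
```

A hard reset discards uncommitted changes as well, which is one more reason to start from a clean working tree.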

The instructions above for pulling changes from upstream should now work. Your master branch should reflect the history of OWNER/PROJECT:

If you chose to create a branch with your work, you will also have that locally:

If you pushed your alternative history (with commits X, Y, and Z) to your fork YOU/PROJECT and you like keeping everything synced up, you will also need to force push master via git push --force, but we really really don’t like discussing force pushes in Happy Git. We only do so here because we are talking about a fork, which is fairly easy to replace if things go sideways.