Class Meeting 10 Tibble Joins
Today’s topic is operations involving two or more tibbles.
10.1 Worksheet
You can find a worksheet template for today here.
10.2 Resources
- Jenny’s join cheatsheet
- dplyr’s “two-table verbs” vignette
- Relational Data chapter in “R for Data Science”.
- dplyr cheatsheet
For an overview of operations involving multiple tibbles, check out Jenny’s Chapter 14 in stat545.com.
For more activities, check out Rashedul’s guest lecture material from 2018.
10.3 Join Functions (25 min)
Often, we need to work with data living in more than one table. There are three main types of operations that can be done with two tables (as elaborated in the r4ds Chapter 13 introduction), plus binding:
- Mutating joins add new columns to the “original” tibble.
- Filtering joins filter the “original” tibble’s rows.
- Set operations work as if each row is an element in a set.
- Binding stacks tables on top of or beside each other, with bind_rows() and bind_cols().
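To make the list above concrete, here is a minimal sketch of each kind of operation using dplyr. The tibbles (bands, instruments, a, b) and their columns are toy data invented for illustration; they are not part of the worksheet or the singer data.

```r
library(dplyr)
library(tibble)

# Toy tibbles, invented for illustration only.
bands <- tribble(
  ~name,  ~band,
  "Mick", "Stones",
  "John", "Beatles",
  "Paul", "Beatles"
)
instruments <- tribble(
  ~name,   ~plays,
  "John",  "guitar",
  "Paul",  "bass",
  "Keith", "guitar"
)

# Mutating joins: add the `plays` column to `bands`.
left  <- left_join(bands, instruments, by = "name")   # all rows of bands; Mick gets NA
inner <- inner_join(bands, instruments, by = "name")  # only names found in both
full  <- full_join(bands, instruments, by = "name")   # every name from either tibble

# Filtering joins: keep or drop rows of `bands`, adding no columns.
matched   <- semi_join(bands, instruments, by = "name")  # rows with a match
unmatched <- anti_join(bands, instruments, by = "name")  # rows without a match

# Set operations: treat each row as an element; columns must agree.
a <- tibble(x = c(1, 2, 3))
b <- tibble(x = c(2, 3, 4))
both   <- intersect(a, b)  # rows in both a and b
either <- union(a, b)      # distinct rows from a or b
only_a <- setdiff(a, b)    # rows in a but not in b

# Binding: stack on top of, or beside, each other.
stacked <- bind_rows(a, b)                  # rows of a followed by rows of b
beside  <- bind_cols(a, rename(b, y = x))   # columns x and y side by side
```

Printing each result next to its inputs is a quick way to see which rows and columns each function keeps.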
Let’s navigate to each of these three links, which lead to the relevant r4ds chapters, and go through the concepts there. These have excellent visuals to explain what’s going on.
Then, let’s go through Jenny’s join cheatsheet for examples.
10.4 Activity (25 min)
Let’s complete today’s worksheet.
In case you can’t download the singer package, just load the data by running these two lines:
songs <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/songs.csv")
locations <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/loc.csv")
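If you would rather have one snippet that works either way, the fallback can be sketched as below. The helper load_singer_data() is hypothetical (it is not part of the worksheet), and the sketch assumes the singer package exposes its data as singer::songs and singer::locations; the URLs are the ones given above.

```r
library(readr)

# Base URL taken from the two read_csv() lines above.
base_url <- "https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer"

# Hypothetical helper: use the singer package if it is installed,
# otherwise read the CSV copies from the course repository.
load_singer_data <- function() {
  if (requireNamespace("singer", quietly = TRUE)) {
    # Assumption: the package exports `songs` and `locations`.
    list(songs = singer::songs, locations = singer::locations)
  } else {
    list(
      songs     = read_csv(paste0(base_url, "/songs.csv")),
      locations = read_csv(paste0(base_url, "/loc.csv"))
    )
  }
}
```

Calling `dat <- load_singer_data()` would then give `dat$songs` and `dat$locations` for use in the worksheet.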
10.5 Time remaining?
Let’s return to the exercises from either:
- tidyr last class
- ggplot2 the class before