Class Meeting 6 Intro to data wrangling, Part I
Worksheet: You can find a worksheet template for today here.
6.1 Today’s Lessons
Today we’ll introduce the dplyr package. Specifically, we’ll look at these three lessons:
- Intro to dplyr syntax
- The dplyr advantage
- Relational/comparison and logical operators in R
6.2 Resources
All three of today’s lessons are closely aligned to the stat545: dplyr-intro.
More detail can be found in the r4ds: transform chapter, up until and including the select() section. Section 5.2 also elaborates on relational/comparison and logical operators in R.
Here are some supplementary resources:
- A similar resource to the r4ds one above is the intro to dplyr vignette, up until and including the select() section.
- Want to read more about piping? See r4ds: pipes.
6.3 Participation
To get participation points for today, we’ll be filling out the cm006-exercise.Rmd file, and adding it to your participation repo.
6.4 Intro to dplyr syntax
6.4.1 Learning Objectives
Here are the concepts we’ll be exploring in this lesson:
- tidyverse
- dplyr functions:
  - select
  - arrange
- piping
By the end of this lesson, students are expected to be able to:
- subset and rearrange data with dplyr
- use piping (%>%) when implementing function chains
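As a minimal sketch of these objectives, here is select(), arrange(), and the pipe on a small made-up tibble. The `toy` data and its column names are assumptions for illustration only, not the data used in the worksheet.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration
toy <- tibble(
  country = c("A", "B", "C"),
  year    = c(2007, 1952, 1977),
  pop     = c(30, 10, 20)
)

# Nested calls: you have to read these from the inside out
nested <- arrange(select(toy, country, pop), pop)

# Piped: the same steps, read top to bottom
piped <- toy %>%
  select(country, pop) %>%   # keep only these two columns
  arrange(pop)               # sort rows by increasing pop

identical(nested, piped)
```

Both versions do the same work; the piped one lists the steps in the order they happen.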
6.4.2 Preamble
Let’s talk about:
- The history of dplyr: plyr
- tibbles are a special type of data frame
- the tidyverse
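To make the tibble point concrete, here is a small sketch. The objects `df` and `tbl` are made up for illustration, and this only shows a couple of the ways tibbles differ from plain data frames.

```r
library(dplyr)  # dplyr re-exports tibble() and as_tibble()

df  <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- as_tibble(df)

class(df)            # "data.frame"
class(tbl)           # "tbl_df" "tbl" "data.frame"
is.data.frame(tbl)   # TRUE: a tibble is still a data frame

# One practical difference: single-column subsetting with [
df[, 1]    # drops to a plain vector
tbl[, 1]   # stays a one-column tibble
```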
6.4.3 Demonstration
Let’s get started with the exercise:
- Open RStudio, and install the tidyverse meta-package by executing install.packages("tidyverse") in the R console.
- Optional: open the STAT545_participation RStudio project in RStudio.
- With RStudio, open the cm006-exercise.Rmd file you downloaded and committed earlier.
- Follow the instructions in the .Rmd file until the resume lecture section.
6.5 Small break
Here are some things you might choose to do on this break:
- Talk with a TA, Vincenzo, or your neighbour(s) about the content so far.
- Attempt the bonus exercises on the cm006-exercise.Rmd file.
- Work on an assignment.
6.6 The dplyr advantage
6.6.1 Learning Objectives
By the end of this lesson, students are expected to be able to:
- Have a sense of why dplyr is advantageous compared to the “base R” way with respect to good coding practice.
Why?
- Having this in the back of your mind will help you identify the qualities of a readable analysis, and produce one yourself.
6.6.2 Compare base R to dplyr
Self-documenting code.
This is where the tidyverse shines.
Example of dplyr vs base R:
gapminder %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)
vs.
gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")]
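If you'd like to check that the two styles give the same result without loading gapminder, here is a sketch on a tiny stand-in. The `toy` tibble borrows gapminder's column names, but its values are made up for illustration and are not real data.

```r
library(dplyr)

# Hypothetical stand-in with gapminder-style column names; values are invented
toy <- tibble(
  country = c("Cambodia", "Cambodia", "Canada"),
  year    = c(1952, 2007, 2007),
  lifeExp = c(40, 60, 80)
)

# dplyr: each step names what it does
with_dplyr <- toy %>%
  filter(country == "Cambodia") %>%
  select(year, lifeExp)

# base R: rows and columns packed into one bracket expression
with_base <- toy[toy$country == "Cambodia", c("year", "lifeExp")]

with_dplyr
with_base
```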
No need to take excerpts (stored copies of part of your data).
Wrangle with dplyr first, then pipe into a plot/analysis.
Or, use the subset argument that many R functions, such as lm(), offer.
Especially don’t use magic numbers to subset!
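Here is a rough sketch of those options side by side. It uses R's built-in mtcars data rather than gapminder, which is an assumption made only so the example runs without extra packages.

```r
library(dplyr)

# Option 1: wrangle with dplyr first, then pipe into the analysis
fit_piped <- mtcars %>%
  filter(cyl == 4) %>%
  lm(mpg ~ wt, data = .)   # the dot marks where the piped data goes

# Option 2: use the subset argument of lm()
fit_subset <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)

# Both fits use the same rows, so the coefficients agree
all.equal(coef(fit_piped), coef(fit_subset))

# Avoid: magic numbers say nothing about which rows were chosen, or why
excerpt <- mtcars[1:10, ]
```

In the first two options the condition cyl == 4 documents the subset; the row numbers in the last line do not.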
Note that dplyr functions return a new object rather than modifying their input, so you need to use the assignment operator (<-) to store changes!
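A quick sketch of what that means in practice, using a made-up one-column tibble (`toy` and `toy_sorted` are hypothetical names):

```r
library(dplyr)

toy <- tibble(x = c(3, 1, 2))

# This prints a sorted copy, but toy itself is not changed
toy %>% arrange(x)
toy$x                              # still 3 1 2

# Assign the result to keep it
toy_sorted <- toy %>% arrange(x)
toy_sorted$x                       # 1 2 3
```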
6.7 Relational/Comparison and Logical Operators in R
6.7.1 Learning Objectives
Here are the concepts we’ll be exploring in this lesson:
- Relational/Comparison operators
- Logical operators
- dplyr functions:
  - filter
  - mutate
By the end of this lesson, students are expected to be able to:
- Predict the output of R code containing the above operators.
- Explain the difference between & / && and | / ||, and name a situation where one should be used over the other.
- Subset and transform data using filter and mutate.
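As a small preview of the last objective, here is a sketch of filter() and mutate() driven by relational and logical operators. The `toy` tibble and the new column name `pop_doubled` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(1952, 2007, 1952, 2007),
  pop     = c(10, 20, 30, 40)
)

recent_big <- toy %>%
  filter(year > 2000 & pop >= 20) %>%   # keep rows where both conditions hold
  mutate(pop_doubled = pop * 2)         # add a column computed from pop

recent_big
```

filter() keeps the rows where the logical vector is TRUE, which is why the element-wise & is the operator to use here.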
6.7.2 R Operators
Arithmetic operators allow us to carry out mathematical operations:
| Operator | Description |
|---|---|
| + | Add |
| - | Subtract |
| * | Multiply |
| / | Divide |
| ^ | Exponent |
| %% | Modulus (remainder from division) |
Relational operators allow us to compare values:
| Operator | Description |
|---|---|
| < | Less than |
| > | Greater than |
| <= | Less than or equal to |
| >= | Greater than or equal to |
| == | Equal to |
| != | Not equal to |
- Arithmetic and relational operators work on vectors.
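For instance, here is a short base R sketch (the vector `x` is made up) showing each operation applied to every element at once:

```r
x <- c(1, 5, 10)

doubled   <- x * 2             # 2 10 20
remainder <- x %% 3            # 1 2 1
squared   <- x ^ 2             # 1 25 100

above_4   <- x > 4             # FALSE TRUE TRUE
matches   <- x == c(1, 2, 10)  # TRUE FALSE TRUE (compared pair by pair)
```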
Logical operators allow us to carry out boolean operations:
| Operator | Description |
|---|---|
| ! | Not |
| \| | Or (element-wise) |
| & | And (element-wise) |
| \|\| | Or |
| && | And |
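- The difference between | and || is that | evaluates element-wise, whereas || works on single (length-one) values and stops as soon as the answer is known. Older versions of R used only the first element of each vector with ||; from R 4.3.0 onward, longer inputs are an error. The same holds for & and &&.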
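A minimal sketch of the two families of logical operators, using made-up vectors `a` and `b`. The double operators are only given single values here, since their behaviour on longer vectors depends on the R version.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

and_elementwise <- a & b   # TRUE FALSE FALSE
or_elementwise  <- a | b   # TRUE TRUE TRUE
not_a           <- !a      # FALSE TRUE FALSE

# && and || take single values and short-circuit:
# the right-hand side is skipped once the answer is known
x <- NULL
is_positive <- !is.null(x) && x > 0          # FALSE; x > 0 is never evaluated
still_true  <- TRUE || stop("not reached")   # TRUE; stop() is never called
```

A common rule of thumb is to use & and | inside filter() and mutate(), where you want one answer per row, and && and || in if () conditions, where you want a single TRUE or FALSE.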
6.7.3 Demonstration
Continue along with the cm006-exercise.Rmd file.
6.8 If there’s time remaining
- Let’s do the bonus exercises together, in the cm006-exercise.Rmd file.
- Another “break”