(5) R Packages, Part I

— LAST YEAR’S CONTENT BELOW —

26.14 Learning Objectives

This tutorial aims to get you started with package development in R. By the end of this tutorial, you’ll have the beginnings of an R package called powers (complete version). You’ll learn about key components of an R package, and how to modify them.

We’ll be going over the following topics:

  • set up the directory structure for a package and put it under version control with File -> New Project
  • define functions in R scripts located in the R directory of the package
  • use load_all and Build & Reload to simulate loading the package
  • use Check to check the package for coherence
  • use Build & Reload to properly build and install the package
  • edit the DESCRIPTION file of package metadata
  • specify a LICENSE
  • document and export the functions via roxygen2 comments
  • document the package itself via use_package_doc()
  • create documentation and manage the NAMESPACE file via document()
  • use testthat to implement unit testing
  • use a function from another package via use_package() and syntax like otherpkg::foofunction()
  • connect your local Git repo to a new remote on GitHub via use_github()
  • create a README.md that comes from rendering README.Rmd containing actual usage, via use_readme_rmd()
  • create a vignette via use_vignette() and build it via build_vignettes()

26.15 Participation

We’ll be developing the powers R package in class. Please follow along with this, developing in your participation repo.

At least, some of the development. Sometimes it might be better to just sit back and watch. I’ll try to inform you when to do what.

26.16 Resources

This tutorial is adapted from Jenny Bryan’s STAT 547 tutorial, where she develops the foofactors package.

During exercise periods, in case you’re ahead of the class and have time, you should work on Homework 7.

26.17 Motivation

Why make a package in R? Here are just a few big reasons:

  • Built-in checks that your functions are working and are sensible.
  • Easy way to store and load your data – data packages like gapminder are awesome!
  • Allows for documentation of functions that you’ve written.
  • Companion for a journal article you’re writing.

Think aid for a type of analysis, not an analysis itself.

And an R package does not need to be big!

26.18 Getting Started

Install/update the devtools package, used as an aid in package development:

install.packages("devtools")

This will do for now – for development beyond the basics, you might need to further configure your computer.

26.19 Let’s start with a single function

26.19.1 Function creation

Follow along as we make an R package called powers that contains a function square that squares its input. Let’s initiate it:

  • RStudio —> New project —> R Package
    • Initiate git (optional, but recommended).
  • Under the “Build” menu, click “Install and Restart”
  • Check out the files that have been created
    • .Rd files (in the man directory)
    • NAMESPACE
    • DESCRIPTION

Now, start a new R script in the R directory, called square.R. Write a function called square that squares its input.
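A minimal sketch of what square.R could contain is below. The argument name x is an assumption; any name works.

```r
# R/square.R
# Square each element of a vector.
square <- function(x) {
    x^2
}

# Quick interactive check
square(c(1, 2, 3))
```

Because ^ is vectorized, the same one-line body handles a single number or a whole vector.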

Build the package:

  • Build and Reload, or in newer versions of RStudio, Install and Restart.
    • This compiles the package, and loads it.
    • Try leaving the project, running library(powers), and using the function! Pretty cool, eh?

26.19.2 Documentation

The roxygen2 package makes documentation easy(er). Comment package functions with #' above the function, and use tags starting with @. Let’s document the square function.

Key tags:

  • @param – what’s the input?
  • @return – what’s the output?
  • @export – make the function available upon loading the package.

Type document() into the console (a function from the devtools package). Then Install and Restart the package.

Your function is now documented. Check it out with ?square! This happens due to the creation of an Rd file in the man folder.
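For reference, here is one way the documented function might look. The title, description, and wording of each tag are illustrative assumptions, not the text of the finished powers package.

```r
#' Square a vector
#'
#' Raises each element of a vector to the power of 2.
#'
#' @param x A numeric vector.
#' @return A numeric vector containing the squares of \code{x}.
#' @examples
#' square(1:5)
#' @export
square <- function(x) {
    x^2
}
```

The first line of the comment block becomes the title of the help page, and the paragraph after the blank #' line becomes its description.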

26.19.3 Taking control of your NAMESPACE

Let’s start being intentional as to what appears in our NAMESPACE.

  1. Delete your NAMESPACE file.
  2. Add the @export tag to your square function, if it isn’t there already.
  3. Run document() so that roxygen2 regenerates the NAMESPACE with square in it.

Things that do not get @exported can still be referred to “internally” by functions in your NAMESPACE, as we’ll see soon.

26.19.4 Checking

It’s a good idea to check your package early and often to see that everything is working.

Click Check under the Build menu. It checks lots of things for you! We’ll see more examples of this.

26.19.5 Function Dependencies

Make another, more general function, pow, that raises its input to any power.

It can go in the same R script as square, or a different one – your choice.

We’ll make square depend on pow.
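A sketch of the two functions follows. The argument names x and p are assumptions, chosen to match the x^p expression used later in this tutorial.

```r
# Internal helper: there is no @export tag, so it stays out of the NAMESPACE.
pow <- function(x, p) {
    x^p
}

#' Square a vector
#'
#' @param x A numeric vector.
#' @return A numeric vector containing the squares of \code{x}.
#' @export
square <- function(x) {
    pow(x, p = 2)
}
```

Keeping the arithmetic in pow means that later changes to how powers are computed only need to be made in one place.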

After Install and Restart, you’ll notice that you can’t use pow because it’s not exported. But, square still works! We call pow an internal function.

Note: you should still document your internal function! But mention that the function is internal. Users will be able to access the documentation like normal, but still won’t be able to (easily) use the function.

If you want to be able to use internal functions as a developer, but don’t want users to have (easy) access to the functions, then run load_all() instead of Install and Restart. It makes every function in the package, exported or not, available in your session.

26.19.6 Your Turn

Make and document another function, say cube, that raises a vector to the power of 3. Be sure to @export it to the NAMESPACE. Use our internal pow function to make cube, if you have it.

Finished early? Do more – work on Assignment 7, and/or try out more documentation features that come with roxygen2 (the @ tags).

26.20 Documentation and Testing

26.20.1 More Roxygen2 Documentation

  • \code{} for code font
  • \link{} to link to other function docs
  • Combine: \code{\link{function_name}}

Enumeration:

#' \enumerate{
#'      \item first item
#'      \item second item
#' }

Itemization:

#' \itemize{
#'      \item first item
#'      \item second item
#' }

Manually labelled list:

#' \describe{
#'      \item{bullet label 1}{first item}
#'      \item{bullet label 2}{second item}
#' }
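To see these pieces in context, here is a sketch of a documented function that combines code font, a link, and an itemized list. The function cube and the wording of its documentation are illustrative assumptions.

```r
#' Cube a vector
#'
#' Raises each element of \code{x} to the power of 3. See
#' \code{\link{square}} for the analogous function for squares.
#'
#' The input may be:
#' \itemize{
#'      \item a numeric vector, or
#'      \item a logical vector, which is converted to numeric.
#' }
#'
#' @param x A numeric or logical vector.
#' @return A numeric vector the same length as \code{x}.
#' @export
cube <- function(x) {
    x^3
}
```

After running document(), the list is rendered as bullets on the help page and the link points to the help page for square.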

26.20.2 DESCRIPTION file

Every R package has this. It contains the package’s metadata. Let’s edit it:

  • Add a title and brief description.
    • R is picky about these! Check out the rules.
  • Add your name.
    • Use the Authors@R field instead of the default Author and Maintainer fields.
  • Pick a license: next!
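The value of the Authors@R field is an R expression built with person(). The sketch below evaluates such an expression in the console so you can see how R interprets it; the name and email address are hypothetical placeholders. In DESCRIPTION itself, the expression goes after "Authors@R:".

```r
# Hypothetical author details; replace with your own.
me <- utils::person(
    given  = "Jane",
    family = "Doe",
    email  = "jane.doe@example.com",
    role   = c("aut", "cre")  # "aut" = author, "cre" = maintainer
)

# Printing shows how R reads the entry.
print(me)
```

Exactly one person should carry the "cre" role, since that person is treated as the package maintainer.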

26.20.3 Pick a license

Karl Broman’s post is brief and informative.

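Let’s add an MIT license. One way is to run use_mit_license(), which fills in the License field of DESCRIPTION and creates the license files.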

26.20.4 Testing with testthat

We’ve already seen package Checks. These check that the pieces of your R package are in place, and even that your examples don’t throw errors. But we should check not only that our functions run, but also that they give the results we’d expect.

The testthat package is useful for this. Initialize it in your R package by running use_testthat().

As a template, save and edit the following script in a file called test_square.R in the tests/testthat folder (testthat only runs files whose names start with "test"), filling in the blanks with an expect statement:

context("Squaring non-numerics")

test_that("At least numeric values work.", {
    num_vec <- c(0, -4.6, 3.4)
    expect_identical(square(numeric(0)), numeric(0))
    FILL_THIS_IN
})

test_that("Logicals automatically convert to numeric.", {
    logic_vec <- c(TRUE, TRUE, FALSE)
    FILL_THIS_IN
})

Then, you can execute those tests by running devtools::test(), or clicking Build -> Test package.
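To get a feel for how expect statements behave, here is a small sketch that runs outside the package. It assumes testthat is installed and uses a stand-in definition of square; the particular expectations are illustrative and are not the answers to the blanks above.

```r
library(testthat)

# Stand-in definition so the sketch runs outside the package.
square <- function(x) x^2

test_that("Output has the same length as the input.", {
    x <- c(2, -3, 0.5)
    expect_length(square(x), length(x))
    expect_equal(square(x), c(4, 9, 0.25))
})

test_that("Character input throws an error.", {
    expect_error(square("a"))
})
```

expect_equal() compares with a small numerical tolerance, whereas expect_identical() demands an exact match, so expect_equal() is usually the safer choice for floating-point results.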

These sanity checks are very important as your R package becomes more complex!

26.21 Higher-level User Documentation

26.21.1 Package Documentation

Just like we do for functions, we can make a manual (.Rd) page for our entire R package, too. For example, check out the documentation for ggplot2:

?ggplot2         # Can execute only if `ggplot2` is loaded.
package?ggplot2  # Always works.

To do so, just execute use_package_doc(). A new R script opens containing a roxygen2-style comment block attached to NULL (newer versions of usethis use the string "_PACKAGE" instead). Document as you would a function, and run document() to generate the .Rd file.

Here’s sample documentation:

#' Convenient Computation of Powers
#'
#' Are you tired of using the power operator, \code{^} or \code{**} in R?
#' Use this package to call functions that apply common powers
#' to your vectors.
#'
#' @name powers
#' @author Me
#' @note This package isn't actually meant to be serious. It's just for
#' teaching purposes.
#' @docType package
NULL  # roxygen2 needs an object to attach the comment block to

26.21.2 Vignettes

It’s a good idea to write a vignette (or several) for your R package to show how the package is useful as a whole. Documentation for individual functions doesn’t suffice for this purpose!

To write a vignette called "my_vignette", just run

use_vignette("my_vignette")

Some things happen automatically, but it’s up to you to modify the .Rmd document to provide adequate instruction. Change the template to suit your package. The only real “catch” is that the title appears twice in the template, once in the YAML title field and once in the \VignetteIndexEntry{} line, so make sure you replace both.

Then just Knit, and then run build_vignettes() to build the vignettes.

Vignette woes: There seems to be resistance against building vignettes when installing. Try running install(build_vignettes=TRUE) to get it working.

26.21.3 README

Just as most projects should have a README file in the main directory, so should an R package.

Purposes:

  • Inform someone stumbling across your project what they’ve stumbled across.
    • At a high level (like “This is an R package”), but also
    • somewhat at a lower level too, like your DESCRIPTION file. This becomes a little redundant.
  • I like to use the README to inform developers of the main workflow and spirit behind developing the package.
    • There are some things that you’d want other potential developers to know about the package as a whole, yet are irrelevant to users!

How to do it:

You could just make and edit a README.md file like normal. But you’ll probably want to briefly demonstrate some code, so you’ll need an .Rmd. Let use_readme_rmd() (from usethis, which devtools attaches) set that up for you:

use_readme_rmd()

Knit, and you’re done! Knitting README.Rmd produces the README.md that GitHub displays.

26.21.4 Exercises

Create the above three types of documentation, without looking at my version. Then compare.

Ideally, you’ll have more to document because you’ve been working on expanding this (or another) R package for Homework 07 already.

26.22 Adding data to your R package

You can store and document datasets within R packages. Here’s one useful way.

Note: This currently doesn’t seem to be present in the companion tutorial from Jenny. Check out the R Packages “data chapter” for a resource.

Example:

Let’s add tenvec and tendf to the package:

tenvec <- 1:10
tendf <- data.frame(vec=1:10)

In the console:

  1. Store your data as R objects, as we’ve done above with tenvec and tendf.
  2. Execute use_data(tenvec, tendf) (one argument per object).

tenvec and tendf will be saved as .rda files (one per object) in the new data directory. These are available upon loading the package.
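To make the storage format less mysterious, the sketch below approximates the effect of use_data() with base R's save(), writing one .rda file per object. It writes to a temporary folder rather than a real package, and it is an approximation for illustration, not the implementation used by use_data().

```r
tenvec <- 1:10
tendf <- data.frame(vec = 1:10)

# Stand-in for the package's data directory (a temporary folder here).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# One .rda file per object.
save(tenvec, file = file.path(data_dir, "tenvec.rda"))
save(tendf, file = file.path(data_dir, "tendf.rda"))

# Loading restores each object under its original name.
restored <- new.env()
load(file.path(data_dir, "tenvec.rda"), envir = restored)
identical(restored$tenvec, tenvec)
```

Because the object name is stored inside the file, the documentation for a dataset refers to it by that name rather than by the file name.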

To document the data, for each object (i.e., for each of tenvec and tendf), put roxygen2-style documentation above the character string naming the object ("tenvec" or "tendf") in an R script in the R folder.

Example for tenvec:

#' Integer vector from 1 to 10
#'
#' Self-explanatory! 
#'
#' @format What format does your data take? Integer vector.
#' @source Where did the data come from? 
"tenvec"

The @format and @source tags are unique to data documentation. Note that you shouldn’t use the @export tag when documenting data!

26.23 Dependencies

We can use functions from other R packages within our homemade R package, too. We need to do two things:

  • Use the syntax package_name::function_name() whenever you want to use function_name from package_name.
  • Indicate that your R package depends on package_name in the DESCRIPTION file by executing the command use_package("package_name").

There are other methods, but this is the easiest.

Example: Add ggplot2 dependency to plot the resulting computations. Do so by adding a plot to pow – change pow’s guts to the following:

res <- x^p
if (showplot) {
    # Use a name other than p, which already holds the power.
    plt <- ggplot2::qplot(x, res)
    print(plt)
}
res
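Put together, the whole function might look like the sketch below. Making showplot an argument with a default of FALSE is an assumption; it keeps existing calls such as pow(x, 2) working without drawing anything.

```r
pow <- function(x, p, showplot = FALSE) {
    res <- x^p
    if (showplot) {
        # Requires ggplot2 to be installed and listed in DESCRIPTION.
        plt <- ggplot2::qplot(x, res)
        print(plt)
    }
    res
}

# No plot is drawn by default, so existing callers are unaffected.
pow(1:3, 2)
```

Since ggplot2 is only referenced inside the if block, it is not touched unless showplot is TRUE.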

Note 1: Here’s an example of the benefits of not having your functions do too much – I only needed to change pow alone to get the changes to work for square and cube.

Note 2: It’s probably better to use Base R’s plotting here, so that your package is as stand-alone as possible. We use ggplot2 for expository purposes.

26.24 Launching your Package to GitHub

If I want to put an R package on GitHub, I typically just:

  1. Click “New” in GitHub to make a new repo. Don’t initialize with README.
  2. Follow the instructions GitHub provides, which involve two lines to execute in the terminal.
    • Those two lines can be found here in Jenny’s Happy git book.

There is also the use_github() way – although, to me, it seems overly complicated (perhaps there’s an advantage I don’t know about). It’s just a matter of following the instructions, which are not worth demonstrating here.

26.25 Time remaining?

If there’s time remaining, we’ll check out S3 OO programming in R.

  1. Add a “class” to the output of pow.
  2. Add some methods:
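#' @export
print.pow <- function(x, ...) {
    # Show the class and the first few values on one line.
    cat("Object of class 'pow':", head(unclass(x)), "\n")
    invisible(x)  # print methods conventionally return their input invisibly
}

# Define the generic before its methods.
#' @export
bind <- function(x) UseMethod("bind")

#' @export
bind.pow <- function(x) paste(x, collapse = ".")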
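Step 1, adding a class to the output of pow, can be sketched as follows. Replacing the class outright with "pow" is one simple choice made here for illustration; the two-argument signature is also an assumption.

```r
pow <- function(x, p) {
    res <- x^p
    class(res) <- "pow"  # tag the result so S3 methods can dispatch on it
    res
}

y <- pow(1:3, 2)
class(y)            # the class attribute set above
inherits(y, "pow")  # TRUE, so print() and bind() will look for .pow methods
unclass(y)          # the underlying numeric values
```

Once the class is attached, calling print(y) or bind(y) looks for a method named print.pow or bind.pow before falling back to the default.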