When I started using R in 2006, project-specific functions were contained in a collection of R script files. The functions were made available within an R session when they were “sourced” by an analysis R script. It was an easy way of creating and using custom functions. Unfortunately, this method was horrible to maintain. It reduced the likelihood, quality, and accessibility of each function’s documentation, made sharing the functions difficult and complex, and prevented the inclusion of unit tests. Identifying the most recent version of a function often required finding various copies of the file containing it. Time was spent -- in retrospect, wasted -- comparing and merging the different versions of the file in an attempt to create the most current version of the file and the functions it contained. The pain continued when trying to determine the required type and format of the data passed to the function’s parameters. It was a frustrating experience and an excellent cautionary tale on technical debt, one I am ashamed to admit lasted as long as it did.
About six years ago I started converting collections of R functions -- used together to accomplish an overall goal -- into packages. This resulted in two types of packages: (i) those for general modeling, analysis, and utilities and (ii) project-specific packages. Creating packages instead of loose collections of functions is now a mainstay of my workflow. When starting a project, I create a new package containing the project-specific functions, while new functions applicable to many projects are added to established packages. This framework has overcome the following barriers:
- Shareability. When the functions were contained in R script files, and had to be sourced into the R session, it was hard to share them efficiently and properly with collaborators. The documentation was limited and hard to access, there were no unit tests, and it was difficult to make sure everyone was using the same version of the functions. Now, the functions are provided as an R package and they are easily installed and updated. The package contains documentation, unit tests, and examples for the functions. These small infrastructure features of R packages make using a package seamless and intuitive.
- Documentation. In the past my functions had minimal -- if any -- documentation, and the documentation was not available from within an R session. Without documentation, the function’s purpose was not stated, the requirements of each parameter were not described, the results returned by the function were not explained, and examples of how to use the function were not readily available. Additionally, references and a list of related functions were not provided. The devtools R package [ CRAN | GitHub ] paradigm -- incorporating the roxygen2 documentation methodology -- provides the structure and ease to create documentation, directly above the function’s code, while the function is being developed. This removes the added step of writing and updating the function’s documentation in separate documentation files while developing R packages. The ability to format the documentation using markdown [ Wikipedia | John Gruber | markdown-here Cheat Sheet | RStudio's markdown ] has greatly improved the documentation experience. All of these small conveniences result in a huge payoff: useful and comprehensive documentation that is easy to create.
- Unit tests. Unit tests are an automated mechanism to ensure the functions within the R package always perform properly. They are called unit tests because they test individual functions and are commonly designed for a function performing a single task, a unit. While users might not see the immediate benefit of unit tests, including them for a majority of the functions helps ensure the functions perform properly for everyone regardless of the computing platform. This is especially important when the package is used on computers other than the one on which it was developed, and it is becoming more important because of the migration to differing cloud computing infrastructures. While not all of my functions have unit tests, all of the small, single-task functions do. The unit tests are based on the problem used to develop the function and provide excellent examples for the documentation.
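To make the documentation and unit test points concrete, here is a minimal sketch of what a small, single-task function might look like inside a package. The function `fahrenheit_to_celsius()` and the test file name are hypothetical, chosen only for illustration; they are not from any of my packages. The roxygen2 comments sit directly above the code, and the test uses testthat when it is installed.

```r
#' Convert a temperature from Fahrenheit to Celsius
#'
#' Hypothetical example function, used only to illustrate roxygen2 comments.
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of the same length, in degrees Celsius.
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
#' @export
fahrenheit_to_celsius <- function(temp_f) {
  # Fail early with a clear message when the input type is wrong
  if (!is.numeric(temp_f)) {
    stop("`temp_f` must be numeric.")
  }
  (temp_f - 32) * 5 / 9
}

# In a package, this test would typically live in a file such as
# tests/testthat/test-fahrenheit_to_celsius.R (file name is an assumption).
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("fahrenheit_to_celsius converts known values", {
    # Freezing and boiling points of water
    testthat::expect_equal(fahrenheit_to_celsius(c(32, 212)), c(0, 100))
    # Non-numeric input should raise an error
    testthat::expect_error(fahrenheit_to_celsius("hot"))
  })
}
```

Within a package directory, running `devtools::document()` would turn the roxygen2 comments into help files, and `devtools::test()` would run the tests. The same known values appear in both the `@examples` section and the test, which is how a unit test can double as documentation.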
The move from a collection of sourced functions to R packages has greatly improved reproducibility, reduced the amount of time spent maintaining functions, and allowed me to focus on converting data into information that aids in making better, informed decisions.
I routinely reference the following resources when building packages. They are great for everyone building R packages:
Overviews of writing R packages
- Hilary Parker’s timeless Writing an R package from scratch
- Karl Broman’s R package primer
- Hadley Wickham’s R packages website and book
- RStudio’s Developing [R] packages with RStudio and their Package Development Cheat Sheet
- The R Core Team's venerable Writing R Extensions document
Documentation
- roxygen2 [ roxygen2 | RdOxygen2 ]
- R Markdown [ RStudio | the Cheat Sheet | the Reference Guide ]
- Vignettes [ knitr documentation | Karl Broman’s tutorial ]
Testing your R package
- testthat
- Web service to check package for Windows compatibility (also good for an initial test before submitting to CRAN)