---
title: "Modules: Organizing R Source Code"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Modules: Organizing R Source Code}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

This vignette explains how to use modules outside of R packages as a means to organize a project or data analysis. Using modules, we may gain some of the features we expect from packages, but with less overhead.

A lot of R projects run into problems when they grow. Even relatively simple data analysis projects can easily span a thousand lines. R has two important building blocks to organize projects: functions and packages. However, packages present a hurdle for many users with little programming background. In those cases we often split the code base into files and *source* them into our R session (referring to the function `source`). Modules, in this context, present a more sophisticated way to *source* files by providing three important features:

- (Imports) loading a package is local to a module and avoids name clashes in the global environment.
- (Exports) variable assignments are local to a module and (a) do not pollute the global environment and (b) hide details of a module.
- Modules make it easy to spread your code base across files and reuse them when needed. Each file is self-contained.

# Example

You can load scripts as modules when you refer to a file (or directory) in a call to `use`. Inside such a script you can use `import` and `use` in the same way you typically use `library`. Consider the following example where we create a module in a temporary file with its dependencies.

```{r}
# Source code of the module, including its own import
code <- "
import('stats', 'median')
functionWithDep <- function(x) median(x)
"

# Write the module to a temporary file
fileName <- tempfile(fileext = ".R")
writeLines(code, fileName)
```

We can then load this module into the session as follows:

```{r}
library(modules)
m <- use(fileName)
m$functionWithDep(1:2)
```

# Pseudo-code example

To give more context on how you can structure a project, consider the following file structure:

```
/
  /R
    munging.R
    graphics.R
  /data
    some.csv
  /results
    /tables
    ...
    /figs
  main.R
  README.md
```

You put all your R code into the `R` folder. This folder may or may not have a nested folder structure itself. You probably have a folder for your data and one in which you store all results. The important part here is that you have split your code base into different files.

`main.R` in the project root acts as the *master* file in this example. This file kicks off all steps of our analysis and *connects the dots*. `munging.R` and `graphics.R` implement helper functions.

**main.R**

```{r eval = FALSE}
# Load every file in the R folder as a module
lib <- modules::use("R")

dat <- read.csv("data/some.csv")

# munging
dat <- lib$munging$clean(dat)
dat <- lib$munging$recode(dat)

# generate results
lib$graphics$barplot(dat)
lib$graphics$lineplot(dat)
```

The `main.R` file implements no logic of the analysis. Its responsibility is to connect all steps. Each file in the `R` folder then implements a *phase* of the project. In larger projects it is likely that each phase will need its own folder. The implementation may then look something along the lines of:

**R/munging.R**

```{r eval = FALSE}
export("clean")
clean <- function(dat) {
  # ...
}

export("recode")
recode <- function(dat) {
  # ...
}

helper <- function(...) {
  # This function is private
  # ...
}
```

**R/graphics.R**

```{r eval = FALSE}
import("ggplot2")
export("barplot", "lineplot")

barplot <- function(dat) {
  # ...
}

lineplot <- function(dat) {
  # ...
}

helper <- function(...) {
  # ...
}
```

- Each file is coerced into a module and can have its own set of imports.
  They do not share them.
- Loading the complete folder or each module individually is a matter of preference. Loading complete folders saves a couple of lines.
- Each module has its own set of exports. This keeps the interface clean and minimal.

# Documentation

If you want proper documentation for your functions or modules, you really want a package. For ad-hoc documentation of modules, a simple option is to use comments:

```{r}
module({
  fun <- function(x) {
    ## A function for illustrating documentation
    ## x (numeric) some values
    x
  }
})
```

# Best practices

- Modules in files should not load other modules in other files. You should view a module as a standalone and self-contained unit. Dependencies should refer to packages if possible. The benefit is ease of reuse. If your modules do depend on each other, you can use dependency injection to encode these relationships. See the vignette on *modules as objects*.
- Modules should always declare exports. This clearly communicates which parts are safe to use and prevents other parts of the code base from relying on implementation details.
- Do not use `library`, `attach` or `source` inside of modules. It is likely that they do not do what you want. `import` and `use` are to be preferred in this context.
- A good length for a module in a file is approximately 100 lines of code. The idea is to keep things organised and modular. If we only have one big module or a collection of big modules, we do not gain much.
- All other R coding guidelines still apply inside of modules.
- If you need documentation, or want to distribute and publish code, R packages are the way to go.
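
The dependency injection mentioned in the first best practice can be sketched as follows. This is a minimal illustration, not the only way to do it: the names `munging`, `Centering`, `clean` and `center` are hypothetical and chosen for this example only. The sketch assumes that a module created inside a function can see that function's arguments, so the dependency is passed in as an argument instead of being loaded from another file.

```{r}
# A module without dependencies on other modules (hypothetical example).
munging <- modules::module({
  export("clean")
  # Drop missing values from a vector
  clean <- function(x) x[!is.na(x)]
})

# A constructor function (hypothetical): the module it returns receives
# its dependency as the argument `munging` and does not load it itself.
Centering <- function(munging) {
  modules::module({
    import("stats", "median")
    export("center")
    # Median of the cleaned values; `clean` comes from the injected module
    center <- function(x) median(munging$clean(x))
  })
}

# Connect the two modules in one place, for example in main.R
centering <- Centering(munging)
centering$center(c(1, NA, 3))
```

With this layout each module stays self-contained, and the relationship between modules is visible where they are connected. Any object that provides a `clean` function could be passed in instead, which also makes it straightforward to test `center` in isolation.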