---
title: "Get started"
author: Konrad Rudolph
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
  md_document:
    variant: gfm
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Get started}
  %\VignetteEncoding{UTF-8}
---

## Using modules

For the purpose of this tutorial, we are going to use the example module `bio/seq`. The module implements some very basic mechanisms for dealing with DNA sequences (= character strings consisting of the letters `A`, `C`, `G` and `T`).

First, we load the module:

```{r eval = FALSE}
box::use(./bio/seq)
```

The function `box::use` accepts a list of *unquoted*, *qualified* module names. Each of these module names loads a single module and makes it available to the caller in some form. In the code above, we’ve loaded a single module, `bio/seq`. `bio` serves as a *parent module* that may group several submodules. Since the module name inside `box::use` starts with `./`, the module location is resolved *locally*, i.e. relative to the path of the currently running code.

In the above, `seq` is the module’s *proper name*, `bio/seq` is its *fully qualified name*, and `./bio/seq` is its *`use` declaration*.

To see the effect of this `use` declaration, let’s inspect our workspace:

```{r eval = FALSE}
ls()
```

    ## [1] "seq"

```{r eval = FALSE}
seq
```

    ##

We have used the module’s fully qualified name to load it. But, as shown by `ls`, loading the module this way introduces only a single new name into the current scope: the module itself, identified by its proper (non-qualified) name.

To see which names a module exports, we use `ls` again, this time on the module itself:

```{r eval = FALSE}
ls(seq)
```

    ## [1] "is_valid" "revcomp" "seq" "table"

It appears that `seq` exports four different names. To access exported names, we use the `$` operator: `seq$is_valid` refers to the first function in the list of exported names.
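
To try this outside the vignette’s example directory, the following sketch creates a tiny module on the fly and loads it. It assumes the ‘box’ package is installed; the module name `demo/hello`, its functions `greet` and `helper`, and the temporary directory are all hypothetical, chosen only for illustration. Because `demo/hello` is a non-local name, the sketch makes it discoverable via `options(box.path = …)`, which is explained in the next section.

```r
# A minimal sketch, assuming the 'box' package is installed.
# `demo/hello`, `greet` and `helper` are hypothetical names used for illustration.
lib = file.path(tempdir(), 'box-hello')
dir.create(file.path(lib, 'demo'), recursive = TRUE, showWarnings = FALSE)

# Write the module source: only `greet` is marked for export.
writeLines(c(
    "#' @export",
    "greet = function (who) paste0('Hello, ', who, '!')",
    "",
    "helper = function () 'not exported'"
), file.path(lib, 'demo', 'hello.r'))

# Add the temporary directory to the module search path, then load the module.
options(box.path = lib)
box::use(demo/hello)

ls(hello)           # lists the exported names only; `helper` is not among them
hello$greet('box')  # access an exported name via `$`
```

Only names marked with `#' @export` become visible through the module object; everything else stays private to the module.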

We can also display the interactive help for individual names using the `box::help` function, e.g.:

```{r eval = FALSE}
box::help(seq$revcomp)
```

Now let’s actually *use* the module. The `seq` function inside the `bio/seq` module constructs a set of (optionally named) biological sequences:

```{r eval = FALSE}
s = seq$seq(
    gene1 = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC',
    gene2 = 'CATAGCAACTGACATCACAGCG'
)
seq$is_valid(s)
```

    ## [1] TRUE

```{r eval = FALSE}
s
```

    ## 2 DNA sequences:
    ## >gene1
    ## GATTACAGATCAGCTCAGCACCTAGCA...
    ## >gene2
    ## CATAGCAACTGACATCACAGCG

Note how we automatically get pretty-printed ([FASTA](https://en.wikipedia.org/wiki/FASTA_format)) output: the `print` method, which gets called implicitly here, is specialised for the `'bio/seq'` S3 class in the `bio/seq` module. (Prefixing S3 classes inside modules with the full module name is a convention to avoid name clashes between S3 classes.)

```{r eval = FALSE}
getS3method('print', 'bio/seq')
```

    ## function (x) {
    ##     box::use(stringr[trunc = str_trunc])
    ##
    ##     if (is.null(names(x))) names(x) = paste('seq', seq_along(x))
    ##
    ##     cat(
    ##         sprintf('%d DNA sequence%s:\n', length(x), if (length(x) == 1L) '' else 's'),
    ##         sprintf(' >%s\n %s\n', names(x), trunc(x, 30L)),
    ##         sep = ''
    ##     )
    ##     invisible(x)
    ## }
    ##

The source code for `` `print.bio/seq` `` contains an interesting `use` declaration: it showcases an alternative way of invoking `box::use`, which we’ll explore now.

## Attaching modules

Let’s have a look at alternative ways of using modules. To start, let’s unload the `bio/seq` module …

```{r eval = FALSE}
box::unload(seq)
```

… and load it again, via a different route:

```{r eval = FALSE}
options(box.path = getwd())
box::use(bio/seq[revcomp, is_valid])
```

After unloading the already loaded module, `options(box.path = …)` sets the module search path: this is where `box::use` searches for modules. If more than one path is given, `box::use` searches them in turn until it finds a module of matching name.
This works analogously to how `.libPaths` operates on R packages. The `box::use` directive can now use `bio/seq` instead of `./bio/seq` as the module name: rather than a relative name, we specify a *global* name. In this example we set the search path to the current working directory, but in normal usage it would be a global library location, e.g. `~/.local/share/R/modules` on Linux (following the [XDG base directory specification](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html)).

**Note** that non-local module names *must* be fully qualified, nested module names: `box::use(foo/bar)` works, but `box::use(bar)` does not (instead, it is assumed that `bar` refers to a *package*)!

In the declaration above, we use `[revcomp, is_valid]` to specify that the names `revcomp` and `is_valid` from the `bio/seq` module should be attached in the calling environment. The `[…]` part is an *attach specification*: a comma-separated list of names inside the square brackets specifies which names to attach. The special symbol `...` can be used to specify that *all exported names* should be attached. This has an effect similar to conventional package loading via `library` (or `attach`ing an environment): all the attached names are now available for direct use without the `seq$` qualifier:

```{r eval = FALSE}
is_valid(s)
```

    ## [1] TRUE

```{r eval = FALSE}
revcomp(s)
```

    ## 2 DNA sequences:
    ## >gene1
    ## GTTGCTGATAGTGCTAGGTGCTGAGCT...
    ## >gene2
    ## CGCTGTGATGTCAGTTGCTATG

However, unlike the `attach` function, module attachment happens in the *current, local scope* only. Since the above code was executed in the global environment, there’s no distinction between local and global scope:

```{r eval = FALSE}
search()
```

    ## [1] ".GlobalEnv" "mod:bio/seq" "package:stats"
    ## [4] "package:graphics" "package:grDevices" "package:utils"
    ## [7] "package:datasets" "renv:shims" "package:methods"
    ## [10] "Autoloads" "package:base"

Note the second item, which reads “mod:bio/seq”.
But let’s now undo that, to attach (and use) the module locally instead:

```{r eval = FALSE}
detach() # Detaches position 2 of the search path, i.e. “mod:bio/seq”

seq_table = function (s) {
    box::use(./bio/seq[...])
    table(s)
}
seq_table(s)
```

    ## $gene1
    ## A C G T
    ## 13 12 6 7
    ##
    ## $gene2
    ## A C G T
    ## 8 7 4 3

Unlike above, we are now attaching *all* exported names instead of specifying individual names. Inside `seq_table`, the call `table(s)` therefore uses the `table` function from the `bio/seq` module rather than `base::table` (which would produce different output). And note that the module’s `table` function is *not* attached outside the local scope:

```{r eval = FALSE}
search()
```

    ## [1] ".GlobalEnv" "package:stats" "package:graphics"
    ## [4] "package:grDevices" "package:utils" "package:datasets"
    ## [7] "renv:shims" "package:methods" "Autoloads"
    ## [10] "package:base"

```{r eval = FALSE}
table(s)
```

    ## s
    ## CATAGCAACTGACATCACAGCG GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC
    ## 1 1

This is very powerful, as it isolates separate scopes more effectively than the `attach` function. What is more, modules which are used and attached inside another module *remain* inside that module and are not visible outside the module by default.

Nevertheless, the normal, recommended usage of a module is without an attach specification, as this makes it clearer which names are being referred to.

## Writing modules

The module `bio/seq`, which we have used in the previous section, is implemented in the file [`bio/seq.r`](bio/seq.r). The file `seq.r` is, by and large, a regular R source file, which happens to live in a directory named `bio`.

In fact, there are only three things worth mentioning:

1. Documentation: functions in the module file can be documented using ‘[roxygen2](https://cran.r-project.org/package=roxygen2)’ syntax. It works the same as for packages. The ‘box’ package parses the documentation and makes it available via `box::help`. *Displaying module help requires that ‘roxygen2’ is installed.*

2. Export declarations: similar to packages, modules explicitly need to declare which names they export; they do this using the annotation comment `#' @export` in front of the name assignment. Again, this works similarly to ‘roxygen2’ (but does *not* require having that package installed).

3. [S3 functions](https://adv-r.hadley.nz/s3.html): ‘box’ registers and exports such functions automatically as necessary, but this only works for *user generics* that are defined inside the same module. When overriding “known generics” (such as `print`), we need to register these methods manually via `box::register_S3_method`. This is necessary because such function names are inherently ambiguous, so there is no reliable way of detecting them automatically.

## Nesting modules

Modules can also form nested hierarchies. In fact, here is the implementation of `bio` (in [`bio/__init__.r`](bio/__init__.r); since `bio` is a directory rather than a file, the module implementation resides in the nested file `__init__.r`):

```{r eval = FALSE}
#' @export
box::use(./seq)
```

The submodule is specified as `./seq` rather than `seq`: the explicitly provided relative path prevents lookup in the module search path (which we set via `options(box.path = …)`); instead, only the current directory (that is, the directory containing the `bio` module) is considered.

When applied to a `box::use` declaration, `@export` causes all names which are imported by that declaration to also be exported: any module name created by the declaration (here, `seq`) is exported as-is. Furthermore, any attached name is likewise exported. Refer to the `box::use` documentation and examples for more details on which names are exported.
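
The re-export mechanism can be tried out with a throwaway hierarchy. The following sketch, assuming the ‘box’ package is installed, writes a hypothetical parent module `demo/shapes` (as `shapes/__init__.r`) that re-exports a hypothetical submodule `circle` via `#' @export` on its `box::use` declaration. All module, function and directory names here are made up for illustration.

```r
# A minimal sketch, assuming the 'box' package is installed.
# `demo/shapes`, `circle` and `area` are hypothetical names used for illustration.
lib = file.path(tempdir(), 'box-nested')
shapes_dir = file.path(lib, 'demo', 'shapes')
dir.create(shapes_dir, recursive = TRUE, showWarnings = FALSE)

# Submodule demo/shapes/circle.r: exports a single function.
writeLines(c(
    "#' @export",
    "area = function (r) pi * r^2"
), file.path(shapes_dir, 'circle.r'))

# Parent module demo/shapes/__init__.r: re-exports the submodule.
# `./circle` is resolved relative to the directory containing __init__.r.
writeLines(c(
    "#' @export",
    "box::use(./circle)"
), file.path(shapes_dir, '__init__.r'))

options(box.path = lib)
box::use(demo/shapes)

ls(shapes)             # the re-exported submodule name
shapes$circle$area(2)  # call through the nested module
```

As with `bio`, the parent module exposes only the submodule object, so callers still write the qualified path `shapes$circle$area`.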

Coming back to our example module, we can now use the `bio` module:

```{r eval = FALSE}
options(box.path = NULL) # Reset search path
box::use(./bio)
ls(bio)
```

    ## [1] "seq"

```{r eval = FALSE}
ls(bio$seq)
```

    ## [1] "is_valid" "revcomp" "seq" "table"

```{r eval = FALSE}
bio$seq$revcomp('CAT')
```

    ## 1 DNA sequence:
    ## >seq 1
    ## ATG

We could also have implemented `bio` as follows:

```{r eval = FALSE}
#' @export
box::use(./seq[...])
```

This would have made all of `seq`’s definitions immediately available in `bio`, without having to always write `seq$…`. This is sometimes useful, but should be employed with care: being explicit about namespaces generally increases code robustness and readability.

## Code execution on loading

Modules define functions and values. To execute code when a module is loaded, put it inside a function named `.on_load`. This function is similar to the hook for the `.onLoad` *package* namespace event. It is executed the first time the module is loaded in an R session. Subsequent calls to `box::use` for that module, regardless of whether they occur in a different scope, refer to the already loaded, cached module, and do *not* reload the module.

We can illustrate this by loading a module which has side effects, `info`. Its implementation is as follows:

```{r eval = FALSE}
.on_load = function (ns) {
    message(
        'Loading module "', box::name(), '"\n',
        'Module path: "', basename(box::file()), '"'
    )
}

box::export() # Mark as a ‘box’ module.
```

Let’s use it:

```{r eval = FALSE}
box::use(./info)
```

    ## Loading module "info"
    ## Module path: "vignettes"

We have loaded the module and get the diagnostic messages. Let’s re-use the module:

```{r eval = FALSE}
box::use(./info)
```

… no messages are displayed. However, we *can* explicitly reload a module. This clears the cache and loads the module again.
This can be useful during development and debugging:

```{r eval = FALSE}
box::reload(info)
```

    ## Loading module "info"
    ## Module path: "vignettes"

And this displays the messages again. `box::reload` is a shortcut for `box::unload` followed by `box::use` (using the same arguments as the original `box::use` call).

### Module helper functions

The `info` module also showcases two important helper functions:

1. `box::name` returns the name of the module with which it was loaded. This is especially handy because, when called outside of a module, `box::name` returns `NULL`. This allows testing whether a piece of code was loaded as a module or invoked directly (e.g. via `Rscript` on the command line).

2. `box::file` is similar to `system.file`: it returns the full path to any file within the directory where a module is stored. This is useful when distributing data files with modules, which are loaded from within the module. When invoked without arguments, `box::file` returns the full path to the directory containing the module source file.
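
The `box::name` check from the first item can serve as a “main guard”, so that one file works both as a module and as a script. The sketch below, assuming the ‘box’ package is installed, writes a hypothetical module `demo/app` whose top-level code calls its (hypothetical) `main` function only when `box::name()` returns `NULL`. The option `demo.main_ran` is an illustrative stand-in for a real side effect.

```r
# A minimal sketch, assuming the 'box' package is installed.
# `demo/app`, `main` and the option `demo.main_ran` are hypothetical.
lib = file.path(tempdir(), 'box-main-guard')
dir.create(file.path(lib, 'demo'), recursive = TRUE, showWarnings = FALSE)
script = file.path(lib, 'demo', 'app.r')

writeLines(c(
    "#' @export",
    "main = function () options(demo.main_ran = TRUE)",
    "",
    "# Run main() only when executed directly, not when loaded as a module.",
    "if (is.null(box::name())) main()"
), script)

options(demo.main_ran = FALSE)

# Loaded as a module: box::name() is not NULL, so main() is not called.
options(box.path = lib)
box::use(demo/app)
ran_as_module = getOption('demo.main_ran')

# Executed directly: box::name() is NULL, so main() runs.
source(script)
ran_as_script = getOption('demo.main_ran')
```

Running the same file with `Rscript` would likewise take the `NULL` branch, since it is not being loaded through `box::use`.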