---
title: "downsize"
author: "William Michael Landau"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{downsize}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

With the `downsize` package, you can toggle between the test and production versions of your workflow by flipping a single `TRUE`/`FALSE` global option. This is helpful when your workflow takes a long time to run, you want to test it quickly, and unit testing is too reductionist to cover everything.

# Basic downsizing

Say you want to analyze a large dataset.

```{r, eval = F}
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
```

But for the sake of time, you want to test and debug your code on a smaller dataset. In your code, select your dataset with a call to `downsize()`.

```{r, eval = F}
my_data <- downsize(big_data) # same as downsize(big = big_data)
```

Above, `my_data` becomes `big_data` if `getOption("downsize")` is `FALSE` or `NULL` (default). If `getOption("downsize")` is `TRUE`, `my_data` becomes `head(big_data)`. You can toggle the global option `downsize` with calls to `production_mode()` and `test_mode()`, and you can override the option for a single call with `downsize(..., downsize = L)`, where `L` is `TRUE` or `FALSE`. Check whether the workflow is in test or production mode with the `my_mode()` function.

# Example with test and production modes

Here is an example script in test mode.

```{r, eval = F}
library(downsize)
test_mode() # scales the workflow appropriately
my_mode() # shows if the workflow is in test or production mode
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4)) # always large
my_data <- downsize(big_data) # either large or small
nrow(my_data) # responds to test_mode() and production_mode()
# ...more code, time-consuming if my_data is large...
```

To scale the workflow up to production mode, replace `test_mode()` with `production_mode()` and **leave everything else exactly the same**.

```{r, eval = F}
library(downsize)
production_mode() # scales the workflow appropriately
my_mode() # shows if the workflow is in test or production mode
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4)) # always large
my_data <- downsize(big_data) # either large or small
nrow(my_data) # responds to test_mode() and production_mode()
# ...more code, time-consuming if my_data is large...
```

An ideal workflow has multiple calls to `downsize()` that are all configured at once by a single call to `test_mode()` or `production_mode()` at the very beginning. This avoids tedium and human error, and it keeps the test a close approximation of the original task.

# Provide your own test data

You can provide a replacement for `big_data` using the `small` argument of `downsize()`.

```{r}
library(downsize)
big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))
small_data <- data.frame(x = runif(16), y = runif(16))
test_mode()
my_mode() # getOption("downsize") is TRUE
my_data <- downsize(big_data, small_data) # same as downsize(big = big_data, small = small_data)
identical(my_data, small_data)
```

If you set `small` yourself, be sure that subsequent code can accept both `small` and `big`. For example, if `small` is a data frame and `big` is a matrix, your code may work fine in test mode and break in production mode. In addition, `downsize()` will warn you if `small` is identical to or bigger in memory than `big` (disable with `downsize(..., warn = FALSE)`). To be safer, use the subsetting capabilities of the `downsize()` function.

# Subsetting

The command `my_data <- downsize(big = big_data)` is equivalent to `my_data <- downsize(big = big_data, nrow = 6)`. There are multiple ways to subset the `big` argument of `downsize()` when it is time to scale down to test mode. As in the following examples, be sure that `small` is set to `NULL` (default). Otherwise, subsetter arguments such as `dim`, `length`, `nrow`, and `ncol` will be ignored.

```{r}
test_mode()
downsize(1:10, length = 2)
m <- matrix(1:36, ncol = 6)
downsize(m, ncol = 2)
downsize(m, nrow = 2)
downsize(m, dim = c(2, 2))
downsize(data.frame(x = 1:10, y = 1:10), nrow = 5)
x <- array(0, dim = c(10, 100, 2, 300, 12))
dim(x)
my_array <- downsize(x, dim = rep(3, 5))
dim(my_array)
my_array <- downsize(x, dim = c(1, 4))
dim(my_array)
my_array <- downsize(x, ncol = 1)
dim(my_array)
```

Set `random` to `TRUE` to take a random subset of your data rather than just the first few rows or columns.

```{r}
set.seed(6)
downsize(m, ncol = 2, random = TRUE)
```

# Interchange code blocks

You can interchange entire blocks of code based on the scaling/mode of the workload.

```{r}
test_mode()
downsize(big = {a = 1; a + 10}, small = {a = 1; a + 1})
production_mode()
downsize(big = {a = 1; a + 10}, small = {a = 1; a + 1})
```

Variables set in code blocks are available after calls to `downsize()`.

```{r}
test_mode()
tmp <- downsize(
  big = {
    x = "long code"
    y = 1000
  },
  small = {
    x = "short code"
    y = 3.14
  })
x == "short code" & y == 3.14
production_mode()
tmp <- downsize(
  big = {
    x = "long code"
    y = 1000
  },
  small = {
    x = "short code"
    y = 3.14
  })
x == "long code" & y == 1000
```

# Help and troubleshooting

Use the `help_downsize()` function to obtain a collection of helpful links. For troubleshooting instructions, please refer to [TROUBLESHOOTING.md](https://github.com/wlandau/downsize/blob/master/TROUBLESHOOTING.md) on the [GitHub page](https://github.com/wlandau/downsize).
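
# Sketch: how a global option can drive the toggle

To make the mechanism from "Basic downsizing" concrete, here is a minimal base R sketch of a function whose behavior depends on `getOption("downsize")` and that accepts a per-call override. The name `toy_downsize()` is hypothetical and this is not the package's implementation; it only mirrors the documented default behavior (the full object when the option is `FALSE` or `NULL`, `head()` of it when the option is `TRUE`).

```r
# Hypothetical stand-in for illustration only; not the real downsize().
toy_downsize <- function(big, downsize = getOption("downsize")) {
  # isTRUE() treats both NULL (option unset) and FALSE as "production".
  if (isTRUE(downsize)) head(big) else big
}

big_data <- data.frame(x = rnorm(1e4), y = rnorm(1e4))

options(downsize = NULL)  # option unset: behaves like production mode
n_default <- nrow(toy_downsize(big_data))

options(downsize = TRUE)  # analogous to what test mode switches on
n_test <- nrow(toy_downsize(big_data))

# A per-call argument takes precedence over the global option.
n_override <- nrow(toy_downsize(big_data, downsize = FALSE))

options(downsize = NULL)  # restore the unset state
c(n_default, n_test, n_override)
```

Because every call reads the same option, one switch at the top of a script reconfigures all of them at once, while the explicit argument lets a single call opt out.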