--- title: "2. Parallelize Computation of Indices" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{2. Parallelize Computation of Indices} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Note: This vignette presents some performance tests ran between non-parallel and parallel versions of `fundiversity` functions. Note that to avoid the dependency on other packages, this vignette is [**pre-computed**](https://ropensci.org/technotes/2019/12/08/precompute-vignettes/). Within `fundiversity` the computation of most indices can be parallelized using the `future` package. The indices that currently support parallelization are: **FRic**, **FDis**, **FDiv**, and **FEve**. The goal of this vignette is to explain how to toggle and use parallelization in `fundiversity`. The `future` package provides a simple and general framework to allow asynchronous computation depending on the resources available for the user. The [first vignette of `future`](https://cran.r-project.org/package=future) gives a general overview of all its features. The main idea being that the user should write the code once and that it would run seamlessly sequentially, or in parallel on a single computer, or on a cluster, or distributed over several computers. `fundiversity` can thus run on all these different backends following the user's choice. ```r library("fundiversity") data("traits_birds", package = "fundiversity") data("site_sp_birds", package = "fundiversity") ``` # Running code in parallel By default the `fundiversity` code will run sequentially on a single core. To trigger parallelization the user needs to define a `future::plan()` object with a parallel backend such as `future::multisession` to split the execution across multiple R sessions. ```r # Sequential execution fric1 <- fd_fric(traits_birds) # Parallel execution future::plan(future::multisession) # Plan definition fric2 <- fd_fric(traits_birds) # The code resolve in similar fashion identical(fric1, fric2) #> [1] TRUE ``` Within the `future::multisession` backend you can specify the number of cores on which the function should be parallelized over through the argument `workers`, you can change it in the `future::plan()` call: ```r future::plan(future::multisession, workers = 2) # Only 2 cores are used fric3 <- fd_fric(traits_birds) identical(fric3, fric2) #> [1] TRUE ``` To learn more about the different backends available and the related arguments needed, please refer to the documentation of `future::plan()` and the [overview vignette of `future`](https://cran.r-project.org/package=future). # Performance comparison We can now compare the difference in performance to see the performance gain thanks to parallelization: ```r future::plan(future::sequential) non_parallel_bench <- microbenchmark::microbenchmark( non_parallel = { fd_fric(traits_birds) }, times = 20 ) future::plan(future::multisession) parallel_bench <- microbenchmark::microbenchmark( parallel = { fd_fric(traits_birds) }, times = 20 ) rbind(non_parallel_bench, parallel_bench) #> Unit: milliseconds #> expr min lq mean median uq max neval cld #> non_parallel 8.756378 8.892243 9.841818 9.072241 9.218554 23.9519 20 a #> parallel 56.374332 167.680385 218.073077 172.888927 185.670312 1247.8534 20 b ``` The non parallelized code runs faster than the parallelized one! Indeed, the parallelization in `fundiversity` parallelize the computation across different sites. So parallelization should be used when you have many sites on which you want to compute similar indices. ```r # Function to make a bigger site-sp dataset make_more_sites <- function(n) { site_sp <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE)) rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp))) site_sp } ``` For example with a dataset 5000 times bigger: ```r bigger_site <- make_more_sites(5000) microbenchmark::microbenchmark( seq = { future::plan(future::sequential) fd_fric(traits_birds, bigger_site) }, multisession = { future::plan(future::multisession, workers = 4) fd_fric(traits_birds, bigger_site) }, multicore = { future::plan(future::multicore, workers = 4) fd_fric(traits_birds, bigger_site) }, times = 20 ) #> Warning in supportsMulticoreAndRStudio(...): [ONE-TIME WARNING] Forked processing ('multicore') is not supported when running R from RStudio #> because it is considered unstable. For more details, how to control forked processing or not, and how to silence this warning in future R #> sessions, see ?parallelly::supportsMulticore #> Unit: seconds #> expr min lq mean median uq max neval cld #> seq 15.58688 15.67587 15.97552 15.97047 16.24568 16.54392 20 a #> multisession 21.17851 21.75313 22.02965 21.88691 22.26971 23.50062 20 b #> multicore 15.53872 15.75567 16.06103 16.01595 16.35790 16.98102 20 a ```
Session info of the machine on which the benchmark was ran and time it took to run ``` #> seconds needed to generate this document: 1095.27 sec elapsed #> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── #> setting value #> version R version 4.2.1 (2022-06-23) #> os Ubuntu 20.04.5 LTS #> system x86_64, linux-gnu #> ui RStudio #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Etc/UTC #> date 2022-11-15 #> rstudio 2022.02.0+443 Prairie Trillium (server) #> pandoc 2.17.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/ (via rmarkdown) #> #> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── #> ! package * version date (UTC) lib source #> P abind 1.4-5 2016-07-21 [3] CRAN (R 4.2.0) #> assertthat 0.2.1 2019-03-21 [3] CRAN (R 4.1.3) #> cachem 1.0.6 2021-08-19 [3] CRAN (R 4.1.3) #> VP cli 3.4.0 2022-09-23 [?] CRAN (R 4.2.1) (on disk 3.4.1) #> codetools 0.2-18 2020-11-04 [5] CRAN (R 4.0.3) #> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0) #> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0) #> DBI 1.1.2 2021-12-20 [3] CRAN (R 4.1.3) #> P digest 0.6.29 2021-12-01 [3] CRAN (R 4.2.0) #> dplyr * 1.0.10 2022-09-01 [1] CRAN (R 4.2.1) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0) #> evaluate 0.18 2022-11-07 [1] CRAN (R 4.2.1) #> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0) #> P fastmap 1.1.0 2021-01-25 [3] CRAN (R 4.2.1) #> fundiversity * 0.2.1.9000 2022-04-12 [3] Github (bisaloo/fundiversity@87652ba) #> VP future 1.26.1 2022-09-02 [3] CRAN (R 4.2.1) (on disk 1.28.0) #> VP future.apply 1.9.0 2022-11-05 [3] CRAN (R 4.2.1) (on disk 1.10.0) #> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0) #> P geometry 0.4.6 2022-04-18 [3] CRAN (R 4.2.0) #> ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0) #> VP globals 0.15.0 2022-08-28 [3] CRAN (R 4.2.1) (on disk 0.16.1) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0) #> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0) #> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.1) #> knitr 1.40 2022-08-24 [1] CRAN (R 4.2.1) #> lattice 0.20-45 2021-09-22 [3] CRAN (R 4.1.3) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1) #> P listenv 0.8.0 2019-12-05 [3] CRAN (R 4.2.1) #> P magic 1.6-0 2022-02-09 [3] CRAN (R 4.2.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0) #> MASS 7.3-58.1 2022-08-03 [3] CRAN (R 4.2.1) #> Matrix 1.4-1 2022-03-23 [3] CRAN (R 4.1.3) #> memoise 2.0.1 2021-11-26 [3] CRAN (R 4.1.3) #> microbenchmark 1.4.9 2021-11-09 [3] CRAN (R 4.1.3) #> multcomp 1.4-19 2022-04-26 [1] CRAN (R 4.2.0) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0) #> mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.2.0) #> VP parallelly 1.31.1 2022-07-21 [3] CRAN (R 4.2.1) (on disk 1.32.1) #> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0) #> P Rcpp 1.0.8.3 2022-03-17 [3] CRAN (R 4.2.0) #> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1) #> rmarkdown 2.13 2022-03-10 [3] CRAN (R 4.1.3) #> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1) #> sandwich 3.0-2 2022-06-15 [1] CRAN (R 4.2.0) #> scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0) #> sessioninfo 1.2.2 2021-12-06 [3] CRAN (R 4.1.3) #> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0) #> survival 3.3-1 2022-03-03 [3] CRAN (R 4.1.3) #> TH.data 1.1-1 2022-04-26 [1] CRAN (R 4.2.0) #> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0) #> tictoc 1.0.1 2021-04-19 [3] CRAN (R 4.1.3) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1) #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0) #> vctrs 0.5.0 2022-10-22 [1] CRAN (R 4.2.1) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0) #> xfun 0.34 2022-10-18 [1] CRAN (R 4.2.1) #> yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.1) #> zoo 1.8-10 2022-04-15 [1] CRAN (R 4.2.0) #> #> [1] /home/ke76dimu/R-library/4.2 #> [2] /usr/local/lib/R/site-library #> [3] /data/library/4.1 #> [4] /usr/lib/R/site-library #> [5] /usr/lib/R/library #> #> V ── Loaded and on-disk version mismatch. #> P ── Loaded and on-disk path mismatch. #> #> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ```