Type: | Package |
Title: | Isolation Forest-Based Presence-Only Species Distribution Modeling |
Version: | 0.2.2 |
Description: | Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) https://proceedings.mlr.press/v48/guha16.html, Cortes, D. (2021) <doi:10.48550/arXiv.2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) https://dm-gatech.github.io/CS8803-Fall2018-DML-Papers/shapley.pdf, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>). |
License: | MIT + file LICENSE |
URL: | https://github.com/LLeiSong/itsdm, https://lleisong.github.io/itsdm/ |
BugReports: | https://github.com/LLeiSong/itsdm/issues |
Depends: | R (≥ 3.5.0) |
Imports: | checkmate, dplyr, fastshap, ggplot2, isotree, methods, mgcv, ncdf4, outliertree, patchwork, raster, rlang, ROCit, sf, stars (≥ 0.6-0), stats, stringr, tidyselect, utils |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-07-01 22:22:12 UTC; leisong |
Author: | Lei Song |
Maintainer: | Lei Song <lsong@clarku.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-01 22:50:02 UTC |
Isolation forest-based presence-only species distribution modeling
Description
This package is a wrapper for a few packages including isotree, outliertree, fastshap, etc. It does purely presence-only species distribution modeling with isolation forest and variations such as SCiForest and EIF. It also provides functions to make response curves, analyze variable importance, analyze variable dependence and analyze variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including WorldClim version 2.0 and CMCC-BioClimInd. There are also functions to detect outliers in the occurrence dataset to do data cleaning.
Details
This package provides multiple features.
Download bioclimatic variables and reduce their dimensions. This includes historic and future climatic indicators from two sources: WorldClim and CMCC-BioClimInd.
Detect suspicious environmental outliers.
Fit an isolation forest-based SDM.
Perform presence-only evaluation.
Generate response curves of environmental variables including marginal and independent responses and analyze interactions between environmental variables.
Analyze variable importance using Shapley values.
Convert predicted environmental suitability to presence-absence map.
Analyze variable contributions to any specific observations.
Author(s)
Lei Song lsong@clarku.edu
Maintainer: Lei Song lei.song@rutgers.edu
References
Please check references in R documentation of each specific function.
Download historic Bioclimatic indicators (BIOs) named CMCC-BioClimInd.
Description
Parse historic CMCC-BioClimInd bioclimatic indicators optionally with a setting of boundary and a few other options.
Usage
cmcc_bioclim(bry = NULL, path = NULL, nm_mark = "clip", return_stack = TRUE)
Arguments
bry |
( |
path |
( |
nm_mark |
( |
return_stack |
( |
Details
Web page for this dataset
Value
If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, but the user receives a message stating where the images are stored.
Note
The function is experimental at the moment, because the download server of this dataset is not as stable as WorldClim yet. If it fails due to slow internet, try to set a larger timeout option, e.g., using options(timeout = 1e3).
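The timeout advice in the note can be applied for a single call instead of the whole session. The following is a minimal sketch in base R; `with_timeout` is a hypothetical helper introduced here for illustration and is not part of itsdm.

```r
# Hypothetical helper (not part of itsdm): raise the download timeout
# while one expression is evaluated, then restore the previous value.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)   # options() returns the old settings
  on.exit(options(old), add = TRUE)   # restore even if expr fails
  force(expr)                         # expr is evaluated lazily, i.e. here
}

# Usage: any expression can be wrapped; this one just reads the option back.
with_timeout(1e3, getOption("timeout"))
```

A download call such as the one in the Examples section could be passed as `expr` in the same way.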
References
Noce, Sergio, Luca Caporaso, and Monia Santini. "A new global dataset of bioclimatic indicators." Scientific data 7.1 (2020): 1-12. doi:10.1038/s41597-020-00726-5
Examples
## Not run:
library(dplyr)
library(sf)
library(itsdm)
bry <- st_polygon(
list(rbind(c(29.34, -11.72), c(29.34, -0.95),
c(40.31, -0.95), c(40.31, -11.72),
c(29.34, -11.72)))) %>%
st_sfc(crs = 4326)
cmcc_bios <- cmcc_bioclim(bry = bry,
nm_mark = 'tza', path = tempdir())
## End(Not run)
Convert predicted suitability to presence-absence map.
Description
Use threshold-based, logistic or linear conversion method to convert predicted suitability map to presence-absence map.
Usage
convert_to_pa(
suitability,
method = "logistic",
beta = 0.5,
alpha = -0.05,
a = 1,
b = 0,
species_prevalence = NA,
threshold = 0.5,
seed = 10L,
visualize = TRUE
)
Arguments
suitability |
( |
method |
( |
beta |
( |
alpha |
( |
a |
( |
b |
( |
species_prevalence |
( |
threshold |
( |
seed |
( |
visualize |
( |
Details
Multiple methods and arguments could be used as a combination to do the conversion.
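To make the three conversion methods concrete, here is a rough sketch in base R on a plain numeric vector. It assumes the logistic and linear forms used by convertToPA in virtualspecies (which this function references), draws presences from a Bernoulli distribution, and ignores `species_prevalence`. The helper name `suitability_to_pa` is hypothetical, and the package's own implementation may differ.

```r
# Hypothetical helper, not the package's implementation.
# suit: numeric vector of suitability values in [0, 1].
suitability_to_pa <- function(suit,
                              method = c("logistic", "linear", "threshold"),
                              beta = 0.5, alpha = -0.05,
                              a = 1, b = 0,
                              threshold = 0.5, seed = 10L) {
  method <- match.arg(method)
  if (method == "threshold") {
    # Deterministic cut: presence where suitability reaches the threshold
    pa <- as.integer(suit >= threshold)
    return(list(prob = as.numeric(pa), pa = pa))
  }
  prob <- switch(method,
    # Assumed logistic form: beta is the inflection point, alpha the slope
    logistic = 1 / (1 + exp((suit - beta) / alpha)),
    # Assumed linear form, clamped to a valid probability
    linear = pmin(pmax(a * suit + b, 0), 1))
  set.seed(seed)
  # Probabilistic conversion: one Bernoulli draw per cell
  list(prob = prob, pa = rbinom(length(prob), size = 1, prob = prob))
}

suit <- c(0.1, 0.5, 0.9)
suitability_to_pa(suit, method = "logistic")$prob
suitability_to_pa(suit, method = "threshold")$pa
```

With a negative `alpha`, the probability of occurrence increases with suitability and equals 0.5 at `suit == beta`.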
Value
(PAConversion) A list of
- suitability (stars) The input suitability map
- probability_of_occurrence (stars) The map of occurrence probability
- pa_conversion (list) A list of conversion arguments
- pa_map (stars) The presence-absence map
References
convertToPA in package virtualspecies
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 5,
sample_size = 0.8, ndim = 1L,
nthreads = 1,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Threshold conversion
pa_thred <- convert_to_pa(mod$prediction,
method = 'threshold', beta = 0.5, visualize = FALSE)
pa_thred
plot(pa_thred)
## Not run:
# Logistic conversion
pa_log <- convert_to_pa(mod$prediction, method = 'logistic',
beta = 0.5, alpha = -.05)
# Linear conversion
pa_lin <- convert_to_pa(mod$prediction, method = 'linear',
a = 1, b = 0)
## End(Not run)
Detect areas influenced by a changing environment variable.
Description
Use Shapley values to detect the areas where a changing environmental variable is likely to affect the species distribution. It only works on continuous variables.
Usage
detect_envi_change(
model,
var_occ,
variables,
target_var,
bins = NULL,
shap_nsim = 10,
seed = 10,
var_future = NULL,
variables_future = NULL,
pfun = .pfun_shap,
method = "gam",
formula = y ~ s(x)
)
Arguments
model |
( |
var_occ |
( |
variables |
( |
target_var |
( |
bins |
( |
shap_nsim |
( |
seed |
( |
var_future |
( |
variables_future |
( |
pfun |
( |
method |
Argument passed on to |
formula |
Argument passed on to |
Details
The values show how changes in an environmental variable affect the model prediction in space. These maps can help to answer the question of which areas will be affected by a changing variable.
Value
(EnviChange) A list of
- A figure of fitted variable curve
- A map of variable contribution change
- Tipping points of variable contribution
- A stars of variable contribution under current and future condition, and the detected changes
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 5,
sample_size = 0.8, ndim = 1L,
nthreads = 1,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Use a fixed value
bio1_changes <- detect_envi_change(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1,
target_var = "bio1",
var_future = 5)
## Not run:
# Use a future layer
## Read the future Worldclim variables
future_vars <- system.file(
'extdata/future_bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
split() %>% select(bioc1, bioc12)
# Rename the bands
names(future_vars) <- paste0("bio", c(1, 12))
## Just use the target future variable
climate_changes <- detect_envi_change(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1,
target_var = "bio1",
var_future = future_vars %>% select("bio1"))
## Use the whole future variable stack
bio12_changes <- detect_envi_change(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1,
target_var = "bio12",
variables_future = future_vars)
print(bio12_changes)
##### Use Random Forest model as an external model ########
library(randomForest)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
filter(usage == "train")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12)) %>%
split()
model_data <- stars::st_extract(
env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)
mod_rf <- randomForest(
occ ~ .,
data = model_data,
ntree = 200)
pfun <- function(X.model, newdata) {
# for data.frame
predict(X.model, newdata, type = "prob")[, "1"]
}
# Use a fixed value
bio5_changes <- detect_envi_change(
model = mod_rf,
var_occ = model_data %>% select(-occ),
variables = env_vars,
target_var = "bio5",
bins = 20,
var_future = 5,
pfun = pfun)
plot(bio5_changes)
## End(Not run)
Remove environmental variables that have high correlation with others.
Description
Select environmental variables whose pairwise Pearson correlation is lower than a user-defined threshold. NOTE that it only works on numeric variables, not on categorical variables.
Usage
dim_reduce(
img_stack = NULL,
threshold = 0.5,
preferred_vars = NULL,
samples = NULL
)
Arguments
img_stack |
( |
threshold |
( |
preferred_vars |
( |
samples |
( |
Value
(ReducedImageStack) A list of
- threshold (numeric) The threshold set in function inputs
- img_reduced (stars) The image stack after dimension reduction
- cors_original (data.frame) A table of Pearson correlations between all variables.
- cors_reduced (data.frame) A table of Pearson correlations between variables after dimension reduction.
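The idea of keeping only variables whose pairwise correlations stay below a threshold can be sketched with a simple greedy pass over a data.frame of sampled values. This is an illustration under assumptions, not the algorithm used by dim_reduce: the helper name `reduce_by_correlation` is hypothetical, and the visiting order (preferred variables first, then column order) is a design choice made for this sketch.

```r
# Hypothetical helper, not the package's implementation.
# df: data.frame of numeric variables sampled from the image stack.
reduce_by_correlation <- function(df, threshold = 0.5, preferred = NULL) {
  cors <- abs(cor(df, method = "pearson", use = "pairwise.complete.obs"))
  # Visit preferred variables first so they are kept whenever possible
  ordered <- c(intersect(preferred, colnames(df)),
               setdiff(colnames(df), preferred))
  kept <- character(0)
  for (v in ordered) {
    # Keep v only if it is weakly correlated with everything kept so far
    if (length(kept) == 0 || all(cors[v, kept] < threshold)) {
      kept <- c(kept, v)
    }
  }
  kept
}

toy <- data.frame(a = 1:10,
                  b = 2 * (1:10) + 1,        # perfectly correlated with a
                  c = rep(c(1, -1), 5))      # weakly correlated with a and b
reduce_by_correlation(toy, threshold = 0.5)
reduce_by_correlation(toy, threshold = 0.5, preferred = "b")
```

In the toy data, `a` and `b` cannot both survive, so the `preferred` argument decides which of the two is kept.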
Examples
library(sf)
library(itsdm)
library(stars)
library(dplyr)
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars()
img_reduced <- dim_reduce(env_vars, threshold = 0.7,
preferred_vars = c('bio1', 'bio12'))
Evaluate the model based on presence-only data.
Description
This function will calculate two major types of evaluation metrics in terms of presence-only data. The first type is presence-only customized metrics, such as Contrast Validation Index (CVI), continuous Boyce index (CBI), and ROC_ratio. The second type is presence-background evaluation metrics by extracting background points as pseudo absence observations.
Usage
evaluate_po(
model,
occ_pred,
bg_pred = NULL,
var_pred,
threshold = NULL,
visualize = FALSE
)
Arguments
model |
( |
occ_pred |
( |
bg_pred |
( |
var_pred |
( |
threshold |
( |
visualize |
( |
Details
- CVI is the proportion of presence points falling in cells whose habitat suitability index exceeds a threshold (0.5 for example) minus the proportion of all cells of the model exceeding the same threshold. Here we used varied thresholds: 0.25, 0.5, and 0.75.
- Continuous Boyce index (CBI) is computed with a moving-window resolution of 100 and the Kendall method.
- ROC_ratio curve plots the proportion of presences falling above a range of thresholds against the proportion of cells falling above the range of thresholds. The area under the modified ROC curve was then called AUC_ratio.
Sensitivity (TPR) = TP/(TP + FN)
Specificity (TNR) = TN/(TN + FP)
True skill statistic (TSS) = Sensitivity + specificity - 1
Jaccard's similarity index = TP/(FN + TP + FP)
Sørensen's similarity index (F-measure) = 2TP/(FN + 2TP + FP)
Overprediction rate = FP/(TP + FP)
Underprediction rate = FN/(TP + FN)
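The formulas above translate directly into a few lines of base R. The sketch below computes the presence-background metrics from confusion-matrix counts and a CVI value from two prediction vectors. Both helper names are hypothetical, and treating "above a threshold" as `>=` is an assumption of this sketch rather than a documented behavior of evaluate_po.

```r
# Hypothetical helpers, not the package's implementation.
# tp, fp, fn, tn: counts from a presence-background confusion matrix.
pb_metrics <- function(tp, fp, fn, tn) {
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(sensitivity     = sens,
    specificity     = spec,
    tss             = sens + spec - 1,
    jaccard         = tp / (fn + tp + fp),
    sorensen        = 2 * tp / (fn + 2 * tp + fp),
    overprediction  = fp / (tp + fp),
    underprediction = fn / (tp + fn))
}

# occ_pred: predictions at presence points; var_pred: predictions of all cells.
cvi <- function(occ_pred, var_pred, threshold = 0.5) {
  mean(occ_pred >= threshold) - mean(var_pred >= threshold)
}

pb_metrics(tp = 40, fp = 10, fn = 10, tn = 40)
cvi(occ_pred = c(0.9, 0.8, 0.4, 0.7), var_pred = c(0.1, 0.2, 0.6, 0.9))
```

A CVI near zero means presences are no more concentrated in "suitable" cells than the study area as a whole, which is why it is reported at several thresholds.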
Value
(POEvaluation) A list of
- po_evaluation is presence-only evaluation metrics. It is a list of
  - cvi (list) A list of CVI with 0.25, 0.5, and 0.75 as threshold
  - boyce (list) A list of items related to continuous Boyce index (CBI)
  - roc_ratio (list) A list of ROC ratio and AUC ratio
- pb_evaluation is presence-background evaluation metrics. It is a list of
  - confusion matrix (table) A table of confusion matrix. The columns are true values, and the rows are predicted values.
  - sensitivity (numeric) The sensitivity or TPR
  - specificity (numeric) The specificity or TNR
  - TSS (list) A list of info related to true skill statistic (TSS)
    - cutoff (vector of numeric) A vector of cutoff threshold values
    - tss (vector of numeric) A vector of TSS for each cutoff threshold
    - Recommended threshold (numeric) A recommended threshold according to TSS
    - Optimal TSS (numeric) The best TSS value
  - roc (list) A list of ROC values and AUC value
  - Jaccard's similarity index (numeric) The Jaccard's similarity index
  - Sørensen's similarity index (numeric) The Sørensen's similarity index or F-measure
  - Overprediction rate (numeric) The Overprediction rate
  - Underprediction rate (numeric) The Underprediction rate
References
Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008
Hirzel, Alexandre H., et al. "Evaluating the ability of habitat suitability models to predict species presences." Ecological modelling 199.2 (2006): 142-152. doi:10.1016/j.ecolmodel.2006.05.017
Hirzel, Alexandre H., and Raphaël Arlettaz. "Modeling habitat suitability for complex species distributions by environmental-distance geometric mean." Environmental management 32.5 (2003): 614-623. doi:10.1007/s00267-003-0040-3
Leroy, Boris, et al. "Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance." Journal of Biogeography 45.9 (2018): 1994-2002. doi:10.1111/jbi.13402
See Also
print.POEvaluation, plot.POEvaluation
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With perfect_presence mode,
# which should be very rare in reality.
mod <- isotree_po(
obs_mode = "perfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Without background samples or absences
eval_train <- evaluate_po(
mod$model,
occ_pred = mod$pred_train$prediction,
var_pred = na.omit(as.vector(mod$prediction[[1]])))
print(eval_train)
# With background samples
bg_pred <- st_extract(
mod$prediction, mod$background_samples) %>%
st_drop_geometry()
eval_train <- evaluate_po(
mod$model,
occ_pred = mod$pred_train$prediction,
bg_pred = bg_pred$prediction,
var_pred = na.omit(as.vector(mod$prediction[[1]])))
plot(eval_train)
Format the occurrence dataset for usage in itsdm
Description
This function formats the dataset while keeping it as close to the original as possible. Users can therefore modify the data as they wish before passing it to this function.
Usage
format_observation(
obs_df,
eval_df = NULL,
split_perc = 0.3,
seed = 123,
obs_crs = 4326,
eval_crs = 4326,
x_col = "x",
y_col = "y",
obs_col = "observation",
obs_type = "presence_only"
)
Arguments
obs_df |
( |
eval_df |
( |
split_perc |
( |
seed |
( |
obs_crs |
( |
eval_crs |
( |
x_col |
( |
y_col |
( |
obs_col |
( |
obs_type |
( |
Value
(FormatOccurrence) A list of
- obs (sf) the formatted pts of observations. The column of observation is "observation".
- obs_type (character) the type of the observations, presence_only or presence_absence.
- has_eval (logical) whether evaluation dataset is set or generated.
- eval (sf) the formatted pts of observations for evaluation if any. The column of observation is "observation".
- eval_type (character) the type of the observations for evaluation, presence_only or presence_absence.
See Also
Examples
library(dplyr)
library(itsdm)
data("occ_virtual_species")
# obs + eval, presence-absence
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"
obs <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
# obs + eval, presence-only
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_only"
obs <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
# obs + eval, different crs, presence-only
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
obs_crs <- 4326
# Fake one
eval_crs <- 20935
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_only"
obs <- format_observation(
obs_df = obs_df, eval_df = eval_df,
obs_crs = obs_crs, eval_crs = eval_crs,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
# obs + split, presence-absence
obs_df <- occ_virtual_species
split_perc <- 0.5
seed <- 123
obs_crs <- 4326
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"
obs <- format_observation(
obs_df = obs_df, split_perc = split_perc,
x_col = x_col, y_col = y_col,
obs_col = obs_col, obs_type = obs_type)
# obs, presence-only, no eval
obs_df <- occ_virtual_species
eval_df <- NULL
split_perc <- 0
seed <- 123
obs_crs <- 4326
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_only"
obs <- format_observation(
obs_df = obs_df, eval_df = eval_df,
split_perc = split_perc,
x_col = x_col, y_col = y_col,
obs_col = obs_col, obs_type = obs_type)
Download future Bioclimatic indicators (BIOs) named CMCC-BioClimInd.
Description
Parse future CMCC-BioClimInd bioclimatic indicators obtained by different Earth System Models (ESMs) optionally with a setting of boundary and a few other options.
Usage
future_cmcc_bioclim(
bry = NULL,
path = NULL,
esm = "CMCC-CESM",
rcp = 85,
interval = "2040-2079",
nm_mark = "clip",
return_stack = TRUE
)
Arguments
bry |
( |
path |
( |
esm |
( |
rcp |
( |
interval |
( |
nm_mark |
( |
return_stack |
( |
Details
https://doi.pangaea.de/10.1594/PANGAEA.904278?format=html#download
Value
If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, but the user receives a message stating where the images are stored.
Note
The function is experimental at the moment, because the download server of this dataset is not as stable as WorldClim yet. If it fails due to slow internet, try to set a larger timeout option, e.g., using options(timeout = 1e3).
References
Noce, Sergio, Luca Caporaso, and Monia Santini. "A new global dataset of bioclimatic indicators." Scientific data 7.1 (2020): 1-12. doi:10.1038/s41597-020-00726-5
Examples
## Not run:
library(itsdm)
future_cmcc_bioclim(path = tempdir(),
esm = 'GFDL-ESM2M', rcp = 45,
interval = "2040-2079", return_stack = FALSE)
## End(Not run)
A function to parse the future climate from worldclim version 2.1.
Description
This function allows you to parse worldclim version 2.1 future climatic files with a setting of boundary and a few other options.
Usage
future_worldclim2(
var = "tmin",
res = 10,
gcm = "BCC-CSM2-MR",
ssp = "ssp585",
interval = "2021-2040",
bry = NULL,
path = NULL,
nm_mark = "clip",
return_stack = TRUE
)
Arguments
var |
( |
res |
( |
gcm |
( |
ssp |
( |
interval |
( |
bry |
( |
path |
( |
nm_mark |
( |
return_stack |
( |
Details
Web page for this dataset
Value
If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, but the user receives a message stating where the images are stored.
Note
If it fails due to slow internet, try to set a larger timeout option, e.g., using options(timeout = 1e3).
References
Fick, Stephen E., and Robert J. Hijmans. "WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas." International journal of climatology 37.12 (2017): 4302-4315. doi:10.1002/joc.5086
Examples
## Not run:
future_worldclim2("tmin", 10, "BCC-CSM2-MR",
"ssp585", "2021-2040",
path = tempdir(), return_stack = FALSE)
## End(Not run)
Calculate independent responses of each variable.
Description
Calculate the independent response of each variable within the model.
Usage
independent_response(model, var_occ, variables, si = 1000, visualize = FALSE)
Arguments
model |
(Any predictive model). It is |
var_occ |
( |
variables |
( |
si |
( |
visualize |
( |
Details
The values show how each environmental variable independently affects the model prediction. They show how the prediction made using only this variable changes as the variable is varied.
Value
(IndependentResponse) A list of
- responses_cont (list) A list of response values of continuous variables
- responses_cat (list) A list of response values of categorical variables
References
Elith, Jane, et al. "The evaluation strip: a new and robust method for plotting predicted responses from species distribution models." Ecological modelling 186.3 (2005): 280-289. doi:10.1016/j.ecolmodel.2004.12.007
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
independent_responses <- independent_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(independent_responses)
Build Isolation forest species distribution model and explain the model and outputs.
Description
Call Isolation forest and its variations to do species distribution modeling and optionally call a collection of other functions to do model explanation.
Usage
isotree_po(
obs_mode = "imperfect_presence",
obs,
obs_ind_eval = NULL,
variables,
categ_vars = NULL,
contamination = 0.1,
ntrees = 100L,
sample_size = 1,
ndim = 1L,
seed = 10L,
...,
offset = 0,
response = TRUE,
spatial_response = TRUE,
check_variable = TRUE,
visualize = FALSE
)
Arguments
obs_mode |
( |
obs |
( |
obs_ind_eval |
( |
variables |
( |
categ_vars |
( |
contamination |
( |
ntrees |
( |
sample_size |
( |
ndim |
( |
seed |
( |
... |
Other arguments that |
offset |
( |
response |
( |
spatial_response |
( |
check_variable |
( |
visualize |
( |
Details
For "perfect_presence", a user-defined number (contamination) of samples will be taken from the background to let iForest function normally.
If "imperfect_presence", no further action is required.
If the obs_mode is "presence_absence", a contamination percent of absences will be randomly selected and work together with all presences to train the model.
NOTE: obs_mode and mode only work for obs. obs_ind_eval will follow its own structure.
Please read details of algorithm isolation.forest on https://github.com/david-cortes/isotree, and the R documentation of function isolation.forest.
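The three observation modes differ only in which extra records join the presences before training. The sketch below illustrates that logic on plain data.frames. It is a loose illustration, not the code in isotree_po: `assemble_training` is a hypothetical helper, and how `contamination` maps to a record count (a fraction of the presences in "perfect_presence", a fraction of the absences in "presence_absence") is an assumption made here.

```r
# Hypothetical helper, not the package's implementation.
# presence, background, absence: data.frames with identical columns.
assemble_training <- function(presence, background = NULL, absence = NULL,
                              obs_mode = c("imperfect_presence",
                                           "perfect_presence",
                                           "presence_absence"),
                              contamination = 0.1, seed = 10L) {
  obs_mode <- match.arg(obs_mode)
  set.seed(seed)
  extra <- switch(obs_mode,
    # Presences are assumed to contain their own outliers already
    imperfect_presence = presence[0, , drop = FALSE],
    # ASSUMPTION: add background rows equal to a fraction of the presences
    perfect_presence = {
      n <- min(round(contamination * nrow(presence)), nrow(background))
      background[sample(nrow(background), n), , drop = FALSE]
    },
    # ASSUMPTION: add a random fraction of the available absences
    presence_absence = {
      n <- round(contamination * nrow(absence))
      absence[sample(nrow(absence), n), , drop = FALSE]
    })
  rbind(presence, extra)
}

pres <- data.frame(bio1 = seq(20, 25, length.out = 20))
bg   <- data.frame(bio1 = seq(5, 35, length.out = 100))
abs_ <- data.frame(bio1 = seq(5, 15, length.out = 50))
nrow(assemble_training(pres, background = bg, obs_mode = "perfect_presence"))
nrow(assemble_training(pres, absence = abs_, obs_mode = "presence_absence"))
```

The point of the added records is that an isolation forest needs some anomalous samples to isolate, which a perfectly clean presence set would not provide.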
Value
(POIsotree) A list of
- model (isolation.forest) The fitted isolation forest model
- variables (stars) The formatted image stack of environmental variables
- background_samples (sf) A sf of background points for training dataset evaluation or SHAP dependence plot
- independent_test (sf or NULL) A sf of test occurrence dataset
- background_samples_test (sf or NULL) A sf of background points for test dataset evaluation or SHAP dependence plot
- vars_train (data.frame) A data.frame with values of each environmental variable for training occurrence
- pred_train (data.frame) A data.frame with values of prediction for training occurrence
- eval_train (POEvaluation) A list of presence-only evaluation metrics based on training dataset. See details of POEvaluation in evaluate_po
- var_test (data.frame or NULL) A data.frame with values of each environmental variable for test occurrence
- pred_test (data.frame or NULL) A data.frame with values of prediction for test occurrence
- eval_test (POEvaluation or NULL) A list of presence-only evaluation metrics based on test dataset. See details of POEvaluation in evaluate_po
- prediction (stars) The predicted environmental suitability
- marginal_responses (MarginalResponse or NULL) A list of marginal response values of each environmental variable. See details in marginal_response
- offset (numeric) The offset value set as inputs.
- independent_responses (IndependentResponse or NULL) A list of independent response values of each environmental variable. See details in independent_response
- shap_dependences (ShapDependence or NULL) A list of variable dependence values of each environmental variable. See details in shap_dependence
- spatial_responses (SpatialResponse or NULL) A list of spatial variable dependence values of each environmental variable. See details in spatial_response
- variable_analysis (VariableAnalysis or NULL) A list of variable importance analysis based on multiple metrics. See details in variable_analysis
References
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008. doi:10.1109/ICDM.2008.17
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation-based anomaly detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 1-39. doi:10.1145/2133360.2133363
Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "On detecting clustered anomalies using SCiForest." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2010. doi:10.1007/978-3-642-15883-4_18
Hariri, Sahand, Matias Carrasco Kind, and Robert J. Brunner. "Extended isolation forest." IEEE Transactions on Knowledge and Data Engineering (2019). doi:10.1109/TKDE.2019.2947676
References for related features such as response curves and variable importance are listed under their own functions.
See Also
evaluate_po, marginal_response, independent_response, shap_dependence, spatial_response, variable_analysis, isolation.forest
Examples
########### Presence-absence mode #################
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
# Load variables
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# Modeling
mod_virtual_species <- isotree_po(
obs_mode = "presence_absence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.6, ndim = 1L,
seed = 123L, nthreads = 1)
# Check results
## Evaluation based on training dataset
print(mod_virtual_species$eval_train)
plot(mod_virtual_species$eval_train)
## Response curves
plot(mod_virtual_species$marginal_responses)
plot(mod_virtual_species$independent_responses,
target_var = c('bio1', 'bio5'))
plot(mod_virtual_species$shap_dependence)
## Relationships between target var and related var
plot(mod_virtual_species$shap_dependence,
target_var = c('bio1', 'bio5'),
related_var = 'bio12', smooth_span = 0)
# Variable importance
mod_virtual_species$variable_analysis
plot(mod_virtual_species$variable_analysis)
########### Presence-only mode ##################
# Load example dataset
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
# Modeling with perfect_presence mode
mod_perfect_pres <- isotree_po(
obs_mode = "perfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.6, ndim = 1L,
seed = 123L, nthreads = 1)
# Modeling with imperfect_presence mode
mod_imperfect_pres <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.6, ndim = 1L,
seed = 123L, nthreads = 1)
Boundary of mainland Africa
Description
The overall continental boundary of mainland Africa, queried from rnaturalearth and then processed.
Usage
mainland_africa
Format
A sf with one row and 2 fields
- name (character) The name of the polygon: Africa
- area (units) The overall area, as a number with units, in km2. This is not a consensus area, but just the area calculated at this resolution.
- geometry (sfc) The simple polygon feature of the boundary
Source
rnaturalearth
Calculate marginal responses of each variable.
Description
Calculate the marginal responses of each variable within the model.
Usage
marginal_response(model, var_occ, variables, si = 1000, visualize = FALSE)
Arguments
model |
(Any predictive model). In this package, it is |
var_occ |
( |
variables |
( |
si |
( |
visualize |
( |
Details
The values show how each environmental variable affects the modeling
prediction. They show how the predicted result changes as each environmental
variable is varied while all other environmental variables are kept at their
average sample value. They might be hard to interpret if there are strongly
correlated variables. Users can apply the dim_reduce function to remove
strong correlations from the original environmental variable stack.
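The procedure described above can be sketched in a few lines of base R. This is only an illustration of the idea, not the package's implementation: toy_predict is a hypothetical stand-in for a fitted model, and marginal_curve is a hypothetical helper. Only the idea of varying one variable while holding the others at their sample mean comes from the text.

```r
# Hypothetical stand-in for a fitted model: suitability peaks at
# bio1 = 22 and bio12 = 1000 (values chosen only for illustration).
toy_predict <- function(df) {
  exp(-((df$bio1 - 22) / 5)^2 / 2) * exp(-((df$bio12 - 1000) / 200)^2 / 2)
}

# Vary `target` over its observed range while holding every other
# variable at its sample mean, then predict along that sequence.
marginal_curve <- function(predict_fun, samples, target, n = 50) {
  means <- lapply(samples, mean)
  grid <- as.data.frame(means)[rep(1, n), , drop = FALSE]
  grid[[target]] <- seq(min(samples[[target]]),
                        max(samples[[target]]),
                        length.out = n)
  data.frame(x = grid[[target]], y = predict_fun(grid))
}

set.seed(1)
samples <- data.frame(bio1 = runif(200, 10, 30),
                      bio12 = runif(200, 400, 1600))
curve_bio1 <- marginal_curve(toy_predict, samples, "bio1")
head(curve_bio1)
```

Because bio12 is fixed at its mean, the curve reflects only the effect of bio1. This is also why strongly correlated variables can make such curves misleading: the fixed mean may describe a combination of values that never occurs in the data.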
Value
(MarginalResponse) A nested list of
- responses_cont (list) A list of response values of continuous variables
- responses_cat (list) A list of response values of categorical variables
References
Elith, Jane, et al. "The evaluation strip: a new and robust method for plotting predicted responses from species distribution models." Ecological modelling 186.3 (2005): 280-289. doi:10.1016/j.ecolmodel.2004.12.007
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
marginal_responses <- marginal_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(marginal_responses)
Occurrence dataset of a virtual species
Description
A pseudo presence-absence occurrence dataset of a virtual species made
by package virtualspecies.
Usage
occ_virtual_species
Format
A data.frame with 300 rows and 4 fields
- x (numeric) The x coordinates of the records in the WGS84 geographic coordinate system
- y (numeric) The y coordinates of the records in the WGS84 geographic coordinate system
- observation (numeric) The observations of presence and absence.
- usage (character) The usage of the occurrences, either "train" for the training set or "eval" for the test set.
Details
The environmental niche of the virtual species is made by defining its response functions to annual temperature and annual precipitation in mainland Africa. The response function of annual temperature is a normal distribution with mean = 22 and standard deviation = 5. The response function of annual precipitation is a normal distribution with mean = 1000 and standard deviation = 200. The suitability is then converted to a presence-absence map by logistic conversion with beta = 0.7, alpha = -0.05, and species prevalence = 0.27. Finally, 500 presence-absence points are sampled across the whole region and randomly split into a training set (0.7) and a test set (0.3).
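As a rough illustration of the recipe above, the following base R sketch computes a suitability value from the two Gaussian response functions and converts it to a probability of occurrence with a logistic curve. The way the two responses are combined (a product of responses rescaled to peak at 1) and the exact form of the logistic function are assumptions made for this sketch. The dataset itself was produced with the virtualspecies package, whose internals may differ, and the species prevalence setting is not reproduced here.

```r
# Gaussian response rescaled so that the optimum has suitability 1
# (the rescaling is an assumption of this sketch).
gauss_response <- function(x, mean, sd) {
  dnorm(x, mean = mean, sd = sd) / dnorm(mean, mean = mean, sd = sd)
}

# Combine the two responses; using the product is an assumption.
suitability <- function(temp, prec) {
  gauss_response(temp, mean = 22, sd = 5) *
    gauss_response(prec, mean = 1000, sd = 200)
}

# Logistic conversion with the parameters quoted in the text.
# With a negative alpha, probability increases with suitability.
logistic_pa <- function(s, beta = 0.7, alpha = -0.05) {
  1 / (1 + exp((s - beta) / alpha))
}

s <- suitability(temp = c(22, 27, 35), prec = c(1000, 1000, 400))
p <- logistic_pa(s)
round(cbind(suitability = s, probability = p), 3)

# Draw presence (1) or absence (0) for each site from its probability.
set.seed(123)
rbinom(length(p), size = 1, prob = p)
```

In this form, beta is the suitability at which the probability of occurrence equals 0.5, and the magnitude of alpha controls how sharp the transition is.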
Source
virtualspecies
Display the figure and map of the EnviChange object.
Description
Show the response curve and the map of contribution change from
detect_envi_change.
Usage
## S3 method for class 'EnviChange'
plot(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 1L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Use a fixed value
bio1_changes <- detect_envi_change(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1,
target_var = "bio1",
var_future = 5)
plot(bio1_changes)
Exhibit suspicious outliers in an observation dataset.
Description
Display observations and potential outliers diagnosed by
function suspicious_env_outliers in a dataset.
Usage
## S3 method for class 'EnvironmentalOutlier'
plot(x, overlay_raster = NULL, pts_alpha = 0.5, ...)
Arguments
x |
( |
overlay_raster |
( |
pts_alpha |
( |
... |
Not used. |
Value
A ggplot2 figure of the distribution of outliers among all observations.
See Also
suspicious_env_outliers, print.EnvironmentalOutlier
Examples
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
occ_outliers <- suspicious_env_outliers(
occ = occ_virtual_species, variables = env_vars,
z_outlier = 3.5, outliers_print = 4L)
plot(occ_outliers)
plot(occ_outliers,
overlay_raster = env_vars %>% slice('band', 1))
Show independent response curves.
Description
Plot independent response curves using ggplot2, optionally for selected target variable(s).
Usage
## S3 method for class 'IndependentResponse'
plot(x, target_var = NA, smooth_span = 0.3, ...)
Arguments
x |
( |
target_var |
( |
smooth_span |
( |
... |
Not used. |
Value
ggplot2 figure of response curves
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
independent_responses <- independent_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(independent_responses)
Show marginal response curves.
Description
Plot marginal response curves using ggplot2, optionally for selected target variable(s).
Usage
## S3 method for class 'MarginalResponse'
plot(x, target_var = NA, smooth_span = 0.3, ...)
Arguments
x |
( |
target_var |
( |
smooth_span |
( |
... |
Not used. |
Value
ggplot2 figure of response curves
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
marginal_responses <- marginal_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(marginal_responses, target_var = 'bio1')
Display results of conversion to presence-absence (PA).
Description
Display rasters of suitability, probability of occurrence, and the presence-absence binary map from presence-absence (PA) conversion.
Usage
## S3 method for class 'PAConversion'
plot(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
A patchwork of ggplot2 figures of suitability, probability of occurrence,
and presence-absence binary map.
See Also
convert_to_pa, print.PAConversion
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Threshold conversion
pa_thred <- convert_to_pa(mod$prediction,
method = 'threshold', beta = 0.5)
plot(pa_thred)
Show model evaluation.
Description
Display informative and detailed figures of continuous Boyce index, AUC curves, and TSS curve.
Usage
## S3 method for class 'POEvaluation'
plot(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
A patchwork of ggplot2 figures of AUC_ratio, AUC_background and CBI.
See Also
evaluate_po, print.POEvaluation
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
eval_train <- evaluate_po(
mod$model,
occ_pred = mod$pred_train$prediction,
var_pred = na.omit(as.vector(mod$prediction[[1]])))
plot(eval_train)
Display Shapley values-based spatial variable dependence maps.
Description
Plot Shapley values-based spatial variable dependence maps
using ggplot2 by optionally setting target variable(s). This only works for
SHAPSpatial even though it is part of SpatialResponse.
Usage
## S3 method for class 'SHAPSpatial'
plot(x, target_var = NA, ...)
Arguments
x |
( |
target_var |
( |
... |
Not used. |
Value
ggplot2 figure of dependent maps
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
shap_spatial <- shap_spatial_response(
model = mod$model,
target_vars = c("bio1", "bio12"),
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1)
plot(shap_spatial)
plot(shap_spatial, target_var = "bio1")
Show variable dependence plots and variable interaction plots obtained from Shapley values.
Description
Plot Shapley value-based variable dependence curves using ggplot2 by optionally selecting target variable(s). It can also plot the interaction between a related variable and the selected variable(s).
Usage
## S3 method for class 'ShapDependence'
plot(
x,
target_var = NA,
related_var = NA,
sample_prop = 0.3,
sample_bin = 100,
smooth_line = TRUE,
seed = 123,
...
)
Arguments
x |
( |
target_var |
( |
related_var |
( |
sample_prop |
( |
sample_bin |
( |
smooth_line |
( |
seed |
( |
... |
Other arguments passed on to |
Details
If the number of samples is more than 1000, stratified sampling is used to thin the sample pool, and only the resulting subset is plotted. The user can set the proportion to sample and the number of bins for stratified sampling.
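The thinning step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: stratified_thin is a hypothetical helper, the bins are assumed to be equal-width bins along the plotted variable, and each non-empty bin is assumed to keep at least one point. The argument names sample_prop, sample_bin and seed mirror those in the Usage section.

```r
# Thin a large sample by stratified sampling along one variable.
# `stratified_thin` is a hypothetical helper, not a package function.
stratified_thin <- function(df, var, sample_prop = 0.3,
                            sample_bin = 100, seed = 123) {
  if (nrow(df) <= 1000) return(df)  # small samples are kept whole
  set.seed(seed)
  # Assign each row to one of `sample_bin` equal-width bins.
  bins <- cut(df[[var]], breaks = sample_bin, labels = FALSE)
  # Within each bin, keep roughly `sample_prop` of the rows.
  keep <- unlist(lapply(split(seq_len(nrow(df)), bins), function(idx) {
    n_keep <- max(1, round(length(idx) * sample_prop))
    idx[sample.int(length(idx), n_keep)]
  }))
  df[sort(keep), , drop = FALSE]
}

set.seed(42)
shap_df <- data.frame(bio1 = rnorm(5000, mean = 22, sd = 5),
                      shap = rnorm(5000))
thinned <- stratified_thin(shap_df, "bio1")
nrow(thinned)
```

Sampling within bins rather than from the whole pool keeps sparsely populated ends of the variable range represented in the plot.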
Value
ggplot2 figure of dependent curves
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_dependence <- shap_dependence(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(var_dependence, target_var = 'bio1', related_var = 'bio12')
Display spatial variable dependence maps.
Description
Plot spatial variable dependence maps using ggplot2 by optionally setting target variable(s).
Usage
## S3 method for class 'SpatialResponse'
plot(x, target_var = NA, ...)
Arguments
x |
( |
target_var |
( |
... |
Not used. |
Value
ggplot2 figure of dependent maps
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
spatial_responses <- spatial_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 10)
plot(spatial_responses)
plot(spatial_responses, target_var = 'bio1')
Display variable importance.
Description
Display informative and detailed figures of variable importance.
Usage
## S3 method for class 'VariableAnalysis'
plot(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
A patchwork of ggplot2 figures of variable importance
according to multiple metrics.
See Also
variable_analysis, print.VariableAnalysis
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_analysis <- variable_analysis(
model = mod$model,
pts_occ = mod$observation,
pts_occ_test = mod$independent_test,
variables = mod$variables)
plot(var_analysis)
Exhibit variable contribution for target observations.
Description
Use ggplot2 to plot variable contribution for each target observation separately or summarize the overall variable contribution across all selected observations.
Usage
## S3 method for class 'VariableContribution'
plot(x, plot_each_obs = FALSE, num_features = 5, ...)
Arguments
x |
( |
plot_each_obs |
( |
num_features |
( |
... |
Not used. |
Value
ggplot2 figure of variable contribution.
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_contribution <- variable_contrib(
model = mod$model,
var_occ = mod$vars_train,
var_occ_analysis = mod$vars_train %>% slice(1:10))
# Plot variable contribution to each observation
plot(var_contribution,
plot_each_obs = TRUE,
num_features = 3)
# Plot the summarized contribution
plot(var_contribution)
Print summary information from EnviChange object.
Description
Display the detected tipping points and percentage of affected
areas due to a changing variable from function detect_envi_change.
Usage
## S3 method for class 'EnviChange'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 1L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Use a fixed value
bio1_changes <- detect_envi_change(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1,
target_var = "bio1",
var_future = 5)
print(bio1_changes)
Print summary information from EnvironmentalOutlier object.
Description
Display the environmental variable values of the detected environmental outliers in observations, compared with the mean values.
Usage
## S3 method for class 'EnvironmentalOutlier'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
suspicious_env_outliers, plot.EnvironmentalOutlier
Examples
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
occ_outliers <- suspicious_env_outliers(
occ = occ_virtual_species, variables = env_vars,
z_outlier = 5, outliers_print = 4L)
print(occ_outliers)
Print summary information from FormatOccurrence object.
Description
Display the type and number of training and evaluation datasets
in the formatted observations obtained by function format_observation.
Usage
## S3 method for class 'FormatOccurrence'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
Examples
library(dplyr)
library(itsdm)
data("occ_virtual_species")
# obs + eval, presence-absence
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
obs_type <- "presence_absence"
obs_formatted <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = obs_type)
print(obs_formatted)
Print summary information from PAConversion object.
Description
Display the equation and parameters of a PAConversion object.
Usage
## S3 method for class 'PAConversion'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
convert_to_pa, plot.PAConversion
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
# Threshold conversion
pa_thred <- convert_to_pa(mod$prediction, method = 'threshold', beta = 0.5)
print(pa_thred)
Print summary information from model evaluation object (POEvaluation).
Description
Display the most general and informative characteristics of a model evaluation object.
Usage
## S3 method for class 'POEvaluation'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
evaluate_po, plot.POEvaluation
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
eval_train <- evaluate_po(mod$model,
occ_pred = mod$pred_train$prediction,
var_pred = na.omit(as.vector(mod$prediction[[1]])))
print(eval_train)
Print summary information from POIsotree object.
Description
Display the most general and informative characteristics of a fitted POIsotree object. It includes the model information, model evaluation, variable analysis, etc.
Usage
## S3 method for class 'POIsotree'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
print(mod)
Print summary information from ReducedImageStack object.
Description
Display the most general and informative characteristics of a ReducedImageStack object, including the threshold that was set, the original variables, the selected variables, and the correlations between them.
Usage
## S3 method for class 'ReducedImageStack'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Value
The same object that was passed as input.
See Also
Examples
library(itsdm)
library(dplyr)
library(stars)
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars()
img_reduced <- dim_reduce(env_vars, threshold = 0.7,
preferred_vars = c('bio1', 'bio12'))
print(img_reduced)
Print summary information from variable importance object (VariableAnalysis).
Description
Display non-visualized information of a VariableAnalysis object returned by function variable_analysis.
Usage
## S3 method for class 'VariableAnalysis'
print(x, ...)
Arguments
x |
( |
... |
Not used. |
Details
For the Jackknife test, a positive value is printed as "/" and a negative value as "\". For the Shapley value-based test, values are printed as "#", since there are no negative values and this distinguishes them from the Jackknife test.
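The symbol convention can be illustrated with a small base R sketch. The helper text_bar, its width scaling, and the example values are all hypothetical; the actual layout printed by the package may differ.

```r
# Draw a text bar for one value: "/" for positive Jackknife values,
# "\" for negative ones, and "#" for Shapley-based values.
# `text_bar` and its `width` scaling are hypothetical.
text_bar <- function(value, type = c("jackknife", "shap"), width = 20) {
  type <- match.arg(type)
  symbol <- if (type == "shap") "#" else if (value >= 0) "/" else "\\"
  strrep(symbol, round(min(abs(value), 1) * width))
}

cat(sprintf("%-6s %6.2f %s\n", "bio1", 0.45, text_bar(0.45)))
cat(sprintf("%-6s %6.2f %s\n", "bio12", -0.20, text_bar(-0.20)))
cat(sprintf("%-6s %6.2f %s\n", "bio1", 0.30, text_bar(0.30, "shap")))
```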
Value
The same object that was passed as input.
See Also
variable_analysis, plot.VariableAnalysis
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 2L,
seed = 123L, response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_analysis <- variable_analysis(
model = mod$model,
pts_occ = mod$observation,
pts_occ_test = mod$independent_test,
variables = mod$variables)
print(var_analysis)
Estimate suitability on a stars object using a trained isolation.forest model.
Description
Apply an isolation.forest model to a stars object to calculate
environmental suitability, then apply a quantile stretch to [0, 1].
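A quantile stretch of this kind can be sketched in base R as below. The helper quantile_stretch, the 0.5% and 99.5% cut points, and the use of one minus the anomaly score as raw suitability are assumptions made for illustration; they are not taken from the package source.

```r
# Stretch values to [0, 1] between two quantiles; values outside the
# quantile range are clamped. The 0.5% and 99.5% cut points are
# hypothetical choices for this sketch.
quantile_stretch <- function(x, probs = c(0.005, 0.995)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  stretched <- (x - q[1]) / (q[2] - q[1])
  pmin(pmax(stretched, 0), 1)
}

set.seed(1)
# Isolation forest anomaly scores are higher for more anomalous
# points, so one minus the score is used here as raw suitability
# (an assumption of this sketch).
anomaly_score <- runif(1000, min = 0.3, max = 0.7)
suit <- quantile_stretch(1 - anomaly_score)
range(suit)
```

Stretching between quantiles rather than between the minimum and maximum keeps a few extreme scores from compressing the rest of the suitability range.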
Usage
probability(x, vars, offset = 0)
Arguments
x |
( |
vars |
( |
offset |
( |
Value
A stars object of predicted habitat suitability.
See Also
Examples
## Not run:
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
suit <- probability(mod$model, mod$variables)
## End(Not run)
Calculate Shapley value-based variable dependence.
Description
Calculate how a species responds to environmental variables using Shapley values.
Usage
shap_dependence(
model,
var_occ,
variables,
si = 1000,
shap_nsim = 100,
visualize = FALSE,
seed = 10,
pfun = .pfun_shap
)
Arguments
model |
( |
var_occ |
( |
variables |
( |
si |
( |
shap_nsim |
( |
visualize |
( |
seed |
( |
pfun |
( |
Details
The values show how each environmental variable independently affects the model prediction. They show how the Shapley value of each variable changes as the value of that variable varies.
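As a rough illustration of the idea described above, the base R sketch below computes exact Shapley values for a toy two-variable model and shows how the Shapley value of one variable changes with its value. Everything here (toy_model, shapley_two_vars, the variables temp and rain) is hypothetical and stands in for a real model; shap_dependence itself relies on Monte Carlo estimates from fastshap rather than this exact enumeration.

```r
# Toy "model": suitability peaks at temp = 20 and increases with rain.
toy_model <- function(d) exp(-((d$temp - 20) / 5)^2) + 0.01 * d$rain

set.seed(10)
background <- data.frame(temp = runif(200, 5, 35), rain = runif(200, 0, 100))

# Exact Shapley values of a two-variable model for one observation x,
# averaged over the background rows and both variable orderings.
shapley_two_vars <- function(f, x, bg) {
  only_temp <- bg; only_temp$temp <- x$temp   # temp from x, rain from bg
  only_rain <- bg; only_rain$rain <- x$rain   # rain from x, temp from bg
  f_bg <- f(bg); f_x <- f(x)
  f_temp <- f(only_temp); f_rain <- f(only_rain)
  c(temp = mean(0.5 * (f_temp - f_bg) + 0.5 * (f_x - f_rain)),
    rain = mean(0.5 * (f_rain - f_bg) + 0.5 * (f_x - f_temp)))
}

# Dependence: Shapley value of temp as temp varies and rain is held fixed.
obs <- data.frame(temp = c(10, 20, 30), rain = 50)
phi <- t(sapply(seq_len(nrow(obs)), function(i) {
  shapley_two_vars(toy_model, obs[i, ], background)
}))
cbind(obs, phi_temp = phi[, "temp"], phi_rain = phi[, "rain"])
```

For each observation the two Shapley values sum to the prediction minus the mean background prediction, which is why they can be read as per-variable contributions.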
Value
(ShapDependence) A list of
dependences_cont (list) A list of Shapley values of continuous variables
dependences_cat (list) A list of Shapley values of categorical variables
feature_values (data.frame) A table of feature values
References
Štrumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and Information Systems 41.3 (2014): 647-665. doi:10.1007/s10115-013-0679-x
See Also
plot.ShapDependence, explain in fastshap
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_dependence <- shap_dependence(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables)
plot(var_dependence, target_var = "bio1", related_var = "bio16")
## Not run:
##### Use Random Forest model as an external model ########
library(randomForest)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
filter(usage == "train")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12)) %>%
split()
model_data <- stars::st_extract(
env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)
mod_rf <- randomForest(
occ ~ .,
data = model_data,
ntree = 200)
pfun <- function(X.model, newdata) {
# for data.frame
predict(X.model, newdata, type = "prob")[, "1"]
}
shap_dependences <- shap_dependence(
model = mod_rf,
var_occ = model_data %>% select(-occ),
variables = env_vars,
visualize = FALSE,
seed = 10,
pfun = pfun)
## End(Not run)
Calculate Shapley value-based spatial response.
Description
Calculate spatial SHAP-based response figures. They can help to diagnose both how and where the species responds to environmental variables.
Usage
shap_spatial_response(
model,
var_occ,
variables,
target_vars = NULL,
shap_nsim = 10,
seed = 10,
pfun = .pfun_shap
)
Arguments
model |
( |
var_occ |
( |
variables |
( |
target_vars |
(a |
shap_nsim |
( |
seed |
( |
pfun |
( |
Details
The values show how each environmental variable affects the model prediction across space. These maps can help to answer where a given environmental response occurs.
Value
(SHAPSpatial) A list of stars objects of the spatial SHAP-based response of all variables
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
shap_spatial <- shap_spatial_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1)
shap_spatial <- shap_spatial_response(
model = mod$model,
target_vars = c("bio1", "bio12"),
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1)
## Not run:
##### Use Random Forest model as an external model ########
library(randomForest)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>%
filter(usage == "train")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12)) %>%
split()
model_data <- stars::st_extract(
env_vars, at = as.matrix(obs_df %>% select(x, y))) %>%
as.data.frame()
names(model_data) <- names(env_vars)
model_data <- model_data %>%
mutate(occ = obs_df[['observation']])
model_data$occ <- as.factor(model_data$occ)
mod_rf <- randomForest(
occ ~ .,
data = model_data,
ntree = 200)
pfun <- function(X.model, newdata) {
# for data.frame
predict(X.model, newdata, type = "prob")[, "1"]
}
shap_spatial <- shap_spatial_response(
model = mod_rf,
target_vars = c("bio1", "bio12"),
var_occ = model_data %>% select(-occ),
variables = env_vars,
shap_nsim = 10,
pfun = pfun)
## End(Not run)
Calculate spatial response or dependence figures.
Description
Calculate spatial marginal, independent, and SHAP-based response figures. They can help to diagnose both how and where the species responds to environmental variables.
Usage
spatial_response(
model,
var_occ,
variables,
shap_nsim = 0,
seed = 10L,
visualize = FALSE
)
Arguments
model |
( |
var_occ |
( |
variables |
( |
shap_nsim |
( |
seed |
( |
visualize |
( |
Details
The values show how each environmental variable affects the model prediction across space. These maps can help to answer where a given environmental response occurs. Compared to marginal or independent dependence maps, SHAP-based maps are more informative because SHAP-based dependence explains the contribution of each variable to the final result.
Value
(SpatialResponse) A list of
spatial_marginal_response (list) A list of stars objects of the spatial marginal response of all variables
spatial_independent_response (list) A list of stars objects of the spatial independent response of all variables
spatial_shap_dependence (list) A list of stars objects of the spatial SHAP-based response of all variables
See Also
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 20,
sample_size = 0.8, ndim = 1L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
spatial_responses <- spatial_response(
model = mod$model,
var_occ = mod$vars_train,
variables = mod$variables,
shap_nsim = 1)
plot(spatial_responses)
Function to detect suspicious outliers based on environmental variables.
Description
Run outlier.tree to detect suspicious outliers in observations.
Usage
suspicious_env_outliers(
occ,
occ_crs = 4326,
variables,
rm_outliers = FALSE,
seed = 10L,
...,
visualize = TRUE
)
Arguments
occ |
( |
occ_crs |
( |
variables |
( |
rm_outliers |
( |
seed |
( |
... |
Other arguments passed to function |
visualize |
( |
Details
For more details, see the R documentation of the function outlier.tree in package outliertree and its GitHub repository.
Value
(EnvironmentalOutlier) A list that contains
outlier_details (tibble) A table of outlier details returned from function outlier.tree in package outliertree
pts_occ (sf) The sf points of occurrence. If rm_outliers is TRUE, outliers are deleted from the points of occurrence. If FALSE, the full observations are returned.
References
See Also
print.EnvironmentalOutlier, plot.EnvironmentalOutlier, outlier.tree in package outliertree
Examples
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
occ_outliers <- suspicious_env_outliers(
occ = occ_virtual_species, variables = env_vars,
z_outlier = 3.5, outliers_print = 4L, nthreads = 1)
occ_outliers
plot(occ_outliers)
Function to evaluate relative importance of each variable.
Description
Evaluate relative importance of each variable within the model using the following methods:
Jackknife test based on the AUC ratio and on the Pearson correlation with the result of the model using all variables
SHapley Additive exPlanations (SHAP) according to Shapley values
Usage
variable_analysis(
model,
pts_occ,
pts_occ_test = NULL,
variables,
shap_nsim = 100,
visualize = FALSE,
seed = 10
)
Arguments
model |
( |
pts_occ |
( |
pts_occ_test |
( |
variables |
( |
shap_nsim |
( |
visualize |
( |
seed |
( |
Details
The jackknife test measures variable importance as the decrease in model performance when an environmental variable is used alone or is excluded from the pool of environmental variables. This function uses Pearson correlation and the AUC ratio as performance measures.
Pearson correlation is computed between the predictions of each jackknife model (a variable used alone or excluded) and the predictions of the full model, and serves as the assessment of model performance.
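The jackknife test with Pearson correlation described above can be sketched in base R. This example deliberately swaps in an ordinary linear model (lm) as a stand-in for the isolation forest, and the simulated data and the helper fit_predict are hypothetical; it only shows the shape of the procedure, not the code that variable_analysis runs.

```r
set.seed(10)
# Simulated predictors; the response depends strongly on bio1, weakly on
# bio5, and not at all on bio12.
env <- data.frame(bio1 = rnorm(200), bio5 = rnorm(200), bio12 = rnorm(200))
y <- 2 * env$bio1 + 0.5 * env$bio5 + rnorm(200, sd = 0.1)
dat <- cbind(env, y = y)

# Hypothetical helper: fit a stand-in model on a subset of variables and
# return its predictions for the training rows.
fit_predict <- function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = dat)
  unname(predict(fit))
}

full <- fit_predict(names(env))

# For each variable: correlation with the full model when the variable is
# used alone, and when it is the only one left out.
jackknife <- do.call(rbind, lapply(names(env), function(v) {
  data.frame(
    variable      = v,
    cor_with_only = cor(full, fit_predict(v)),
    cor_without   = cor(full, fit_predict(setdiff(names(env), v)))
  )
}))
jackknife
```

An important variable shows a high correlation when used alone and a low correlation when excluded.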
The area under the ROC curve (AUC) is a threshold-independent evaluator of model performance that needs both presence and absence data. A ROC curve is generated by plotting the proportion of correctly predicted presences on the y-axis against 1 minus the proportion of correctly predicted absences on the x-axis for all thresholds. Multiple approaches have been used to evaluate the accuracy of presence-only models. Peterson et al. (2008) modified AUC by plotting the proportion of presences falling above a range of thresholds against the proportion of cells of the whole area falling above the same range of thresholds. This is the so-called AUC ratio that is used in this package.
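The modified curve described above can be sketched in base R as follows. The function name auc_ratio, the use of the full threshold range, and the division by the null expectation of 0.5 are assumptions made for illustration; the implementation in itsdm may differ in these details.

```r
# Hypothetical sketch: presence suitability vs. suitability of all cells.
auc_ratio <- function(pres_suit, area_suit) {
  thresholds <- sort(unique(c(pres_suit, area_suit)), decreasing = TRUE)
  # Proportion of presences (y) and of all cells (x) at or above each threshold
  sens <- vapply(thresholds, function(t) mean(pres_suit >= t), numeric(1))
  area <- vapply(thresholds, function(t) mean(area_suit >= t), numeric(1))
  # Anchor the curve at (0, 0); it ends at (1, 1) at the lowest threshold
  x <- c(0, area)
  y <- c(0, sens)
  auc <- sum(diff(x) * (y[-length(y)] + y[-1]) / 2)   # trapezoid rule
  auc / 0.5                                           # ratio to a random model
}

set.seed(10)
area_suit <- runif(1000)        # suitability of every cell in the study area
pres_suit <- rbeta(50, 5, 2)    # presences concentrated at high suitability
ratio <- auc_ratio(pres_suit, area_suit)
ratio
```

A ratio near 1 means the model ranks presences no better than random cells, while larger values indicate that presences fall in cells with higher predicted suitability.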
SHapley Additive exPlanations (SHAP) uses Shapley values to evaluate variable importance. The larger the absolute Shapley value, the more important the variable is. Positive Shapley values indicate a positive effect, while negative Shapley values indicate a negative effect. See the references for more details.
Value
(VariableAnalysis) A list of
variables (vector of character) The names of environmental variables
pearson_correlation (tibble) A table of Jackknife test based on Pearson correlation
full_AUC_ratio (tibble) A table of AUC ratio of training and test dataset using all variables, which acts as the reference for the Jackknife test
AUC_ratio (tibble) A table of Jackknife test based on AUC ratio
SHAP (tibble) A table of Shapley values of training and test dataset separately
References
Peterson, A. Townsend, Monica Papeş, and Jorge Soberón. "Rethinking receiver operating characteristic analysis applications in ecological niche modeling." Ecological Modelling 213.1 (2008): 63-72. doi:10.1016/j.ecolmodel.2007.11.008
Štrumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and Information Systems 41.3 (2014): 647-665. doi:10.1007/s10115-013-0679-x
See Also
plot.VariableAnalysis, print.VariableAnalysis, explain in fastshap
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12, 16))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 10,
sample_size = 0.8, ndim = 2L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_analysis <- variable_analysis(
model = mod$model,
pts_occ = mod$observation,
pts_occ_test = mod$independent_test,
variables = mod$variables)
plot(var_analysis)
Evaluate variable contributions for targeted observations.
Description
Evaluate variable contribution for targeted observations according to SHapley Additive exPlanations (SHAP).
Usage
variable_contrib(
model,
var_occ,
var_occ_analysis,
shap_nsim = 100,
visualize = FALSE,
seed = 10,
pfun = .pfun_shap
)
Arguments
model |
( |
var_occ |
( |
var_occ_analysis |
( |
shap_nsim |
( |
visualize |
( |
seed |
( |
pfun |
( |
Value
(VariableContribution) A list of
shapley_values (data.frame) A table of Shapley values of each variable for all observations
feature_values (tibble) A table of values of each variable for all observations
References
See Also
plot.VariableContribution, explain in fastshap
Examples
# Using a pseudo presence-only occurrence dataset of
# virtual species provided in this package
library(dplyr)
library(sf)
library(stars)
library(itsdm)
# Prepare data
data("occ_virtual_species")
obs_df <- occ_virtual_species %>% filter(usage == "train")
eval_df <- occ_virtual_species %>% filter(usage == "eval")
x_col <- "x"
y_col <- "y"
obs_col <- "observation"
# Format the observations
obs_train_eval <- format_observation(
obs_df = obs_df, eval_df = eval_df,
x_col = x_col, y_col = y_col, obs_col = obs_col,
obs_type = "presence_only")
env_vars <- system.file(
'extdata/bioclim_tanzania_10min.tif',
package = 'itsdm') %>% read_stars() %>%
slice('band', c(1, 5, 12))
# With imperfect_presence mode,
mod <- isotree_po(
obs_mode = "imperfect_presence",
obs = obs_train_eval$obs,
obs_ind_eval = obs_train_eval$eval,
variables = env_vars, ntrees = 5,
sample_size = 0.8, ndim = 1L,
seed = 123L, nthreads = 1,
response = FALSE,
spatial_response = FALSE,
check_variable = FALSE)
var_contribution <- variable_contrib(
model = mod$model,
var_occ = mod$vars_train,
var_occ_analysis = mod$vars_train %>% slice(1:2))
## Not run:
plot(var_contribution,
num_features = 3,
plot_each_obs = TRUE)
# Plot together
plot(var_contribution)
## End(Not run)
Download environmental variables from WorldClim version 2.1.
Description
Download and parse historical WorldClim version 2.1 variables, clipped to a given boundary, with a few other options.
Usage
worldclim2(
var = "tmin",
res = 10,
bry = NULL,
path = NULL,
nm_mark = "clip",
return_stack = TRUE
)
Arguments
var |
( |
res |
( |
bry |
( |
path |
( |
nm_mark |
( |
return_stack |
( |
Details
Web page for this dataset
Value
If return_stack is TRUE, the images are returned as a stars object. Otherwise, nothing is returned, and the user receives a message stating where the images are.
Note
If the download fails due to a slow internet connection, try setting a larger timeout option, e.g., options(timeout = 1e3).
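One way to apply the timeout advice above without changing the setting for the rest of the session is to raise it only around the download and restore it afterwards. The helper with_timeout below is hypothetical and not part of itsdm; it is a small base R sketch.

```r
# Hypothetical helper: evaluate an expression with a temporarily larger
# download timeout (in seconds), restoring the previous option on exit.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  force(expr)   # expr is evaluated lazily, so it runs with the new timeout
}

before <- getOption("timeout")
inside <- with_timeout(1000, getOption("timeout"))
after <- getOption("timeout")
```

In practice the second argument would be the download call itself, for example a call to worldclim2 with the arguments shown in the examples.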
References
Fick, Stephen E., and Robert J. Hijmans. "WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas." International Journal of Climatology 37.12 (2017): 4302-4315. doi:10.1002/joc.5086
Examples
## Not run:
library(sf)
library(itsdm)
bry <- sf::st_polygon(
list(rbind(c(29.34, -11.72), c(29.34, -0.95),
c(40.31, -0.95), c(40.31, -11.72),
c(29.34, -11.72)))) %>%
st_sfc(crs = 4326)
bios <- worldclim2(var = "tmin", res = 10,
bry = bry, nm_mark = 'exp', path = tempdir())
## End(Not run)