Type: | Package |
Title: | Implementing Methods for Spatial Fuzzy Unsupervised Classification |
Version: | 0.3.4 |
Maintainer: | Jeremy Gelb <jeremy.gelb@ucs.inrs.ca> |
Imports: | ggplot2 (≥ 3.2.1), tmap (≥ 3.3-1), spdep (≥ 1.1.2), reldist (≥ 1.6.6), dplyr (≥ 0.8.3), fclust (≥ 2.1.1), fmsb (≥ 0.7.0), future.apply (≥ 1.4.0), progressr (≥ 0.4.0), reshape2 (≥ 1.4.4), stats (≥ 3.5), grDevices (≥ 3.5), shiny (≥ 1.6.0), sf (≥ 1.0-6), leaflet (≥ 2.1.1), plotly (≥ 4.9.3), Rdpack (≥ 2.1.1), matrixStats (≥ 0.58.0), methods (≥ 3.5), terra (≥ 1.6-47), Rcpp (≥ 1.0.6) |
Depends: | R (≥ 3.5) |
Suggests: | knitr (≥ 1.28), rmarkdown (≥ 2.1), markdown (≥ 1.1), future (≥ 1.16.0), ppclust (≥ 1.1.0), ClustGeo (≥ 2.0), car (≥ 3.0-7), rgl (≥ 0.100), ggpubr (≥ 0.2.5), RColorBrewer (≥ 1.1-2), kableExtra (≥ 1.1.0), viridis (≥ 0.5.1), testthat (≥ 3.0.0), bslib (≥ 0.2.5), shinyWidgets (≥ 0.6), shinyhelper (≥ 0.3.2), waiter (≥ 0.2.2), classInt(≥ 0.4-3), covr |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
Description: | Provides functions to apply spatial fuzzy unsupervised classification, visualize and interpret results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imagery (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>). |
URL: | https://github.com/JeremyGelb/geocmeans |
BugReports: | https://github.com/JeremyGelb/geocmeans/issues |
RdMacros: | Rdpack |
LinkingTo: | Rcpp, RcppArmadillo |
SystemRequirements: | C++17 |
Language: | en-CA |
NeedsCompilation: | yes |
Packaged: | 2023-09-12 02:04:57 UTC; Gelb |
Author: | Jeremy Gelb |
Repository: | CRAN |
Date/Publication: | 2023-09-12 03:10:02 UTC |
SpatRaster of the bay of Arcachon
Description
A Landsat 8 image of the bay of Arcachon (France), with a resolution of 30 m x 30 m and 6 bands: blue, green, red, near infrared, shortwave infrared 1 and shortwave infrared 2. The dataset is provided as a tiff file and is loaded as a SpatRaster with the package terra. It has the following CRS: EPSG:32630.
Usage
load_arcachon()
Format
A SpatRaster with 6 bands:
- blue
wavelength: 0.45-0.51 µm
- green
wavelength: 0.53-0.59 µm
- red
wavelength: 0.64-0.67 µm
- near infrared
wavelength: 0.85-0.88 µm
- shortwave infrared 1
wavelength: 1.57-1.65 µm
- shortwave infrared 2
wavelength: 2.11-2.29 µm
Source
https://earthexplorer.usgs.gov/
Examples
# loading directly from file
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans"))
names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2")
# loading with the provided function
Arcachon <- load_arcachon()
C-means
Description
The classical c-means algorithm
Usage
CMeans(
data,
k,
m,
maxiter = 500,
tol = 0.01,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NULL,
verbose = TRUE,
init = "random",
seed = NULL
)
Arguments
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
k |
An integer describing the number of clusters to find |
m |
A float for the fuzziness degree |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicating if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
Value
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzziness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cells with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterion
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
result <- CMeans(dataset,k = 5, m = 1.5, standardize = TRUE)
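As listed in the Value section, the Groups slot stores the most likely group for each observation, i.e. the column-wise maximum of the membership matrix. A minimal base-R sketch with a made-up membership matrix (the "V1", "V2", ... labels are an assumption about the naming scheme, not a documented guarantee):

```r
# a toy membership matrix: 4 observations, 3 groups (each row sums to 1)
belong <- matrix(c(0.7, 0.2, 0.1,
                   0.1, 0.8, 0.1,
                   0.3, 0.3, 0.4,
                   0.5, 0.25, 0.25),
                 ncol = 3, byrow = TRUE)
# most likely group for each observation, as stored in the Groups slot
groups <- paste("V", max.col(belong), sep = "")
groups
```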
Elsa statistic calculated on a matrix with a given window
Description
Method described here: https://doi.org/10.1016/j.spasta.2018.10.001
Usage
Elsa_categorical_matrix_window(mat, window, dist)
Arguments
mat |
an IntegerMatrix, must be filled with integers; -1 indicates NA values, and categories must start at 0 |
window |
the window to use to define neighbours. 0 can be used to indicate that a cell is not a neighbour |
dist |
a distance matrix between the categories |
Value
a NumericVector : the local values of ELSA
Fuzzy Elsa statistic calculated on a matrix with a given window
Description
This is an extension of the Elsa statistic to the fuzzy classification case
Usage
Elsa_fuzzy_matrix_window(mats, window, dist)
Arguments
mats |
An array; each slice must contain the membership values of one group |
window |
the window to use to define neighbours. 0 can be used to indicate that a cell is not a neighbour |
dist |
a distance matrix between the groups |
Value
a NumericVector : the local values of ELSA
Instantiate a FCMres object
Description
Instantiate a FCMres object from a list
Usage
FCMres(obj)
Arguments
obj |
A list, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
Details
Manually creating an FCMres object can be handy to use geocmeans functions on results from external algorithms. The list given to the function FCMres must contain 5 necessary parameters:
Centers: a dataframe or matrix describing the final centers of the groups
Belongings: a membership matrix
Data: the dataset used to perform the clustering. It must be a dataframe or a matrix. If a list is given, then the function assumes that the classification occurred on rasters (see information below)
m: the fuzziness degree (1 if hard clustering is used)
algo: the name of the algorithm used
Note that the S3 method predict is available only for objects created with the functions CMeans, GCMeans, SFCMeans, SGFCMeans.
When working with rasters, Data must be a list of rasters, and a second list of rasters with the membership values must be provided in an extra slot named "rasters". In that case, Belongings does not have to be defined and will be created automatically.
Warning: the order of the elements is very important. The first row in the matrix "Centers", and the first column in the matrix "Belongings" must both be related to the same group and so on. When working with raster data, the first row in the matrix "Centers" must also match with the first rasterLayer in the list "rasters".
Value
An object of class FCMres
Examples
#This is an internal function, no example provided
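Although no official example is provided, the required list can be assembled from an external clustering. A hedged base-R sketch using stats::kmeans on toy data (the final FCMres() call is left commented out since it needs geocmeans loaded):

```r
set.seed(42)
# toy data and an external hard clustering (kmeans from base R stats)
df <- data.frame(a = c(1, 2, 8, 9), b = c(1, 2, 8, 9))
km <- stats::kmeans(df, centers = 2)
# membership matrix for a hard partition: 1 for the assigned group, 0 otherwise
belong <- outer(km$cluster, seq_len(2), FUN = "==") * 1
obj <- list(
  Centers = km$centers,   # row i must match column i of Belongings (see Warning)
  Belongings = belong,
  Data = df,
  m = 1,                  # hard clustering
  algo = "kmeans"
)
# result <- FCMres(obj)   # requires geocmeans
```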
Generalized C-means
Description
The generalized c-means algorithm
Usage
GCMeans(
data,
k,
m,
beta,
maxiter = 500,
tol = 0.01,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NULL,
verbose = TRUE,
init = "random",
seed = NULL
)
Arguments
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
k |
An integer describing the number of clusters to find |
m |
A float for the fuzziness degree |
beta |
A float for the beta parameter (control speed convergence and classification crispness) |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicating if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
Value
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzziness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cells with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterion
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
result <- GCMeans(dataset,k = 5, m = 1.5, beta = 0.5, standardize = TRUE)
Social and environmental indicators for the Iris of the metropolitan region of Lyon (France)
Description
A dataset containing social and environmental data for the Iris of Lyon (France)
Usage
LyonIris
Format
A SpatialPolygonsDataFrame with 506 rows and 32 variables:
- OBJECTID
a simple OID (integer)
- INSEE_COM
the code of each commune (factor)
- CODE_IRIS
the code of each unit area: iris (factor)
- Lden
the annual daily mean noise exposure values in dB (numeric)
- NO2
the annual mean of NO2 concentration in ug/m3 (numeric)
- PM25
the annual mean of PM25 concentration in ug/m3 (numeric)
- PM10
the annual mean of PM10 concentration in ug/m3 (numeric)
- Pct0_14
the percentage of people that are 0 to 14 year old (numeric)
- Pct_65
the percentage of people older than 64 (numeric)
- Pct_Img
the percentage of immigrants (numeric)
- TxChom1564
the unemployment rate (numeric)
- Pct_brevet
the percentage of people that obtained the brevet diploma (lower secondary education) (numeric)
- NivVieMed
the median standard of living in euros (numeric)
- VegHautPrt
the percentage of the iris surface covered by trees (numeric)
- X
the X coordinate of the center of the Iris (numeric)
- Y
the Y coordinate of the center of the Iris (numeric)
...
Source
https://data.grandlyon.com/portail/fr/accueil
SFCMeans
Description
The spatial version of the c-means algorithm (SFCMeans, FCM_S1)
Usage
SFCMeans(
data,
nblistw = NULL,
k,
m,
alpha,
lag_method = "mean",
window = NULL,
noise_cluster = FALSE,
delta = NULL,
maxiter = 500,
tol = 0.01,
standardize = TRUE,
robust = FALSE,
verbose = TRUE,
init = "random",
seed = NULL
)
Arguments
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
k |
An integer describing the number of clusters to find |
m |
A float for the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy-c-mean algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). When working with rasters, a function can be given (or a string which will be parsed). It will be applied to all the pixel values in the matrix designated by the parameter window and weighted according to the values of this matrix. Typically, to obtain an average of the pixels in a 3x3 matrix one could use the function sum (or "sum") and set the window as: window <- matrix(1/9, nrow = 3, ncol = 3). There is one special case when working with rasters: one can specify "nl" (standing for non-local), which calculates a lagged version of the input rasters using the inverse of the Euclidean distance as spatial weights (see the section Advanced examples in the vignette introduction for more details). |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such as 3x3). The values in the matrix indicate the weight to give to each pixel, and the centre of the matrix is the centre of the focal function. |
noise_cluster |
A boolean indicating if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
Details
The implementation is based on the following article: doi:10.1016/j.patcog.2006.07.011.
The membership matrix (u) is calculated as follows:

u_{ik} = \frac{(||x_{k} - v_{i}||^2 + \alpha||\bar{x}_{k} - v_{i}||^2)^{-1/(m-1)}}{\sum_{j=1}^{c}(||x_{k} - v_{j}||^2 + \alpha||\bar{x}_{k} - v_{j}||^2)^{-1/(m-1)}}

The centers of the groups are updated with the following formula:

v_{i} = \frac{\sum_{k=1}^{N} u_{ik}^{m}(x_{k} + \alpha\bar{x}_{k})}{(1 + \alpha)\sum_{k=1}^{N} u_{ik}^{m}}

with:
v_{i} the center of group i
x_{k} the data point k
\bar{x}_{k} the spatially lagged data point k
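The membership update can be checked numerically; a base-R toy sketch (x is the data, xbar a stand-in for its spatial lag, v the centers, with alpha and m as in the arguments), verifying that each row of u sums to 1:

```r
set.seed(1)
x    <- matrix(rnorm(10 * 2), ncol = 2)  # 10 observations, 2 variables
xbar <- x + rnorm(20, sd = 0.1)          # stand-in for the spatially lagged data
v    <- x[c(1, 5), ]                     # 2 centers picked among the observations
m <- 1.5; alpha <- 0.5

# squared euclidean distance of each observation to one center
d2 <- function(pts, centre) rowSums(sweep(pts, 2, centre)^2)
# numerator terms of the membership formula, one column per center
num <- sapply(seq_len(nrow(v)), function(i) {
  (d2(x, v[i, ]) + alpha * d2(xbar, v[i, ]))^(-1 / (m - 1))
})
u <- num / rowSums(num)  # membership matrix u_ik; rows sum to 1
```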
Value
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzziness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cells with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterion
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
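The window mechanism described under lag_method and window can be mimicked in base R on a single band stored as a plain matrix (a hand-rolled focal sum that skips border cells; geocmeans' internal implementation may differ):

```r
band <- matrix(1:25, nrow = 5)             # one toy raster band as a 5x5 matrix
window <- matrix(1/9, nrow = 3, ncol = 3)  # 3x3 mean, as in the lag_method help
lagged <- matrix(NA_real_, 5, 5)
for (i in 2:4) {
  for (j in 2:4) {
    # weighted sum of the 3x3 neighbourhood: the spatial lag of cell (i, j)
    lagged[i, j] <- sum(band[(i - 1):(i + 1), (j - 1):(j + 1)] * window)
  }
}
```

With uniform 1/9 weights the lag of the central cell equals the plain mean of its 3x3 neighbourhood.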
SGFCMeans
Description
The spatial version of the generalized c-means algorithm (SGFCMeans)
Usage
SGFCMeans(
data,
nblistw = NULL,
k,
m,
alpha,
beta,
lag_method = "mean",
window = NULL,
maxiter = 500,
tol = 0.01,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NULL,
verbose = TRUE,
init = "random",
seed = NULL
)
Arguments
data |
A dataframe with only numerical variables. Can also be a list of rasters (produced by the package raster). In that case, each raster is considered as a variable and each pixel is an observation. Pixels with NA values are not used during the classification. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
k |
An integer describing the number of clusters to find |
m |
A float for the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy-c-mean algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
beta |
A float for the beta parameter (control speed convergence and classification crispness) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). When working with rasters, a function can be given (or a string which will be parsed). It will be applied to all the pixel values in the matrix designated by the parameter window and weighted according to the values of this matrix. Typically, to obtain an average of the pixels in a 3x3 matrix one could use the function sum (or "sum") and set the window as: window <- matrix(1/9, nrow = 3, ncol = 3). There is one special case when working with rasters: one can specify "nl" (standing for non-local), which calculates a lagged version of the input rasters using the inverse of the Euclidean distance as spatial weights (see the section Advanced examples in the vignette introduction for more details). |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such as 3x3). The values in the matrix indicate the weight to give to each pixel, and the centre of the matrix is the centre of the focal function. |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicating if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
verbose |
A boolean to specify if the progress should be printed |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
Details
The implementation is based on the following article: doi:10.1016/j.dsp.2012.09.016.
The membership matrix (u) is calculated as follows:

u_{ik} = \frac{(||x_{k} - v_{i}||^2 - b_k + \alpha||\bar{x}_{k} - v_{i}||^2)^{-1/(m-1)}}{\sum_{j=1}^{c}(||x_{k} - v_{j}||^2 - b_k + \alpha||\bar{x}_{k} - v_{j}||^2)^{-1/(m-1)}}

The centers of the groups are updated with the following formula:

v_{i} = \frac{\sum_{k=1}^{N} u_{ik}^{m}(x_{k} + \alpha\bar{x}_{k})}{(1 + \alpha)\sum_{k=1}^{N} u_{ik}^{m}}

with:
v_{i} the center of group i
x_{k} the data point k
\bar{x}_{k} the spatially lagged data point k
b_k = \beta \times \min_{j}(||x_{k} - v_{j}||^2)
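The b_k term can be checked numerically; a base-R toy sketch, taking b_k as beta times the smallest squared distance from observation k to the centers (consistent with the squared-distance terms in the membership update):

```r
set.seed(2)
x <- matrix(rnorm(8 * 2), ncol = 2)  # 8 observations, 2 variables
v <- matrix(rnorm(2 * 2), ncol = 2)  # 2 centers
beta <- 0.5

# squared euclidean distances from each observation to each center (8 x 2)
d2 <- sapply(seq_len(nrow(v)), function(i) rowSums(sweep(x, 2, v[i, ])^2))
# b_k: beta times the minimum squared distance, one value per observation
b <- beta * apply(d2, 1, min)
```

Since beta is between 0 and 1, subtracting b_k never makes a squared distance negative in the membership update.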
Value
An S3 object of class FCMres with the following slots
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
isRaster: TRUE if rasters were used as input data, FALSE otherwise
k: the number of groups
m: the fuzziness degree
alpha: the spatial weighting parameter (if SFCM or SGFCM)
beta: beta parameter for generalized version of FCM (GFCM or SGFCM)
algo: the name of the algorithm used
rasters: a list of rasters with membership values and the most likely group (if rasters were used)
missing: a boolean vector indicating raster cells with data (TRUE) and with NA (FALSE) (if rasters were used)
maxiter: the maximum number of iterations used
tol: the convergence criterion
lag_method: the lag function used (if SFCM or SGFCM)
nblistw: the neighbours list used (if vector data were used for SFCM or SGFCM)
window: the window used (if raster data were used for SFCM or SGFCM)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = TRUE)
Sum of two matrices by column
Description
Sum of two matrices by column
Usage
add_matrices_bycol(x, y)
Arguments
x |
a matrix |
y |
a matrix with the same dimensions |
Value
a matrix
Adjusted spatial inconsistency index for rasters
Description
Adjusted spatial inconsistency index for rasters
Usage
adj_spconsist_arr_window_globstd(data, memberships, window, mindist = 1e-11)
Arguments
data |
an arma cube of dimension nr,nc,ns |
memberships |
an arma cube of dimension nr, nc, ks |
window |
a matrix representing the neighbouring of each pixel |
mindist |
A minimum value for distance between two observations. If two neighbours have exactly the same values, then the euclidean distance between them is 0, leading to an infinite spatial weight. In that case, the minimum distance is used instead of 0. |
Value
a double, the adjusted spatial inconsistency index
Semantic adjusted spatial weights
Description
Function to adjust the spatial weights so that they represent semantic distances between neighbours
Usage
adjustSpatialWeights(data, listw, style, mindist = 1e-11)
Arguments
data |
A dataframe with numeric columns |
listw |
A nb object from spdep |
style |
A letter indicating the weighting scheme (see spdep doc) |
mindist |
A minimum value for distance between two observations. If two neighbours have exactly the same values, then the euclidean distance between them is 0, leading to an infinite spatial weight. In that case, the minimum distance is used instead of 0. |
Value
A listw object (spdep like)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
Wqueen2 <- adjustSpatialWeights(dataset,queen,style="C")
Bar plots
Description
Return bar plots to compare groups
Usage
barPlots(data, belongmatrix, ncol = 3, what = "mean")
Arguments
data |
A dataframe with numeric columns |
belongmatrix |
A membership matrix |
ncol |
An integer indicating the number of columns for the bar plot |
what |
Can be "mean" (default) or "median" |
Value
a barplot created with ggplot2
Examples
## Not run:
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
barPlots(dataset, result$Belongings)
## End(Not run)
membership matrix calculator for FCM algorithm
Description
membership matrix calculator for FCM algorithm
Usage
belongsFCM(data, centers, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new membership values
membership matrix calculator for GFCM algorithm
Description
membership matrix calculator for GFCM algorithm
Usage
belongsGFCM(data, centers, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new membership values
membership matrix calculator for SFCM algorithm
Description
membership matrix calculator for SFCM algorithm
Usage
belongsSFCM(data, centers, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new membership values
membership matrix calculator for SGFCM algorithm
Description
membership matrix calculator for SGFCM algorithm
Usage
belongsSGFCM(data, centers, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new membership values
Check the robustness of a classification by Bootstrap
Description
Check that the obtained groups are stable by bootstrap
Usage
boot_group_validation(
object,
nsim = 1000,
maxiter = 1000,
tol = 0.01,
init = "random",
verbose = TRUE,
seed = NULL
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
nsim |
The number of replications to do for the bootstrap evaluation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
verbose |
A boolean to specify if the progress bar should be displayed. |
seed |
An integer used for random number generation. It ensures that the starting centres will be the same if the same value is selected. |
Details
Considering that the classification produced by an FCM-like algorithm depends on its initial state, it is important to check if the groups obtained are stable. This function uses a bootstrap method to do so. For a selected number of iterations (at least 1000), a sample of size n (with replacement) is drawn from the original dataset. For each sample, the same classification algorithm is applied and the results are compared with the reference results. For each original group, the most similar group is identified by calculating the Jaccard similarity index between the columns of the two membership matrices. This index ranges from 0 (complete dissimilarity) to 1 (perfect similarity) and a value is calculated for each group at each iteration. One can investigate the values obtained to determine if the groups are stable. Values under 0.5 are a concern and indicate that the group is dissolving. Values between 0.6 and 0.75 indicate a pattern in the data, but significant uncertainty. Values above 0.8 indicate strong groups. The values of the centres obtained at each iteration are also returned; it is important to ensure that they approximately follow a normal distribution (or are at least unimodal).
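The Jaccard similarity between two fuzzy membership columns can be written as the ratio of their element-wise minima to their element-wise maxima, a common fuzzy extension of the Jaccard index (the exact variant used internally may differ):

```r
# fuzzy Jaccard similarity between two membership columns
fuzzy_jaccard <- function(u1, u2) sum(pmin(u1, u2)) / sum(pmax(u1, u2))

# identical columns give 1, disjoint crisp columns give 0
fuzzy_jaccard(c(0.9, 0.1, 0.8), c(0.9, 0.1, 0.8))
fuzzy_jaccard(c(1, 0, 1), c(0, 1, 0))
```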
Value
A list of two values: group_consistency, a dataframe indicating the consistency of each cluster across simulations; group_centres, a list with a dataframe for each cluster. The values in the dataframes are the centres of the clusters at each simulation.
Examples
## Not run:
data(LyonIris)
#selecting the columns for the analysis
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14",
"Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed")
#rescaling the columns
Data <- sf::st_drop_geometry(LyonIris[AnalysisFields])
for (Col in names(Data)){
Data[[Col]] <- as.numeric(scale(Data[[Col]]))
}
Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456,
tol = 0.00001, verbose = FALSE)
validation <- boot_group_validation(Cmean, nsim = 1000, maxiter = 1000,
tol = 0.01, init = "random")
## End(Not run)
Check that the obtained groups are stable by bootstrap (multicore)
Description
Check that the obtained groups are stable by bootstrap with multicore support
Usage
boot_group_validation.mc(
object,
nsim = 1000,
maxiter = 1000,
tol = 0.01,
init = "random",
verbose = TRUE,
seed = NULL
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
nsim |
The number of replications to do for the bootstrap evaluation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both of them are heuristic. |
verbose |
A boolean to specify if the progress bar should be displayed. |
seed |
An integer to control randomness; the default is NULL |
Details
For more details, see the documentation of the function boot_group_validation
Value
A list of two values: group_consistency, a dataframe indicating the consistency of each cluster across simulations; group_centres, a list with a dataframe for each cluster. The values in the dataframes are the centres of the clusters at each simulation.
Examples
## Not run:
data(LyonIris)
#selecting the columns for the analysis
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14",
"Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed")
#rescaling the columns
Data <- sf::st_drop_geometry(LyonIris[AnalysisFields])
for (Col in names(Data)){
Data[[Col]] <- as.numeric(scale(Data[[Col]]))
}
Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456,
tol = 0.00001, verbose = FALSE)
future::plan(future::multisession(workers=2))
validation <- boot_group_validation.mc(Cmean, nsim = 1000, maxiter = 1000,
tol = 0.01, init = "random")
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)
## End(Not run)
Worker function for cluster bootstrapping
Description
Worker function for cluster bootstrapping
Usage
boot_worker(object, wdata, tol, maxiter, init)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
wdata |
The lagged dataset if necessary, can be NULL if not required |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
maxiter |
An integer for the maximum number of iterations |
init |
A string indicating how the initial centres must be selected. "random" indicates that random observations are used as centres. "kpp" uses a distance-based method resulting in more dispersed centres at the beginning. Both are heuristics. |
Details
The worker function for the functions boot_group_validation and boot_group_validation.mc
Value
A list, similar to a FCMres object, but with only the slots necessary for cluster bootstrapping.
Examples
# this is an internal function, no example provided
Calculate the membership matrix
Description
Calculate the membership matrix according to a set of centroids, the observed data and the fuzziness degree
Usage
calcBelongMatrix(centers, data, m, sigmas)
Arguments
centers |
A matrix or a dataframe representing the centers of the clusters with p columns and k rows |
data |
A dataframe or matrix representing the observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
Value
A n * k matrix representing the probability of belonging of each observation to each cluster
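The classical FCM membership update can be sketched in plain R. The block below is an illustrative implementation of the standard formula u_ik = d_ik^(-2/(m-1)) / sum_j d_ij^(-2/(m-1)) (the package itself uses compiled C++ code); `fcm_membership` is a hypothetical name:

```r
fcm_membership <- function(centers, data, m) {
  data <- as.matrix(data)
  centers <- as.matrix(centers)
  # n x k matrix of squared Euclidean distances to each centre
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(data, 2, centers[k, ])^2)
  })
  d2 <- pmax(d2, 1e-12)          # guard against division by zero
  inv <- d2^(-1 / (m - 1))       # equals d^(-2/(m-1)) since d2 is squared
  inv / rowSums(inv)             # normalise so each row sums to 1
}

u <- fcm_membership(centers = rbind(c(0, 0), c(5, 5)),
                    data    = rbind(c(0.1, 0), c(5, 4.9)),
                    m       = 1.5)
```

Each observation receives the highest membership for its nearest centre, and each row of the result sums to 1.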
Calculate the membership matrix with a noise cluster
Description
Calculate the membership matrix according to a set of centroids, the observed data and the fuzziness degree
Usage
calcBelongMatrixNoisy(centers, data, m, delta, sigmas)
Arguments
centers |
A matrix or a dataframe representing the centers of the clusters with p columns and k rows |
data |
A dataframe or matrix representing the observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
delta |
A float, the value set for delta by the user |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
Value
A n * k matrix representing the probability of belonging of each observation to each cluster
Calinski-Harabasz index
Description
Calculate the Calinski-Harabasz index of clustering quality.
Usage
calcCalinskiHarabasz(data, belongmatrix, centers)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
Details
The Calinski-Harabasz index (Da Silva et al. 2020) is the ratio between cluster separation (between-groups sum of squares) and cluster cohesion (within-groups sum of squares). A greater value indicates either more separated or more cohesive clusters.
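As a point of reference, the hard-partition version of the index can be written directly from this definition. The sketch below is illustrative only (the package computes a fuzzy variant from the membership matrix); `calinski_harabasz` is a hypothetical name:

```r
# Hard-partition Calinski-Harabasz: (BGSS / (k-1)) / (WGSS / (n-k))
calinski_harabasz <- function(data, groups) {
  data <- as.matrix(data)
  n <- nrow(data); k <- length(unique(groups))
  grand <- colMeans(data)
  bgss <- 0; wgss <- 0
  for (g in unique(groups)) {
    sub <- data[groups == g, , drop = FALSE]
    cg <- colMeans(sub)
    bgss <- bgss + nrow(sub) * sum((cg - grand)^2)  # between-groups part
    wgss <- wgss + sum(sweep(sub, 2, cg)^2)         # within-groups part
  }
  (bgss / (k - 1)) / (wgss / (n - k))
}

set.seed(1)
X <- rbind(matrix(rnorm(40), ncol = 2), matrix(rnorm(40, mean = 4), ncol = 2))
ch <- calinski_harabasz(X, rep(1:2, each = 20))  # large for well-separated groups
```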
Value
A float: the Calinski-Harabasz index
References
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcCalinskiHarabasz(result$Data, result$Belongings, result$Centers)
Calculate the centroids
Description
Calculate the new centroids of the clusters based on the membership matrix for a classical FCM.
Arguments
data |
A Numeric matrix representing the observed data with n rows and p columns |
belongmatrix |
A n X k matrix giving for each observation n, its probability to belong to the cluster k |
m |
A float representing the fuzziness degree |
Value
A matrix with the centers calculated for each cluster
Davies-Bouldin index
Description
Calculate the Davies-Bouldin index of clustering quality.
Usage
calcDaviesBouldin(data, belongmatrix, centers)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
Details
The Davies-Bouldin index (Da Silva et al. 2020) can be seen as the ratio of the within cluster dispersion and the between cluster separation. A lower value indicates a higher cluster compacity or a higher cluster separation. The formula is:
DB = \frac{1}{k}\sum_{i=1}^k{R_{i}}
with:
R_{i} = \max_{i \neq j}\left(\frac{S_{i}+S_{j}}{M_{i, j}}\right)
S_{i} = \left[\frac{1}{n_{i}} \sum_{l=1}^{n}\left\|\boldsymbol{x_{l}}-\boldsymbol{c_{i}}\right\| u_{il}\right]^{\frac{1}{2}}
M_{i, j} = \left\|\boldsymbol{c}_{i}-\boldsymbol{c}_{j}\right\|
So, the value of the index is an average of R_{i}
values. For each cluster, they represent
its worst comparison with all the other clusters, calculated
as the ratio between the compactness of the two clusters and the separation
of the two clusters.
Value
A float: the Davies-Bouldin index
References
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcDaviesBouldin(result$Data, result$Belongings, result$Centers)
calculate ELSA statistic for a hard partition
Description
Calculate the ELSA statistic for a hard partition. This local indicator of spatial autocorrelation can be used to identify areas where close observations tend to belong to different clusters.
Usage
calcELSA(object, nblistw = NULL, window = NULL, matdist = NULL)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a vector of categories. This vector must be filled with integers starting from 1. -1 can be used to indicate missing categories. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
A binary (0,1) matrix representing the neighbours spatial weights when working with rasters. The matrix must have odd dimensions. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
Details
The ELSA index (Naimi et al. 2019) can be used to measure local autocorrelation for a categorical variable. It varies between 0 and 1, 0 indicating a perfect positive spatial autocorrelation and 1 a perfect heterogeneity. It is based on the Shannon entropy index, and uses a measure of difference between categories. Thus it can reflect the fact that the proximity of two similar categories is still a form of positive autocorrelation. The authors suggest calculating the mean of the index at several lag distances to create an entrogram, which quantifies the global spatial structure and can be represented as a variogram-like graph.
Value
Depending on the input, a vector of ELSA values or a raster with the ELSA values.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
elsa_values <- calcELSA(result)
Calculate the Euclidean distance
Description
Calculate the euclidean distance between a numeric matrix n * p and a numeric vector of length p
Usage
calcEuclideanDistance(m, v)
Arguments
m |
A n * p matrix or dataframe with only numeric columns |
v |
A numeric vector of length p |
Value
A vector of length n giving the euclidean distance between each row of the matrix m and the vector v
Examples
#This is an internal function, no example provided
euclidean distance between rows of a matrix and a vector
Description
euclidean distance between rows of a matrix and a vector
Usage
calcEuclideanDistance2(y, x)
Arguments
y |
a matrix |
x |
a vector (same length as ncol(matrix)) |
Value
a vector (same length as nrow(matrix))
euclidean distance between rows of a matrix and a vector (arma mode)
Description
euclidean distance between rows of a matrix and a vector (arma mode)
Usage
calcEuclideanDistance3(y, x)
Arguments
y |
a matrix |
x |
a vector (same length as ncol(matrix)) |
Value
a vector (same length as nrow(matrix))
Calculate the generalized membership matrix
Description
Calculate the generalized membership matrix according to a set of centroids, the observed data, the fuzziness degree, and a beta parameter
Usage
calcFGCMBelongMatrix(centers, data, m, beta, sigmas)
Arguments
centers |
A matrix representing the centers of the clusters with p columns and k rows |
data |
A matrix representing the observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
beta |
A float for the beta parameter (controls convergence speed and classification crispness) |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
Value
A n * k matrix representing the belonging probabilities of each observation to each cluster
Calculate the generalized membership matrix with a noise cluster
Description
Calculate the generalized membership matrix according to a set of centroids, the observed data, the fuzziness degree, and a beta parameter
Usage
calcFGCMBelongMatrixNoisy(centers, data, m, beta, delta, sigmas)
Arguments
centers |
A matrix representing the centers of the clusters with p columns and k rows |
data |
A matrix representing the observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
beta |
A float for the beta parameter (controls convergence speed and classification crispness) |
delta |
A float, the value set for delta by the user |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
Value
A n * k matrix representing the belonging probabilities of each observation to each cluster
Fukuyama and Sugeno index
Description
Calculate Fukuyama and Sugeno index of clustering quality
Usage
calcFukuyamaSugeno(data, belongmatrix, centers, m)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
m |
The fuzziness parameter |
Details
The Fukuyama and Sugeno index (Fukuyama 1989) is the difference between the compactness of clusters and the separation of clusters. A smaller value indicates a better clustering. The formula is:
S(c)=\sum_{k=1}^{n} \sum_{i=1}^{c}\left(U_{i k}\right)^{m}\left(\left\|x_{k}-v_{i}\right\|^{2}-\left\|v_{i}-\bar{x}\right\|^{2}\right)
with n the number of observations, c the number of clusters and \bar{x}
the mean of the dataset.
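A direct transcription of this formula in R might look like the following sketch (illustrative only, not the package's implementation; `fukuyama_sugeno` is a hypothetical name):

```r
# S(c) = sum_k sum_i u_ik^m * (||x_k - v_i||^2 - ||v_i - xbar||^2)
fukuyama_sugeno <- function(data, u, centers, m) {
  data <- as.matrix(data)
  centers <- as.matrix(centers)
  xbar <- colMeans(data)
  total <- 0
  for (i in seq_len(nrow(centers))) {
    d_obs <- rowSums(sweep(data, 2, centers[i, ])^2)  # ||x_k - v_i||^2
    d_cen <- sum((centers[i, ] - xbar)^2)             # ||v_i - xbar||^2
    total <- total + sum(u[, i]^m * (d_obs - d_cen))
  }
  total
}

X    <- rbind(c(0, 0), c(0, 0.1), c(5, 5), c(5, 5.1))
U    <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))
cent <- rbind(c(0, 0.05), c(5, 5.05))
fs <- fukuyama_sugeno(X, U, cent, m = 1.5)  # strongly negative: compact, well separated
```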
Value
A float: the Fukuyama and Sugeno index
References
Fukuyama Y (1989). “A new method of choosing the number of clusters for the fuzzy c-mean method.” In Proc. 5th Fuzzy Syst. Symp., 1989, 247–250.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcFukuyamaSugeno(result$Data,result$Belongings, result$Centers, 1.5)
calculate ELSA statistic for a fuzzy partition
Description
Calculate ELSA statistic for a fuzzy partition. This local indicator of spatial autocorrelation can be used to identify areas where close observations tend to belong to different clusters.
Usage
calcFuzzyELSA(object, nblistw = NULL, window = NULL, matdist = NULL)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a membership matrix. Each row of this matrix must sum up to 1. Can also be a list of rasters, in which case each raster must represent the membership values for one cluster and the sum of all the rasters must be a raster filled with ones. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
A binary (0,1) matrix representing the neighbours spatial weights when working with rasters. The matrix must have odd dimensions. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
Details
The fuzzy ELSA index is a generalization of the ELSA index (Naimi et al. 2019). It can be used to measure local autocorrelation for a membership matrix. It varies between 0 and 1, 0 indicating a perfect positive spatial autocorrelation and 1 a perfect heterogeneity. It is based on the Shannon entropy index, and uses a measure of dissimilarity between categories.
Value
Either a vector or a raster with the ELSA values, depending on the input.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
elsa_values <- calcFuzzyELSA(result)
Local Fuzzy ELSA statistic for raster
Description
Calculate the Local Fuzzy ELSA statistic for a numeric raster
Usage
calcFuzzyElsa_raster(rasters, window, matdist)
Arguments
rasters |
A List of SpatRaster or a List of matrices, or an array |
window |
A binary (0,1) matrix representing the neighbours spatial weights when working with rasters. The matrix must have odd dimensions. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
Value
A raster or a matrix (depending on the input): the values of local fuzzy ELSA statistic
Examples
# this is an internal function, no example provided
Generalized Dunn’s index (43)
Description
Calculate the Generalized Dunn’s index (v43) of clustering quality.
Usage
calcGD43(data, belongmatrix, centers)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
Details
The Generalized Dunn’s index (Da Silva et al. 2020) is a ratio of the worst pair-wise separation of clusters and the worst compactness of clusters. A higher value indicates a better clustering. The formula is:
GD_{r s}=\frac{\min_{i \neq j}\left[\delta_{r}\left(\omega_{i}, \omega_{j}\right)\right]}{\max_{k}\left[\Delta_{s}\left(\omega_{k}\right)\right]}
The numerator is a measure of the minimal separation between all the clusters i and j given by the formula:
\delta_{r}\left(\omega_{i}, \omega_{j}\right)=\left\|\boldsymbol{c}_{i}-\boldsymbol{c}_{j}\right\|
which is basically the Euclidean distance between the centres of clusters c_{i}
and c_{j}
The denominator is a measure of the maximal dispersion of all clusters, given by the formula:
\frac{2*\sum_{l=1}^{n}\left\|\boldsymbol{x}_{l}-\boldsymbol{c_{i}}\right\|^{\frac{1}{2}}}{\sum{u_{i}}}
Value
A float: the Generalized Dunn’s index (43)
References
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcGD43(result$Data, result$Belongings, result$Centers)
Generalized Dunn’s index (53)
Description
Calculate the Generalized Dunn’s index (v53) of clustering quality.
Usage
calcGD53(data, belongmatrix, centers)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
Details
The Generalized Dunn’s index (Da Silva et al. 2020) is a ratio of the worst pair-wise separation of clusters and the worst compactness of clusters. A higher value indicates a better clustering. The formula is:
GD_{r s}=\frac{\min_{i \neq j}\left[\delta_{r}\left(\omega_{i}, \omega_{j}\right)\right]}{\max_{k}\left[\Delta_{s}\left(\omega_{k}\right)\right]}
The numerator is a measure of the minimal separation between all the clusters i and j given by the formula:
\delta_{r}\left(\omega_{i}, \omega_{j}\right)=\frac{\sum_{l=1}^{n}\left\|\boldsymbol{x_{l}}-\boldsymbol{c_{i}}\right\|^{\frac{1}{2}} \cdot u_{il}+\sum_{l=1}^{n}\left\|\boldsymbol{x_{l}}-\boldsymbol{c_{j}}\right\|^{\frac{1}{2}} \cdot u_{jl}}{\sum{u_{i}} + \sum{u_{j}}}
where u is the membership matrix and u_{i}
is the column of
u describing the membership of the n observations to cluster
i. c_{i}
is the center of the cluster i.
The denominator is a measure of the maximal dispersion of all clusters, given by the formula:
\frac{2*\sum_{l=1}^{n}\left\|\boldsymbol{x}_{l}-\boldsymbol{c_{i}}\right\|^{\frac{1}{2}}}{\sum{u_{i}}}
Value
A float: the Generalized Dunn’s index (53)
References
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcGD53(result$Data, result$Belongings, result$Centers)
Lagged Data
Description
Calculate Wx, the spatially lagged version of x, by a neighbouring matrix W.
Usage
calcLaggedData(x, nblistw, method = "mean")
Arguments
x |
A dataframe with only numeric columns |
nblistw |
The listw object (spdep like) used to calculate WY |
method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median") |
Value
A lagged version of x
Examples
#This is an internal function, no example provided
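The "mean" method is simply a row-standardized spatial lag. A minimal sketch with a binary adjacency matrix (illustrative; the package relies on spdep listw objects rather than dense matrices):

```r
# Row-standardized spatial lag: each value becomes the mean of its neighbours
lag_mean <- function(x, W) {
  W <- W / rowSums(W)            # row-standardize the binary adjacency matrix
  as.vector(W %*% as.matrix(x))
}

x <- c(1, 2, 3, 4)
W <- rbind(c(0, 1, 0, 0),        # a simple chain: 1-2-3-4
           c(1, 0, 1, 0),
           c(0, 1, 0, 1),
           c(0, 0, 1, 0))
lag_mean(x, W)  # c(2, 2, 3, 3)
```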
Negentropy Increment index
Description
Calculate the Negentropy Increment index of clustering quality.
Usage
calcNegentropyI(data, belongmatrix, centers)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongmatrix |
A membership matrix (n*k) |
centers |
The centres of the clusters |
Details
The Negentropy Increment index (Da Silva et al. 2020) is based on the assumption that a normally shaped cluster is more desirable. It uses the difference between the average negentropy of all the clusters in the partition, and that of the whole partition. A smaller value indicates a better partition. The formula is:
NI=\frac{1}{2} \sum_{j=1}^{k} p_{j} \ln \left|{\boldsymbol{\Sigma}}_{j}\right|-\frac{1}{2} \ln \left|\boldsymbol{\Sigma}_{d a t a}\right|-\sum_{j=1}^{k} p_{j} \ln p_{j}
with:
- j a cluster
- |.| the determinant of a matrix
- \left|{\boldsymbol{\Sigma}}_{j}\right| the covariance matrix of the dataset weighted by the membership values to cluster j
- \left|\boldsymbol{\Sigma}_{d a t a}\right| the covariance matrix of the dataset
- p_{j} the sum of the membership values to cluster j divided by the number of observations.
Value
A float: the Negentropy Increment index
References
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcNegentropyI(result$Data, result$Belongings, result$Centers)
calculate the quality index required
Description
A selector function to get the right quality index
Usage
calcQualIdx(name, ...)
Arguments
name |
The name of the index to calculate |
... |
The parameters needed to calculate the index |
Value
A float: the value of the index
Examples
# this is an internal function, no example provided
Calculate sigmas for the robust version of the c-means algorithm
Description
Calculate sigmas for the robust version of the c-means algorithm
Arguments
data |
A Numeric matrix representing the observed data with n rows and p columns |
belongmatrix |
A n X k matrix giving for each observation n, its probability to belong to the cluster k |
centers |
A c X k matrix giving for each cluster c, its center in k dimensions |
m |
A float representing the fuzziness degree |
Value
A vector with the sigmas for each cluster
Calculate the membership matrix (spatial version)
Description
Calculate the membership matrix (spatial version) according to a set of centroids, the observed data, the fuzziness degree, a neighbouring matrix and a spatial weighting term
Usage
calcSFCMBelongMatrix(centers, data, wdata, m, alpha, sigmas, wsigmas)
Arguments
centers |
A matrix or a dataframe representing the centers of the clusters with p columns and k rows |
data |
A matrix representing the observed data with n rows and p columns |
wdata |
A matrix representing the lagged observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy c-means algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
wsigmas |
Same as sigmas, but calculated on the spatially lagged dataset |
Value
A n * k matrix representing the belonging probabilities of each observation to each cluster
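The spatial term enters the update as an additional distance between the lagged observation and the centre, weighted by alpha. The following sketch illustrates the idea for the classical (non-robust) case; it is not the package's compiled implementation, and `sfcm_membership` is a hypothetical name:

```r
sfcm_membership <- function(centers, data, wdata, m, alpha) {
  data <- as.matrix(data); wdata <- as.matrix(wdata)
  centers <- as.matrix(centers)
  # combined distance: attribute space + alpha * lagged attribute space
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(data, 2, centers[j, ])^2) +
      alpha * rowSums(sweep(wdata, 2, centers[j, ])^2)
  })
  inv <- pmax(d2, 1e-12)^(-1 / (m - 1))
  inv / rowSums(inv)             # rows sum to 1
}

u <- sfcm_membership(centers = rbind(c(0, 0), c(5, 5)),
                     data    = rbind(c(0, 0.1), c(5, 4.9)),
                     wdata   = rbind(c(0.1, 0), c(4.9, 5)),
                     m = 1.5, alpha = 1)
```

With alpha = 0 this reduces to the classical FCM membership update.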
Calculate the membership matrix (spatial version) with a noise cluster
Description
Calculate the membership matrix (spatial version) according to a set of centroids, the observed data, the fuzziness degree, a neighbouring matrix and a spatial weighting term
Usage
calcSFCMBelongMatrixNoisy(
centers,
data,
wdata,
m,
alpha,
delta,
sigmas,
wsigmas
)
Arguments
centers |
A matrix or a dataframe representing the centers of the clusters with p columns and k rows |
data |
A matrix representing the observed data with n rows and p columns |
wdata |
A matrix representing the lagged observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy c-means algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
delta |
A float, the value set for delta by the user |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
wsigmas |
Same as sigmas, but calculated on the spatially lagged dataset |
Value
A n * k matrix representing the belonging probabilities of each observation to each cluster
Calculate the generalized membership matrix (spatial version)
Description
Calculate the generalized membership matrix (spatial version)
Usage
calcSFGCMBelongMatrix(centers, data, wdata, m, alpha, beta, sigmas, wsigmas)
Arguments
centers |
A matrix representing the centers of the clusters with p columns and k rows |
data |
A matrix representing the observed data with n rows and p columns |
wdata |
A matrix representing the lagged observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy c-means algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
beta |
A float for the beta parameter (controls convergence speed and classification crispness) |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
wsigmas |
Same as sigmas, but calculated on the spatially lagged dataset |
Value
A n * k matrix representing the belonging probabilities of each observation to each cluster
Calculate the generalized membership matrix (spatial version) with a noise cluster
Description
Calculate the generalized membership matrix (spatial version) with a noise cluster
Usage
calcSFGCMBelongMatrixNoisy(
centers,
data,
wdata,
m,
alpha,
beta,
delta,
sigmas,
wsigmas
)
Arguments
centers |
A matrix representing the centers of the clusters with p columns and k rows |
data |
A matrix representing the observed data with n rows and p columns |
wdata |
A matrix representing the lagged observed data with n rows and p columns |
m |
A float representing the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy c-means algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
beta |
A float for the beta parameter (controls convergence speed and classification crispness) |
delta |
A float, the value set for delta by the user |
sigmas |
A numeric vector for calculating the robust version of the FCM. Filled with ones if the classical version is required |
wsigmas |
Same as sigmas, but calculated on the spatially lagged dataset |
Value
A n * k matrix representing the belonging probabilities of each observation to each cluster
Calculate the centroids of SFCM
Description
Calculate the new centroids of the clusters based on the membership matrix for SFCM
Usage
calcSWFCCentroids(data, wdata, belongmatrix, m, alpha)
Arguments
data |
A matrix representing the observed data with n rows and p columns |
wdata |
A matrix representing the lagged observed data with n rows and p columns |
belongmatrix |
A n X k matrix giving for each observation n, its probability to belong to the cluster k |
m |
A float representing the fuzziness degree |
alpha |
A float representing the weight of the space in the analysis (0 is a typical fuzzy c-means algorithm, 1 is balanced between the two dimensions, 2 is twice the weight for space) |
Value
A k X p matrix representing the updated centres of the clusters
Fuzzy Silhouette index
Description
Calculate the Silhouette index of clustering quality.
Usage
calcSilhouetteIdx(data, belongings)
Arguments
data |
The original dataframe used for the clustering (n*p) |
belongings |
A membership matrix (n*k) |
Details
The index is calculated with the function SIL.F from the package fclust. When the dataset is too large, a subsampling approach is used to avoid crashes.
Value
A float, the fuzzy Silhouette index
Diversity index
Description
Calculate the diversity (or entropy) index.
Usage
calcUncertaintyIndex(belongmatrix)
Arguments
belongmatrix |
A membership matrix |
Details
The diversity (or entropy) index (Theil 1972) is calculated for each observation and varies between 0 and 1. When the value is close to 0, the observation belongs to only one cluster (as in hard clustering). When the value is close to 1, the observation is undecided and tends to belong to every cluster. Values above 0.9 should be investigated. The formula is:
H2_{i} = \frac{-\sum[u_{ij}\ln(u_{ij})]}{\ln(k)}
with i an observation, j a cluster, k the number of clusters and u the membership matrix.
It is a simplified formula because the sum of each row of a membership matrix is 1.
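The simplified formula translates directly into R; a minimal sketch (`diversity_index` is a hypothetical name):

```r
# H2_i = -sum_j u_ij * ln(u_ij) / ln(k), one value per observation
diversity_index <- function(u) {
  k <- ncol(u)
  u_safe <- pmax(u, 1e-12)               # avoid log(0); 0 * log(0) counts as 0
  -rowSums(u * log(u_safe)) / log(k)
}

memb <- rbind(c(1, 0, 0),                # crisp membership
              c(1/3, 1/3, 1/3))          # fully undecided membership
diversity_index(memb)  # 0 for the crisp row, 1 for the undecided row
```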
Value
A vector with the values of the diversity (entropy) index
References
Theil H (1972). Statistical decomposition analysis; with applications in the social and administrative sciences. North-Holland.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcUncertaintyIndex(result$Belongings)
Calculate lagged values for a raster dataset
Description
Calculate lagged values for a raster dataset given a window and an aggregation function
Usage
calcWdataRaster(w, dataset, fun, missing_pxl)
Arguments
w |
A matrix |
dataset |
A list of rasters |
fun |
A function, a string giving the name of a function, or "nl" for the non-local method |
missing_pxl |
A boolean vector of missing (FALSE) pixels |
Examples
# this is an internal function, no example provided
Jaccard similarity coefficient
Description
Calculate the Jaccard similarity coefficient
Usage
calc_jaccard_idx(x, y)
Arguments
x |
A vector of positive reals |
y |
A vector of positive reals |
Value
A double: the Jaccard similarity coefficient
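For vectors of positive reals such as membership columns, a common fuzzy generalization of the Jaccard coefficient is the ratio of the summed element-wise minima to the summed element-wise maxima. A minimal sketch, assuming that definition (`jaccard_fuzzy` is a hypothetical name):

```r
# Fuzzy Jaccard: sum(min(x, y)) / sum(max(x, y)), 1 for identical vectors
jaccard_fuzzy <- function(x, y) sum(pmin(x, y)) / sum(pmax(x, y))

jaccard_fuzzy(c(0.2, 0.8, 0.5), c(0.2, 0.8, 0.5))  # identical vectors: 1
jaccard_fuzzy(c(1, 0), c(0, 1))                    # disjoint vectors: 0
```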
Jaccard similarity coefficient between columns of two matrices
Description
Calculate the Jaccard similarity coefficient between the columns of two matrices
Usage
calc_jaccard_mat(matX, matY)
Arguments
matX |
A matrix |
matY |
A matrix |
Value
A matrix with the Jaccard index values
Local Moran I for raster
Description
Calculate the Local Moran I for a numeric raster
Usage
calc_local_moran_raster(rast, window)
Arguments
rast |
A SpatRaster or a matrix |
window |
The window defining the neighbour weights |
Value
A SpatRaster or a matrix depending on the input with the local Moran I values
Examples
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans"))
names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2")
rast <- Arcachon[[1]]
w <- matrix(1, nrow = 3, ncol = 3)
calc_local_moran_raster(rast, w)
Global Moran I for raster
Description
Calculate the global Moran I for a numeric raster
Usage
calc_moran_raster(rast, window)
Arguments
rast |
A SpatRaster or a matrix |
window |
The window defining the neighbour weights |
Value
A float: the global Moran I
Examples
Arcachon <- terra::rast(system.file("extdata/Littoral4_2154.tif", package = "geocmeans"))
names(Arcachon) <- c("blue", "green", "red", "infrared", "SWIR1", "SWIR2")
rast <- Arcachon[[1]]
w <- matrix(1, nrow = 3, ncol = 3)
calc_moran_raster(rast, w)
Calculate spatial inconsistency for raster
Description
Calculate the spatial inconsistency sum for a set of rasters
Usage
calc_raster_spinconsistency(
matrices,
window,
adj = FALSE,
dataset = NULL,
mindist = 1e-11
)
Arguments
matrices |
A list of matrices |
window |
The window to use to define spatial neighbouring |
adj |
A boolean indicating if the adjusted version of the algorithm must be calculated |
dataset |
A list of matrices with the original data (if adj = TRUE) |
mindist |
When adj is TRUE, a minimum value for the distance between two observations. If two neighbours have exactly the same values, the euclidean distance between them is 0, leading to an infinite spatial weight. In that case, the minimum distance is used instead of 0. |
Value
A float: the sum of spatial inconsistency
Examples
# this is an internal function, no example provided
Explained inertia index
Description
Calculate the explained inertia by a classification
Usage
calcexplainedInertia(data, belongmatrix)
Arguments
data |
The original dataframe used for the classification (n*p) |
belongmatrix |
A membership matrix (n*k) |
Value
A float: the percentage of the total inertia explained
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcexplainedInertia(result$Data,result$Belongings)
Quality indexes
Description
Calculate several clustering quality indexes (some of them come from the fclust package)
Usage
calcqualityIndexes(
data,
belongmatrix,
m,
indices = c("Silhouette.index", "Partition.entropy", "Partition.coeff",
"XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia")
)
Arguments
data |
The original dataframe used for the classification (n*p) |
belongmatrix |
A membership matrix (n*k) |
m |
The fuzziness parameter used for the classification |
indices |
A character vector with the names of the indices to calculate, default is: c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are: "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index" |
Value
A named list with the values of the required indices
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
calcqualityIndexes(result$Data,result$Belongings, m=1.5)
Convert categories to membership matrix
Description
Function to convert a character vector to a membership matrix (binary matrix). The columns of the matrix are ordered with the order function.
Usage
cat_to_belongings(categories)
catToBelongings(categories)
Arguments
categories |
A vector with the categories of each observation |
Value
A binary matrix
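A short usage sketch (the column ordering follows the order function, so for character input the columns appear in alphabetical order):

```r
library(geocmeans)

cats <- c("B", "A", "B", "C")
mat <- cat_to_belongings(cats)
# expected: a 4 x 3 binary matrix, columns ordered A, B, C,
# with a single 1 per row marking the category of each observation
```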
center matrix calculator for FCM algorithm
Description
center matrix calculator for FCM algorithm
Usage
centersFCM(data, centers, belongmatrix, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
belongmatrix |
a matrix with the membership values |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new centers
center matrix calculator for GFCM algorithm
Description
center matrix calculator for GFCM algorithm
Usage
centersGFCM(data, centers, belongmatrix, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
belongmatrix |
a matrix with the membership values |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new centers
center matrix calculator for SFCM algorithm
Description
center matrix calculator for SFCM algorithm
Usage
centersSFCM(data, centers, belongmatrix, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
belongmatrix |
a matrix with the membership values |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new centers
center matrix calculator for SGFCM algorithm
Description
center matrix calculator for SGFCM algorithm
Usage
centersSGFCM(data, centers, belongmatrix, dots)
Arguments
data |
a matrix (the dataset used for clustering) |
centers |
a matrix (the centers of the clusters) |
belongmatrix |
a matrix with the membership values |
dots |
a list of other arguments specific to FCM |
Value
a matrix with the new centers
Check validity of a dissimilarity matrix
Description
Check the validity of a dissimilarity matrix
Usage
check_matdist(matdist)
Arguments
matdist |
A dissimilarity matrix |
Examples
# this is an internal function, no example provided
Check dimensions of a list of rasters
Description
Check if all the rasters in a list have the same dimensions
Usage
check_raters_dims(rasters)
Arguments
rasters |
A list of rasters |
Examples
# this is an internal function, no example provided
Check the shape of a window
Description
Check if a window is square and has odd dimensions
Usage
check_window(w)
Arguments
w |
A matrix |
Examples
# this is an internal function, no example provided
Circular window
Description
Create a matrix that can be used as a window when working with rasters. It uses a radius to set the weights of pixels farther than this distance to 0, which is helpful for creating circular focal windows.
Usage
circular_window(radius, res)
Arguments
radius |
The size in metres of the radius of the circular focal |
res |
The width in metres of a pixel. It is assumed that pixels are squares. |
Details
The original function comes from here: https://scrogster.wordpress.com/2012/10/05/applying-a-circular-moving-window-filter-to-raster-data-in-r/ but we reworked it to make it faster and to ensure that the result is a matrix with odd dimensions.
Value
A binary weight matrix
Examples
# radius of 100 metres for pixels of 2 metres
window <- circular_window(100, 2)
# row standardisation
window_row_std <- window / sum(window)
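A hedged sketch of how such a window might then be applied as a focal filter with terra (it assumes terra::focal accepts a weight matrix as `w`, and that 2 m pixels are a reasonable stand-in for the example raster's resolution):

```r
library(geocmeans)
library(terra)

# first band of the example raster shipped with the package
rast <- terra::rast(system.file("extdata/Littoral4_2154.tif",
                                package = "geocmeans"))[[1]]
# circular window: 100 m radius, assuming 2 m pixels (illustrative)
window <- circular_window(100, 2)
# row-standardized weights turn the weighted sum into a weighted mean
smoothed <- terra::focal(rast, w = window / sum(window), fun = sum)
```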
element wise division of two matrices by column
Description
element wise division of two matrices by column
Usage
div_matrices_bycol(x, y)
Arguments
x |
a matrix |
y |
a matrix with the same dimensions |
Value
a matrix
Local Fuzzy ELSA statistic for vector
Description
Calculate the Local Fuzzy ELSA statistic using a nblistw object
Usage
elsa_fuzzy_vector(memberships, nblistw, matdist)
Arguments
memberships |
A membership matrix |
nblistw |
The spatial weight matrix (nblistw object from spdep) |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
Value
A vector of local ELSA values
Examples
# this is an internal function, no example provided
Calculate ELSA spatial statistic for raster dataset
Description
Calculate the ELSA spatial statistic for a raster dataset
Usage
elsa_raster(rast, window, matdist)
Arguments
rast |
An integer raster or matrix representing the m categories (0,1,2,..., m) |
window |
A binary (0,1) matrix representing the neighbours spatial weights when working with rasters. The matrix must have odd dimensions. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
Value
A raster or a matrix: the local values of ELSA
Examples
# this is an internal function, no example provided
Calculate ELSA spatial statistic for vector dataset
Description
Calculate the ELSA spatial statistic for a vector dataset
Usage
elsa_vector(categories, nblistw, dist)
Arguments
categories |
An integer vector representing the m categories (1,2,3,..., m), -1 is used to indicate missing values. |
nblistw |
A listw object from spdep representing neighbour relations |
dist |
A numeric matrix (m*m) representing the distances between categories |
Value
A vector: the local values of ELSA
Examples
# this is an internal function, no example provided
Worker function
Description
Worker function for select_parameters and select_parameters.mc
Usage
eval_parameters(
algo,
parameters,
data,
nblistw = NULL,
window = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
spconsist = FALSE,
classidx = TRUE,
nrep = 30,
indices = NULL,
tol,
maxiter,
seed = NULL,
init = "random",
verbose = TRUE,
wrapped = FALSE
)
Arguments
algo |
A string indicating which method to use (FCM, GFCM, SFCM, SGFCM) |
parameters |
A dataframe of parameters with columns k,m and alpha |
data |
A dataframe with numeric columns |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such as 3x3). The values in the matrix indicate the weight to give to each pixel, and the centre of the matrix is the centre of the focal function. |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
spconsist |
A boolean indicating if the spatial consistency must be calculated |
classidx |
A boolean indicating if the quality of classification indices must be calculated |
nrep |
An integer indicating the number of permutations to perform to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE. |
indices |
A character vector with the names of the indices to calculate, to evaluate clustering quality. Default is: c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are: "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index". |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
maxiter |
An integer for the maximum number of iteration |
seed |
An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected. |
init |
A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" uses a distance-based method resulting in more dispersed centers at the beginning. Both are heuristics. |
verbose |
A boolean indicating if a progress bar should be displayed |
wrapped |
A boolean indicating if the data passed is wrapped or not (see wrap function of terra) |
Value
A dataframe containing, for each combination of parameters, several clustering quality indexes.
Examples
#No example provided, this is an internal function
Matrix evaluation
Description
Evaluate if the algorithm converged by comparing two successive membership matrices. Calculate the absolute difference between the matrices and then the max of each row. If all the values of the resulting vector are below the fixed tolerance, return TRUE, else return FALSE.
Usage
evaluateMatrices(mat1, mat2, tol)
Arguments
mat1 |
A n X k matrix giving for each observation n, its probability to belong to the cluster k at iteration i |
mat2 |
A n X k matrix giving for each observation n, its probability to belong to the cluster k at iteration i+1 |
tol |
A float representing the algorithm tolerance |
Value
A boolean, TRUE if the test is passed, FALSE otherwise
Examples
#This is an internal function, no example provided
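Although internal, the convergence test described above is easy to sketch in base R (a minimal re-implementation for illustration, not the package's own code):

```r
# Sketch of the convergence test: max absolute row-wise difference
# between two successive membership matrices, compared to a tolerance
converged <- function(mat1, mat2, tol) {
  all(apply(abs(mat1 - mat2), 1, max) < tol)
}

m1 <- matrix(c(0.2, 0.8, 0.5, 0.5), nrow = 2)
converged(m1, m1 + 0.001, tol = 0.01)  # TRUE: all differences are 0.001
```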
focal mean weighted by inverse of euclidean distance on a cube
Description
focal mean weighted by inverse of euclidean distance on a cube
Usage
focal_adj_mean_arr_window(mat, window)
Arguments
mat |
an array (cube) |
window |
a numeric matrix (squared) |
Value
a lagged version of the original cube
focal euclidean distance on a list of matrices
Description
focal euclidean distance on a list of matrices
Usage
focal_euclidean_list(matrices, window)
Arguments
matrices |
a List of matrices with the same dimensions |
window |
a numeric matrix |
Value
a matrix with the euclidean distance of each cell to its neighbours.
focal euclidean distance on a matrix with a given window for a cube
Description
focal euclidean distance on a matrix with a given window for a cube
Usage
focal_euclidean_arr_window(mat, window)
Arguments
mat |
an array (cube) |
window |
a numeric matrix (squared) |
Value
a matrix with the euclidean distance of each cell to its neighbours.
focal euclidean distance on a matrix with a given window
Description
focal euclidean distance on a matrix with a given window
Usage
focal_euclidean_mat_window(mat, window)
Arguments
mat |
a matrix |
window |
a numeric matrix (squared) |
Value
a matrix with the euclidean distance of each cell to its neighbours.
geocmeans: A package implementing methods for spatially constrained c-means algorithm
Description
The geocmeans package implements a modified c-means algorithm better suited to spatial data (characterized by spatial autocorrelation). The spatial information is introduced with a spatial weight matrix W (n * n) where wij indicates the strength of the spatial relationship between observations i and j. It is recommended to use a row-standardized matrix (so that the sum of each row is 1). More specifically, the spatial c-means combines the euclidean distance of each observation in the data matrix X to each center with the euclidean distance of the lagged version of X by W (WX). A parameter alpha controls the weight of the lagged matrix. If alpha = 0, the spatial c-means is equal to a classical c-means. If alpha = 1, the weights given to X and WX are equal. If alpha = 2, the weight of WX is twice that of X, and so on. Several indices are provided to assess the quality of a classification on the semantic and spatial dimensions. To explore results, a shiny app is also available.
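The role of alpha can be sketched with the package's own example data (same setup as the other examples in this manual; as stated above, alpha = 0 reduces SFCM to a plain c-means):

```r
library(geocmeans)

data(LyonIris)
AnalysisFields <- c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
                    "TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
Wqueen <- spdep::nb2listw(spdep::poly2nb(LyonIris, queen = TRUE), style = "W")

# alpha = 0: no weight on the lagged data WX (classical c-means)
# alpha = 1: X and WX weighted equally
res_a0 <- SFCMeans(dataset, Wqueen, k = 4, m = 1.5, alpha = 0, standardize = TRUE)
res_a1 <- SFCMeans(dataset, Wqueen, k = 4, m = 1.5, alpha = 1, standardize = TRUE)
```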
geocmeans general environment
Description
An environment used by geocmeans to store data, functions and values
Usage
geocmeans_env
Format
An object of class environment of length 0.
Match the groups obtained from two classifications
Description
Match the groups obtained from two classifications based on the Jaccard index calculated on the membership matrices.
Usage
groups_matching(object.x, object.y)
Arguments
object.x |
A FCMres object, or a simple membership matrix. It is used as the reference for the ordering of the groups |
object.y |
A FCMres object, or a simple membership matrix. The order of its groups will be updated to match with the groups of object.x |
Details
We cannot expect to obtain the groups in the same order in each run of a classification algorithm. This function can be used to match the clusters of a first classification with the most similar clusters of a second classification, making it easier to compare the results of two algorithms or two runs of the same algorithm.
Value
The FCMres object or the membership matrix provided for the parameter object.y with the order of the groups updated.
Examples
data(LyonIris)
#selecting the columns for the analysis
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14",
"Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed")
#rescaling the columns
Data <- sf::st_drop_geometry(LyonIris[AnalysisFields])
for (Col in names(Data)){
Data[[Col]] <- as.numeric(scale(Data[[Col]]))
}
Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE)
Cmean2 <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 789, tol = 0.00001, verbose = FALSE)
ordered_Cmean2 <- groups_matching(Cmean,Cmean2)
Raster data preparation
Description
Prepare a raster dataset
Usage
input_raster_data(dataset, w = NULL, fun = sum, standardize = TRUE)
Arguments
dataset |
A list of rasters |
w |
The window to use in the focal function |
fun |
the function to use as the focal function |
standardize |
A boolean to specify if the variable must be centered and reduced (default = True) |
Value
A list with the required elements to perform clustering
Examples
# this is an internal function, no example provided
is method for FCMres
Description
Check if an object can be considered as a FCMres object
Usage
## S3 method for class 'FCMres'
is(object, class2 = "FCMres")
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
class2 |
Character string giving the name of the class to test (usually "FCMres") |
Value
A boolean, TRUE if object can be considered as a FCMres object, FALSE otherwise
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
is(result, "FCMres")
kpp centers selection
Description
Select the initial cluster centers by using the k-means++ (kpp) approach as suggested in this article: http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
Usage
kppCenters(data, k)
Arguments
data |
The dataset used in the classification |
k |
The number of groups for the classification |
Value
a DataFrame, each row is the center of a cluster
Examples
#This is an internal function, no example provided
Local Moran I calculated on a matrix with a given window
Description
Local Moran I calculated on a matrix with a given window
Usage
local_moranI_matrix_window(mat, window)
Arguments
mat |
a matrix |
window |
the window to use to define neighbours. 0 can be used to indicate that a cell is not a neighbour |
Value
a double, the value of Moran I
Main worker function
Description
Execution of the classification algorithm
Usage
main_worker(algo, ...)
Arguments
algo |
A string indicating the algorithm to use (one of FCM, GFCM, SGFCM) |
... |
all the required arguments for the algorithm to use |
Value
A named list with
Centers: a dataframe describing the final centers of the groups
Belongings: the final membership matrix
Groups: a vector with the names of the most likely group for each observation
Data: the dataset used to perform the clustering (might be standardized)
Examples
#This is an internal function, no example provided
Mapping the clusters
Description
Build some maps to visualize the results of the clustering
Usage
mapClusters(geodata = NULL, object, undecided = NULL)
Arguments
geodata |
A feature collection (sf object) ordered like the original data used for the clustering. Can be NULL if object is a FCMres object created with rasters. |
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
undecided |
A float between 0 and 1 giving the minimum value that an observation must get in the membership matrix to not be considered as uncertain (default = NULL) |
Value
A named list with :
ProbaMaps : a list of tmap maps showing for each group the probability of the observations to belong to that group
ClusterMap : a tmap map showing the most likely group for each observation
Examples
## Not run:
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
MyMaps <- mapClusters(LyonIris, result$Belongings)
## End(Not run)
Mapping the clusters (rasters)
Description
Internal function to realize maps based on rasters
Usage
mapRasters(object, undecided)
Arguments
object |
A FCMres object |
undecided |
A float between 0 and 1 giving the minimum value that an observation must get in the membership matrix to not be considered as uncertain (default = NULL) |
Value
A named list with :
ProbaMaps : a list of ggplot maps showing for each group the probability of the observations to belong to that group
ClusterMap : a ggplot map showing the most likely group for each observation
Examples
#No example provided, this is an internal function, use the general wrapper function mapClusters
Mapping the clusters
Description
Internal function to realize maps
Usage
mapThis(geodata, belongmatrix, undecided = NULL, geom_type = "polygon")
Arguments
geodata |
feature collections ordered like the original data used for the clustering |
belongmatrix |
The membership matrix obtained at the end of the algorithm |
undecided |
A float between 0 and 1 giving the minimum value that an observation must get in the membership matrix to not be considered as uncertain (default = NULL) |
geom_type |
A string indicating the type of geometry (polygon, string or point) |
Value
A named list with :
ProbaMaps : a list of ggplot maps showing for each group the probability of the observations to belong to that group
ClusterMap : a ggplot map showing the most likely group for each observation
Examples
#No example provided, this is an internal function, use the general wrapper function mapClusters
maximum in a matrix
Description
maximum in a matrix
Usage
max_mat(x)
Arguments
x |
a matrix |
Value
a double
Moran I calculated on a matrix with a given window
Description
Moran I calculated on a matrix with a given window
Usage
moranI_matrix_window(mat, window)
Arguments
mat |
a matrix |
window |
the window to use to define neighbours. 0 can be used to indicate that a cell is not a neighbour |
Value
a double, the value of Moran I
Raster result transformation
Description
Adapt the results if a raster is used
Usage
output_raster_data(object, missing, rst)
Arguments
object |
A FCMres object |
missing |
A boolean indicating which pixels have no missing values |
rst |
A raster object used as template to structure the results |
Value
A FCMres object with isRaster = TRUE
Examples
# this is an internal function, no example provided
Plot method for FCMres object
Description
Method to plot the results of a FCM.res object
Usage
## S3 method for class 'FCMres'
plot(x, type = "spider", ...)
Arguments
x |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
type |
A string indicating the type of plot to show. Can be one of "bar", "violin", or "spider". Default is "spider". |
... |
not used |
Details
This S3 method is a simple dispatcher for the functions barPlots, violinPlots and spiderPlots. To be able to use all their specific parameters, one can use them directly.
Value
a ggplot2 object, a list, or NULL, depending on the type of plot requested
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
# rescaling all the variables used in the analysis
for (field in AnalysisFields) {
LyonIris[[field]] <- scale(LyonIris[[field]])
}
# doing the initial clustering
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE)
plot(result, type = "spider")
element wise power of a matrix by column
Description
element wise power of a matrix by column
Usage
pow_matrix_bycol(x, p)
Arguments
x |
a matrix |
p |
the exponent |
Value
a matrix
power of a matrix
Description
power of a matrix
Usage
power_mat(x, p)
Arguments
x |
a matrix |
p |
a float |
Value
x ** p
Predict method for FCMres object
Description
Function to predict the membership matrix of a new set of observations
Usage
## S3 method for class 'FCMres'
predict(
object,
new_data,
nblistw = NULL,
window = NULL,
standardize = TRUE,
...
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
new_data |
A DataFrame with the new observations |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such as 3x3). The values in the matrix indicate the weight to give to each pixel, and the centre of the matrix is the centre of the focal function. |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
... |
not used |
Value
A numeric matrix with the membership values for each new observation
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
# rescaling all the variables used in the analysis
for (field in AnalysisFields) {
LyonIris[[field]] <- scale(LyonIris[[field]])
}
# doing the initial clustering
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE)
# using a subset of the original dataframe as "new data"
new_data <- LyonIris[c(1, 27, 36, 44, 73),]
new_dataset <- sf::st_drop_geometry(new_data[AnalysisFields])
new_nb <- spdep::poly2nb(new_data,queen=TRUE)
new_Wqueen <- spdep::nb2listw(new_nb,style="W")
# doing the prediction
predictions <- predict(result, new_dataset, new_Wqueen, standardize = FALSE)
Predict matrix membership for new observations
Description
Function to predict the membership matrix of a new set of observations
Usage
predict_membership(
object,
new_data,
nblistw = NULL,
window = NULL,
standardize = TRUE,
...
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
new_data |
A DataFrame with the new observations or a list of rasters if object$isRaster is TRUE |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. |
window |
If data is a list of rasters, then a window must be specified instead of a list.w object. It will be used to calculate a focal function on each raster. The window must be a square numeric matrix with odd dimensions (such as 3x3). The values in the matrix indicate the weight to give to each pixel, and the centre of the matrix is the centre of the focal function. |
standardize |
A boolean to specify if the variables must be centred and reduced (default = TRUE) |
... |
not used |
Value
A numeric matrix with the membership values for each new observation. If rasters were used, return a list of rasters with the membership values.
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
# rescaling all the variables used in the analysis
for (field in AnalysisFields) {
LyonIris[[field]] <- scale(LyonIris[[field]])
}
# doing the initial clustering
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SGFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, beta = 0.5, standardize = FALSE)
# using a subset of the original dataframe as "new data"
new_data <- LyonIris[c(1, 27, 36, 44, 73),]
new_dataset <- sf::st_drop_geometry(new_data[AnalysisFields])
new_nb <- spdep::poly2nb(new_data,queen=TRUE)
new_Wqueen <- spdep::nb2listw(new_nb,style="W")
# doing the prediction
predictions <- predict_membership(result, new_dataset, new_Wqueen, standardize = FALSE)
print method for FCMres
Description
print a FCMres object
Usage
## S3 method for class 'FCMres'
print(x, ...)
Arguments
x |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
... |
not used |
Value
A boolean, TRUE if x can be considered as a FCMres object, FALSE otherwise
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
result <- CMeans(dataset, k = 5, m = 1.5, standardize = TRUE)
print(result)
element wise product of two matrices by column
Description
element wise product of two matrices by column
Usage
prod_matrices_bycol(x, y)
Arguments
x |
a matrix |
y |
a matrix with the same dimensions |
Value
a matrix
minimum of each row of a matrix
Description
minimum of each row of a matrix
Usage
rowmins_mat(x)
Arguments
x |
a matrix |
Value
a NumericVector
Parameter checking function
Description
Check that the provided parameters are valid
Usage
sanity_check(dots, data)
Arguments
dots |
A list of parameters used |
data |
A numeric and complete dataframe |
Value
A boolean, TRUE if all the tests are passed, FALSE otherwise
Examples
#This is an internal function, no example provided
Select parameters for a clustering algorithm
Description
Function to select the parameters for a clustering algorithm.
Usage
select_parameters(
algo,
data,
k,
m,
alpha = NA,
beta = NA,
nblistw = NULL,
lag_method = "mean",
window = NULL,
spconsist = TRUE,
classidx = TRUE,
nrep = 30,
indices = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NA,
maxiter = 500,
tol = 0.01,
seed = NULL,
init = "random",
verbose = TRUE
)
selectParameters(
algo,
data,
k,
m,
alpha = NA,
beta = NA,
nblistw = NULL,
lag_method = "mean",
window = NULL,
spconsist = TRUE,
classidx = TRUE,
nrep = 30,
indices = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NA,
maxiter = 500,
tol = 0.01,
seed = NULL,
init = "random",
verbose = TRUE
)
Arguments
algo |
A string indicating which method to use (FCM, GFCM, SFCM, SGFCM) |
data |
A dataframe with numeric columns or a list of rasters. |
k |
A sequence of values for k to test (>=2) |
m |
A sequence of values for m to test |
alpha |
A sequence of values for alpha to test (NULL if not required) |
beta |
A sequence of values for beta to test (NULL if not required) |
nblistw |
A list of list.w objects describing the neighbours typically produced by the spdep package (NULL if not required) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). Both can be tested by specifying a vector: c("mean","median"). When working with rasters, the string must be parsable to a function like mean, min, max, sum, etc., which will be applied to all the pixel values in the window designated by the parameter window and weighted according to the values of this matrix. |
window |
A list of windows to use to calculate neighbouring values if rasters are used. |
spconsist |
A boolean indicating if the spatial consistency must be calculated |
classidx |
A boolean indicating if the quality of classification indices must be calculated |
nrep |
An integer indicating the number of permutations to perform to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE. |
indices |
A character vector with the names of the indices to calculate, to evaluate clustering quality. Default is: c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are: "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index". |
standardize |
A boolean to specify if the variables must be centered and reduced (default = TRUE) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicating if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
seed |
An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected. |
init |
A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" uses a distance-based method resulting in more dispersed centers at the beginning. Both are heuristics. |
verbose |
A boolean indicating if a progress bar should be displayed |
Value
A dataframe with indicators assessing the quality of classifications
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
#set spconsist to TRUE to calculate the spatial consistency indicator
#FALSE here to reduce the time during package check
values <- select_parameters(algo = "SFCM", dataset, k = 5, m = seq(2,3,0.1),
alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
#set spconsist to TRUE to calculate the spatial consistency indicator
#FALSE here to reduce the time during package check
values <- selectParameters(algo = "SFCM", dataset, k = 5, m = seq(2,3,0.1),
alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
Select parameters for clustering algorithm (multicore)
Description
Function to select the parameters for a clustering algorithm. This version of the function uses a plan defined with the future package to reduce calculation time.
Usage
select_parameters.mc(
algo,
data,
k,
m,
alpha = NA,
beta = NA,
nblistw = NULL,
lag_method = "mean",
window = NULL,
spconsist = TRUE,
classidx = TRUE,
nrep = 30,
indices = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NA,
maxiter = 500,
tol = 0.01,
chunk_size = 5,
seed = NULL,
init = "random",
verbose = TRUE
)
selectParameters.mc(
algo,
data,
k,
m,
alpha = NA,
beta = NA,
nblistw = NULL,
lag_method = "mean",
window = NULL,
spconsist = TRUE,
classidx = TRUE,
nrep = 30,
indices = NULL,
standardize = TRUE,
robust = FALSE,
noise_cluster = FALSE,
delta = NA,
maxiter = 500,
tol = 0.01,
chunk_size = 5,
seed = NULL,
init = "random",
verbose = TRUE
)
Arguments
algo |
A string indicating which method to use (FCM, GFCM, SFCM, SGFCM) |
data |
A dataframe with numeric columns |
k |
A sequence of values for k to test (>=2) |
m |
A sequence of values for m to test |
alpha |
A sequence of values for alpha to test (NULL if not required) |
beta |
A sequence of values for beta to test (NULL if not required) |
nblistw |
A list of list.w objects describing the neighbours, typically produced with the spdep package (NULL if not required) |
lag_method |
A string indicating if a classical lag must be used ("mean") or if a weighted median must be used ("median"). Both can be tested by specifying a vector: c("mean","median"). When working with rasters, the string must be parsable to a function like mean, min, max, sum, etc., which will be applied to all the pixel values in the window designated by the parameter window and weighted according to the values of this matrix. |
window |
A list of windows to use to calculate neighbouring values if rasters are used. |
spconsist |
A boolean indicating if the spatial consistency must be calculated |
classidx |
A boolean indicating if the quality of classification indices must be calculated |
nrep |
An integer indicating the number of permutations to perform to simulate the random distribution of the spatial inconsistency. Only used if spconsist is TRUE. |
indices |
A character vector with the names of the indices to calculate to evaluate clustering quality. The default is c("Silhouette.index", "Partition.entropy", "Partition.coeff", "XieBeni.index", "FukuyamaSugeno.index", "Explained.inertia"). Other available indices are: "DaviesBoulin.index", "CalinskiHarabasz.index", "GD43.index", "GD53.index" and "Negentropy.index". |
standardize |
A boolean to specify if the variables must be centered and reduced (default = TRUE) |
robust |
A boolean indicating if the "robust" version of the algorithm must be used (see details) |
noise_cluster |
A boolean indicating if a noise cluster must be added to the solution (see details) |
delta |
A float giving the distance of the noise cluster to each observation |
maxiter |
An integer for the maximum number of iterations |
tol |
The tolerance criterion used in the evaluateMatrices function for convergence assessment |
chunk_size |
The size of a chunk used for multiprocessing. Default is 5. |
seed |
An integer used for random number generation. It ensures that the start centers will be the same if the same integer is selected. |
init |
A string indicating how the initial centers must be selected. "random" indicates that random observations are used as centers. "kpp" uses a distance-based method resulting in more dispersed centers at the beginning. Both are heuristics. |
verbose |
A boolean indicating if a progress bar should be displayed |
Value
A dataframe with indicators assessing the quality of classifications
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
future::plan(future::multisession(workers=2))
#set spconsist to TRUE to calculate the spatial consistency indicator
#FALSE here to reduce the time during package check
values <- select_parameters.mc("SFCM", dataset, k = 5, m = seq(1,2.5,0.1),
alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
future::plan(future::multisession(workers=2))
#set spconsist to TRUE to calculate the spatial consistency indicator
#FALSE here to reduce the time during package check
values <- selectParameters.mc("SFCM", dataset, k = 5, m = seq(1,2.5,0.1),
alpha = seq(0,2,0.1), nblistw = Wqueen, spconsist=FALSE)
Spatial consistency index
Description
Calculate a spatial consistency index
Usage
spConsistency(
object,
nblistw = NULL,
window = NULL,
nrep = 999,
adj = FALSE,
mindist = 1e-11
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. Can also be NULL if object is a FCMres object. |
window |
if rasters were used for the classification, the window must be specified instead of a list.w object. Can also be NULL if object is a FCMres object. |
nrep |
An integer indicating the number of permutations to perform to simulate spatial randomness. Note that if rasters are used, each permutation can be very long. |
adj |
A boolean indicating if the adjusted version of the indicator must be calculated when working with rasters (globally standardized). When working with vectors, see the function adjustSpatialWeights to modify the list.w object. |
mindist |
When adj is TRUE, a minimum value for the distance between two observations. If two neighbours have exactly the same values, the euclidean distance between them is 0, leading to an infinite spatial weight. In that case, the minimum distance is used instead of 0. |
Details
This index is experimental; it aims to measure how much a clustering solution is spatially consistent. A classification is spatially inconsistent if neighbouring observations do not belong to the same group. See below for a description of its calculation.
The total spatial inconsistency (isp) is calculated as follows:

isp = \sum_{i}\sum_{j}\sum_{k} (u_{ik} - u_{jk})^{2} W_{ij}

with U the membership matrix, i an observation, j a neighbour of i, k a cluster and W the spatial weight matrix. This represents the total spatial inconsistency of the solution (true inconsistency). We propose to compare this total with simulated values obtained by permutations (simulated inconsistency). The values obtained by permutation are an approximation of the spatial inconsistency obtained in a random context. Ratios between the true inconsistency and the simulated inconsistencies are then calculated. A value of 0 depicts a situation where all observations are identical to their neighbours; a value of 1 depicts a situation where the observations are as different from their neighbours as randomness can produce. A classification solution able to reduce this index has a better spatial consistency.
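The total spatial inconsistency described above can be sketched in a few lines of base R. The helper below is illustrative only (it is not the package's implementation, which is written in C++ and adds the permutation step):

```r
# Illustrative sketch of isp = sum_i sum_j sum_k (u_ik - u_jk)^2 * W_ij
total_inconsistency <- function(U, W) {
  # U: n x k membership matrix; W: n x n spatial weight matrix
  total <- 0
  for (i in seq_len(nrow(U))) {
    for (j in seq_len(nrow(U))) {
      if (W[i, j] != 0) {
        total <- total + W[i, j] * sum((U[i, ] - U[j, ])^2)
      }
    }
  }
  total
}

# three observations, two clusters; obs 1 and 2 are neighbours
U <- rbind(c(1, 0), c(1, 0), c(0, 1))
W <- matrix(0, 3, 3)
W[1, 2] <- W[2, 1] <- 1
total_inconsistency(U, W)  # 0: the neighbours have identical memberships
```

Linking identical observations yields 0; linking observations with opposite memberships increases the total, which the permutation step then rescales.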
Value
A named list with:
Mean: the mean of the spatial consistency index
prt05: the 5th percentile of the spatial consistency index
prt95: the 95th percentile of the spatial consistency index
samples: all the values of the spatial consistency index
sum_diff: the total sum of squared differences between observations and their neighbours
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
# NOTE : more replications are needed for proper inference
spConsistency(result$Belongings, nblistw = Wqueen, nrep=25)
Classification result explorer
Description
Start a local Shiny App to explore the results of a classification
Usage
sp_clust_explorer(
object = NULL,
spatial = NULL,
membership = NULL,
dataset = NULL,
port = 8100,
...
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
spatial |
A feature collection (sf) used to map the observations. Only needed if object was not created from rasters. |
membership |
A matrix or a dataframe representing the membership values obtained for each observation. If NULL, then the matrix is extracted from object. |
dataset |
A dataframe or matrix representing the data used for the classification. If NULL, then the matrix is extracted from object. |
port |
An integer indicating the port on which to start the Shiny app. Default is 8100 |
... |
Other parameters passed to the function runApp |
Examples
## Not run:
data(LyonIris)
#selecting the columns for the analysis
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14",
"Pct_65","Pct_Img","TxChom1564","Pct_brevet","NivVieMed")
#rescaling the columns
Data <- sf::st_drop_geometry(LyonIris[AnalysisFields])
for (Col in names(Data)){
Data[[Col]] <- as.numeric(scale(Data[[Col]]))
}
Cmean <- CMeans(Data,4,1.5,500,standardize = FALSE, seed = 456, tol = 0.00001, verbose = FALSE)
sp_clust_explorer(Cmean, LyonIris)
## End(Not run)
Spatial diagnostic
Description
Utility function to facilitate the spatial diagnostic of a classification
Calculate the following indicators: Moran I index (spdep::moranI) for each column of the membership matrix, Join count test (spdep::joincount.multi) for the most likely groups of each datapoint, Spatial consistency index (see function spConsistency) and the Elsa statistic (see function calcElsa). Note that if the FCMres object given was constructed with rasters, the joincount statistic is not calculated and no p-values are provided for the Moran I indices.
Usage
spatialDiag(
object,
nblistw = NULL,
window = NULL,
undecided = NULL,
matdist = NULL,
nrep = 50
)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans. Can also be a simple membership matrix. |
nblistw |
A list.w object describing the neighbours typically produced by the spdep package. Required if data is a dataframe, see the parameter window if you use a list of rasters as input. Can also be NULL if object is a FCMres object. |
window |
If rasters were used for the classification, the window must be specified instead of a list.w object. Can also be NULL if object is a FCMres object. |
undecided |
A float giving the threshold to detect undecided observations. An observation is undecided if its maximum membership value is below this float. If NULL, no observations are undecided. |
matdist |
A matrix representing the dissimilarity between the clusters. The matrix must be squared and the diagonal must be filled with zeros. |
nrep |
An integer indicating the number of permutations to perform to simulate the random distribution of the spatial inconsistency |
Value
A named list with:
MoranValues: the Moran I values for each column of the membership matrix (spdep::MoranI)
JoinCounts: the result of the join count test calculated with the most likely group for each datapoint (spdep::joincount.multi)
SpConsist: the mean value of the spatial consistency index (the lower, the better, see ?spConsistency for details)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
spatialDiag(result, undecided=0.45, nrep=30)
Spider chart
Description
Display spider charts to quickly compare values between groups
Usage
spiderPlots(data, belongmatrix, chartcolors = NULL)
Arguments
data |
A dataframe with numeric columns |
belongmatrix |
A membership matrix |
chartcolors |
A vector of color names used for the spider plot |
Details
For each group, the weighted mean of each variable in data is calculated, using each observation's probability of belonging to that group as weight. On the chart, the exterior ring represents the maximum value obtained across all the groups and the interior ring the minimum. The groups are located between these two limits in a linear way.
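The membership-weighted means can be sketched in base R. The helper name below is hypothetical; it only illustrates the weighting rule, not the package's internal code:

```r
# For each group (column of the membership matrix), compute the mean of
# each variable weighted by the membership values of the observations.
weighted_group_means <- function(data, belongmatrix) {
  sapply(seq_len(ncol(belongmatrix)), function(g) {
    w <- belongmatrix[, g]
    colSums(as.matrix(data) * w) / sum(w)
  })
}

data <- data.frame(x = c(0, 10), y = c(5, 5))
U <- rbind(c(0.9, 0.1), c(0.1, 0.9))  # rows sum to 1
weighted_group_means(data, U)
# the mean of x is pulled toward 0 for group 1 and toward 10 for group 2
```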
Value
NULL, the plots are displayed directly by the function (see fmsb::radarchart)
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
spiderPlots(dataset,result$Belongings)
element wise square root of a matrix by column
Description
element wise square root of a matrix by column
Usage
sqrt_matrix_bycol(x)
Arguments
x |
a matrix |
Value
a matrix
Standardizing helper
Description
Create functions to standardize and unstandardize data
Usage
standardizer(x)
Arguments
x |
a numeric vector or a data.frame with only numeric columns. Non numeric columns are dropped. |
Value
If x was a vector, the function returns a list containing two functions: scale and unscale. The first one is an equivalent of the classical function scale(x, center = TRUE, scale = TRUE). The second can be used to reverse the scaling and get back original units. If x was a data.frame, the same pair of functions is returned inside a list for each numeric column.
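For a single numeric vector, the scale/unscale pair can be pictured as a pair of closures sharing the vector's mean and standard deviation. The sketch below is illustrative (the function name is invented), not the package's exact implementation:

```r
# Build a scale/unscale pair of closures around a numeric vector
make_standardizer <- function(x) {
  mu <- mean(x); sigma <- sd(x)
  list(
    scale   = function(v) (v - mu) / sigma,  # center and reduce
    unscale = function(z) z * sigma + mu     # back to original units
  )
}

fns <- make_standardizer(c(2, 4, 6))
z <- fns$scale(c(2, 4, 6))
fns$unscale(z)  # recovers 2 4 6
```

Keeping both functions together guarantees that unscale always uses the same mean and standard deviation as the scaling step, e.g. when back-transforming cluster centers.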
Examples
data(LyonIris)
LyonScales <- standardizer(sf::st_drop_geometry(LyonIris))
subtraction of two matrices by column
Description
subtraction of two matrices by column
Usage
sub_matrices_bycol(x, y)
Arguments
x |
a matrix |
y |
a matrix with the same dimensions |
Value
a matrix
Descriptive statistics by group
Description
Calculate some descriptive statistics of each group
Usage
summarizeClusters(data, belongmatrix, weighted = TRUE, dec = 3, silent = TRUE)
Arguments
data |
The original dataframe used for the classification |
belongmatrix |
A membership matrix |
weighted |
A boolean indicating if the summary statistics must use the membership matrix columns as weights (TRUE) or simply assign each observation to its most likely cluster and compute the statistics on each subset (FALSE) |
dec |
An integer indicating the number of digits to keep when rounding (default is 3) |
silent |
A boolean indicating if the results must be printed or silently returned |
Value
A list of length k (the number of groups). Each element of the list is a dataframe with summary statistics for the variables of data for each group
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
summarizeClusters(dataset, result$Belongings)
Summary method for FCMres
Description
Calculate some descriptive statistics of each group of a FCMres object
Usage
## S3 method for class 'FCMres'
summary(object, data = NULL, weighted = TRUE, dec = 3, silent = TRUE, ...)
Arguments
object |
A FCMres object, typically obtained from functions CMeans, GCMeans, SFCMeans, SGFCMeans |
data |
A dataframe to use for the summary statistics instead of obj$data |
weighted |
A boolean indicating if the summary statistics must use the membership matrix columns as weights (TRUE) or simply assign each observation to its most likely cluster and compute the statistics on each subset (FALSE) |
dec |
An integer indicating the number of digits to keep when rounding (default is 3) |
silent |
A boolean indicating if the results must be printed or silently returned |
... |
Not used |
Value
A list of length k (the number of groups). Each element of the list is a dataframe with summary statistics for the variables of data for each group
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
summary(result)
create a logical matrix with inferior comparison
Description
create a logical matrix with inferior comparison
Usage
test_inferior_mat(mat, t)
Arguments
mat |
a matrix |
t |
a double to compare |
Value
a LogicalMatrix
Uncertainty map
Description
Return a map to visualize membership matrix
Usage
uncertaintyMap(
geodata,
belongmatrix,
njit = 150,
radius = NULL,
colors = NULL,
pt_size = 0.05
)
Arguments
geodata |
An object of class feature collection from sf ordered like the original data used for the clustering. |
belongmatrix |
A membership matrix |
njit |
The number of points to map on each feature. |
radius |
When mapping points, the radius indicates how far random points will be plotted around the original features. |
colors |
A vector of colors to use for the groups. |
pt_size |
A float giving the size of the random points on the final map (default is 0.05) |
Details
This function maps the membership matrix by plotting random points in polygons, along lines or around points representing the original observations. Each cluster is associated with a color, and each random point takes a cluster's color with a probability equal to the corresponding membership value of the feature it falls within. Thus, it is possible to visualize regions of uncertainty and to identify the strongest clusters.
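The color-assignment rule amounts to a categorical draw per random point, which can be sketched with base R's sample() (illustrative only; the actual map is built with tmap):

```r
# One feature with memberships over three clusters: each of the njit
# random points drawn inside it picks a cluster color with probability
# equal to the membership values.
membership <- c(0.7, 0.2, 0.1)
colors <- c("red", "blue", "green")
set.seed(1)
pts <- sample(colors, size = 150, replace = TRUE, prob = membership)
table(pts)  # roughly 70% red, 20% blue, 10% green
```

A feature dominated by one cluster thus appears nearly solid, while an uncertain feature shows a visible mix of colors.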
Value
a map created with tmap
Examples
## Not run:
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
uncertaintyMap(LyonIris, result$Belongings)
## End(Not run)
Undecided observations
Description
Identify the observation for which the classification is uncertain
Usage
undecidedUnits(belongmatrix, tol = 0.1, out = "character")
Arguments
belongmatrix |
The membership matrix obtained at the end of the algorithm |
tol |
A float indicating the minimum membership level required for an observation not to be considered undecided |
out |
The format of the output vector. Default is "character". If "numeric", then the undecided units are set to -1. |
Value
A vector indicating the most likely group for each observation or "Undecided" if the maximum probability for the observation does not reach the value of the tol parameter
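The documented rule can be sketched in base R (the function name and the "Group" labels are illustrative; the package's own output uses the membership matrix's column names):

```r
# An observation is "Undecided" when its maximum membership is below tol;
# otherwise it is assigned to its most likely group.
undecided_sketch <- function(belongmatrix, tol = 0.1) {
  best <- max.col(belongmatrix)
  ifelse(apply(belongmatrix, 1, max) < tol,
         "Undecided", paste("Group", best))
}

U <- rbind(c(0.8, 0.2), c(0.5, 0.5), c(0.3, 0.7))
undecided_sketch(U, tol = 0.6)
# "Group 1" "Undecided" "Group 2"
```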
Examples
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
undecidedUnits(result$Belongings, tol = 0.45)
minimum of a vector
Description
minimum of a vector
maximum of a vector
Usage
vecmin(x)
vecmax(x)
Arguments
x |
a NumericVector |
Value
a double
a double
create a matrix by multiplying a vector by its elements one by one as rows
Description
create a matrix by multiplying a vector by its elements one by one as rows
Usage
vector_out_prod(x)
Arguments
x |
a vector |
Value
a NumericMatrix
Violin plots
Description
Return violin plots to compare the distribution of each variable for each group.
Usage
violinPlots(data, groups)
Arguments
data |
A dataframe with numeric columns |
groups |
A vector indicating the group of each observation |
Value
A list of plots created with ggplot2
Examples
## Not run:
data(LyonIris)
AnalysisFields <-c("Lden","NO2","PM25","VegHautPrt","Pct0_14","Pct_65","Pct_Img",
"TxChom1564","Pct_brevet","NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
queen <- spdep::poly2nb(LyonIris,queen=TRUE)
Wqueen <- spdep::nb2listw(queen,style="W")
result <- SFCMeans(dataset, Wqueen,k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
violinPlots(dataset, result$Groups)
## End(Not run)