Type: | Package |
Title: | MST-kNN Clustering Algorithm |
Version: | 0.3.2 |
Date: | 2023-01-23 |
Author: | Jorge Parraga-Alava [aut, cre], Pablo Moscato [aut], Mario Inostroza-Ponta [aut] |
Maintainer: | Jorge Parraga-Alava <jorge.parraga@usach.cl> |
Description: | Implements the MST-kNN clustering algorithm which was proposed by Inostroza-Ponta, M. (2008) https://trove.nla.gov.au/work/28729389?selectedversion=NBD44634158. |
Depends: | R (≥ 3.2.5) |
License: | GPL-2 |
Encoding: | UTF-8 |
Imports: | igraph, stats, base |
RoxygenNote: | 7.1.1 |
VignetteBuilder: | knitr |
Suggests: | knitr, rmarkdown |
NeedsCompilation: | no |
Packaged: | 2023-01-27 11:37:45 UTC; jorgeklz |
Repository: | CRAN |
Date/Publication: | 2023-01-27 14:10:02 UTC |
Computes the edge costs sum of a proximity graph
Description
This function computes the edge costs (distances) overall sum of a proximity graph.
Usage
compute.costs.proximity.graph(graph.edges, distance.matrix)
Arguments
graph.edges |
A object of class "matrix" with two columns (object_i, object_j) representing the objects (nodes) of a proximity graph. |
distance.matrix |
A distance matrix between each pair of object i,j in the proximity graph. |
Value
total.costs.graph |
A numeric value representing the edge costs (distance) overall sum of a proximity graph. |
Examples
set.seed(1987)
##Generates a data matrix of dimension 50X13
n=50; m=13
x <- matrix(runif(n*m, min = -5, max = 10), nrow=n, ncol=m)
##Computes a distance matrix of x.
library("stats")
d <- base::as.matrix(stats::dist(x, method="euclidean"))
##Generates complete graph (CG)
cg <- generate.complete.graph(1:nrow(x),d)
##Generates a proximity graph (MST)
mstree <- generate.mst(cg)
##Calculate the edge cost sum of proximity graph (MST)
mstree.cost=as.numeric(compute.costs.proximity.graph(as.matrix(mstree$edges.mst.graph[,1:2]), d))
mstree.cost
##Generates a proximity graph (kNN)
knneig <- generate.knn(cg)
##Calculate the edge cost sum of proximity graph (kNN)
knneig.cost=as.numeric(compute.costs.proximity.graph(as.matrix(knneig$edges.knn.graph[,1:2]), d))
knneig.cost
Indo-European languages dataset
Description
It contains the distances between 84 Indo-European languages based on the mean percent difference in cognacy, using the 200 Swadesh words.
Usage
data(dslanguages)
Format
An data frame with 84 rows and 84 columns containing a distance matrix.
Details
Once the data set is loaded, it can be accessed as an object of class dataframe called dslanguages
.
References
Dyen, I., Kruskal, J., and Black, P. (1992). An indoeuropean classification: A lexicostatistical experiment. Transactions of the American Philosophical Society. 82, (5).
Budding Yeast dataset
Description
It contains the expression levels of 2467 genes on 79 samples corresponding to 8 different experiments of the budding yeast: alpha factor (18 samples), cdc15 (15 samples), cold shock (4 samples), diauxic shift (7 samples), DTT shock (4 samples), elutriation (14 samples), heat shock (6 samples) and sporulation (11 samples).
Usage
data(dsyeastexpression)
Format
An data frame with 2467 rows and 79 columns.
Details
Once the data set is loaded, it can be accessed as an object of class dataframe called dsyeastexpression
.
Source
https://www.pnas.org/doi/10.1073/pnas.95.25.14863
References
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. (1998). Cluster analysis and display of genome-wideexpression patterns.Proceedings of the National Academy of Sciences, 95(25):14863–14868
Generates a complete graph
Description
This function generates a complete graph. Where each node represents a object_i and the edges have a cost representing the distance d_ij between object_i and other object_j.
Usage
generate.complete.graph(nodes.list, distance.matrix)
Arguments
nodes.list |
A vector with a subset of nodes (objects) of the data matrix for which the complete graph must be generated. |
distance.matrix |
A distance matrix between each pair of objects in |
Value
edges.complete.graph |
A object of class "data.frame" with three columns (object_i, object_j, d_ij) representing the distance between object i and object j of the distance matrix. For instance: |
object_i | object_j | d_ij |
1 | 2 | 1.60 |
1 | 3 | 0.08 |
1 | 4 | 1.21 |
... | ... | ... |
n-1 | n | ... |
Author(s)
Mario Inostroza-Ponta, Jorge Parraga-Alava, Pablo Moscato
Examples
set.seed(1987)
##Generates a data matrix of dimension 50X13
n=50; m=13
x <- matrix(runif(n*m, min = -5, max = 10), nrow=n, ncol=m)
##Computes a distance matrix of x.
library("stats")
d <- base::as.matrix(stats::dist(x, method="euclidean"))
##Generates complete graph (CG)
cg <- generate.complete.graph(1:nrow(x),d)
head(cg)
##Visualizing CG graph
library("igraph")
cg.network=igraph::graph.adjacency(d, mode="undirected", weighted=TRUE)
plot(cg.network, edge.label=round(E(cg.network)$weight, 2), main="Complete Graph")
Performs the intersections between MST y kNN graphs
Description
This function performs a graph partition based on the intersection of the edges of two proximity graphs: MST and kNN.
Usage
generate.intersections.mst.knn(nodes.list, distance.matrix, suggested.k)
Arguments
nodes.list |
A vector with a subset of objects (nodes) of the data matrix for which the MST y kNN graphs must be generated. |
distance.matrix |
A distance matrix between each pair of elements in |
suggested.k |
A numeric value representing the number of nearest neighbors to consider to generate the kNN graph. |
Value
A list with the elements
cc |
A numeric value representing the number of connected components (cc) generated after graphs intersection. |
subgraphs |
A list where each item contains the nodes of the connected components (cc) generated. |
ccgraph |
A object of class "igraph" which is a network with each connected components (cc) generated. |
Author(s)
Mario Inostroza-Ponta, Jorge Parraga-Alava, Pablo Moscato
Generates a kNN graph
Description
This function generates the k-Nearest Neighbors (kNN) graph which is a subgraph contains edges between nodes if, and only if, they are one of the k nearest neighbors considering the edges costs (distances). Each node represents an object of the complete graph.
Usage
generate.knn(edges.complete.graph, suggested.k)
Arguments
edges.complete.graph |
A object of class "data.frame" with three columns (object_i, object_j, d_ij) representing the distance d_ij between object_i and object_j. |
suggested.k |
It is an optional argument. A numeric value representing the suggested number of k-nearest neighbors to consider to generate the kNN graph. |
Details
During its generation, the k value is automatically determined by the definition:
k = min{\lfloor \ln(|nodes.list|) \rfloor; min k | kNN is connected; suggested.k }
If suggested.k parameter is not provided, it is not considered by the definition.
Value
A list with the elements
edges.knn.graph |
A object of class "data.frame" with three columns (object_i, object_j, d_ij) representing the d_ij between object_i and object_j that are part of the kNN graph. |
knn.graph |
A object of class "igraph" which is the k-Nearest Neighbors (kNN) graph generated. |
k |
The k value determined by the definition. |
Author(s)
Mario Inostroza-Ponta, Jorge Parraga-Alava, Pablo Moscato
Examples
set.seed(1987)
##Generates a data matrix of dimension 50X13
n=50; m=13
x <- matrix(runif(n*m, min = -5, max = 10), nrow=n, ncol=m)
##Computes a distance matrix of x.
library("stats")
d <- base::as.matrix(stats::dist(x, method="euclidean"))
##Generates complete graph (CG) without suggested.k parameter
cg <- generate.complete.graph(1:nrow(x),d)
##Generates kNN graph
knn <- generate.knn(cg)
##Visualizing kNN graph
plot(knn$knn.graph,
main=paste("kNN \n k=", knn$k, sep=""))
##Generates complete graph (CG) with suggested.k parameter
cg <- generate.complete.graph(1:nrow(x),d)
##Generates kNN graph
knn <- generate.knn(cg, suggested.k=4)
##Visualizing kNN graph
plot(knn$knn.graph,
main=paste("kNN \n k=", knn$k, sep=""))
Generates a MST graph
Description
This function generates the Minimal Spanning Tree (MST) graph which is a connected and acyclic subgraph contains all the nodes of the complete graph (CG) and whose edges sum (distances) has minimum costs. Each node represents an object of the complete graph.
Usage
generate.mst(edges.complete.graph)
Arguments
edges.complete.graph |
A object of class "data.frame" with three columns (object_i, object_j, d_ij) representing the distance d_ij between object i and object j of the complete graph. |
Details
Generation of MST graph is performed using the Prim's algorithm.
Value
A list with the elements
edges.mst.graph |
A object of class "data.frame" with three columns (object_i, object_j, d_ij) representing the distance d_ij between object i and object j that are part of the MST graph. |
mst.graph |
A object of class "igraph" which is the Minimal Spanning Tree (MST) graph generated. |
Author(s)
Mario Inostroza-Ponta, Jorge Parraga-Alava, Pablo Moscato
References
Prim, R.C. (1957). Shortest connection networks and some generalizations. Bell System Technical Journal, 37 1389-1401.
Ignatenkov, E. (2015). Minimum Spanning Tree (MST) for some graph using Prim's MST algorithm. Stanford University course on Coursera.
Examples
set.seed(1987)
##Generates a data matrix of dimension 50X13
n=50; m=13
x <- matrix(runif(n*m, min = -5, max = 10), nrow=n, ncol=m)
##Computes a distance matrix of x.
library("stats")
d <- base::as.matrix(stats::dist(x, method="euclidean"))
##Generates complete graph (CG)
cg <- generate.complete.graph(1:nrow(x),d)
##Generates MST graph
mstree <- generate.mst(cg)
##Visualizing MST graph
plot(mstree$mst.graph, main="MST")
Generates clustering results
Description
This function performs the union the all component connected (cc) yield in each recursion of the MST-kNN algorithm.
Usage
generate.results(g_clusters, distance.matrix)
Arguments
g_clusters |
A object of class igraph containing all component connected (cc=1). |
distance.matrix |
A numeric matrix or data.frame with equals names and numbers of rows and columns representing objects to group. |
Value
A list with the elements
cnumber |
A numeric value representing the number of clusters of the solution. |
cluster |
A named vector of integers of size n with values in range |
partition |
A partition matrix order by cluster where are shown the objects and the cluster where they are assigned. |
csize |
A vector of size k with the cardinality of each cluster in the solution. |
network |
An object of class "igraph" as a network representing the clustering solution. |
Performs the MST-kNN clustering algorithm
Description
Performs the MST-kNN clustering algorithm which generates a clustering solution with automatic number of clusters determination using two proximity graphs: Minimal Spanning Tree (MST) and k-Nearest Neighbor (kNN) which are recursively intersected.
To create MST, Prim algorithm is used. To create kNN, distance.matrix
passed as input is considered.
Usage
mst.knn(distance.matrix, suggested.k)
Arguments
distance.matrix |
A numeric matrix or data.frame with equals numbers of rows and columns representing distances between objects to group. |
suggested.k |
It is an optional argument. A numeric value representing the suggested number of k-nearest neighbors to consider during the generating the kNN graph. Note that, due to the algorithm operation, this number may be different during the algorithm execution. |
Details
To see more details of how MST-kNN works refers to the quick guide.
Value
A list with the elements
cnumber |
A numeric value representing the number of clusters of the solution. |
cluster |
A named vector of integers from |
partition |
A partition matrix order by cluster where are shown the objects and the cluster where they are assigned. |
csize |
A vector with the cardinality of each cluster in the solution. |
network |
An object of class "igraph" as a network representing the clustering solution. |
Author(s)
Mario Inostroza-Ponta, Jorge Parraga-Alava, Pablo Moscato
References
Inostroza-Ponta, M. (2008). An Integrated and Scalable Approach Based on Combinatorial Optimization Techniques for the Analysis of Microarray Data. Ph.D. thesis, School of Electrical Engineering and Computer Science. University of Newcastle.
Examples
set.seed(1987)
##load package
library("mstknnclust")
##Generates a data matrix of dimension 100X15
n=100; m=15
x <- matrix(runif(n*m, min = -5, max = 10), nrow=n, ncol=m)
##Computes a distance matrix of x.
library("stats")
d <- base::as.matrix(stats::dist(x, method="euclidean"))
##Performs MST-kNN clustering using euclidean distance.
results <- mst.knn(d)
## Visualizes the clustering solution
library("igraph")
plot(results$network, vertex.size=8,
vertex.color=igraph::clusters(results$network)$membership,
layout=igraph::layout.fruchterman.reingold(results$network, niter=10000),
main=paste("MST-kNN \n Clustering solution \n Number of clusters=",results$cnumber,sep="" ))
Generates the solution when only singletons are yield
Description
Generates the solution when only singletons are yield
Usage
only.single.graphs(total_nodos, nodos_singletons)
Arguments
total_nodos |
Total number of nodes in data matrix. |
nodos_singletons |
Nodes list with cluster singletons. |
Value
clusteres_unidos
An object of class "igraph" as a network representing the clustering solution.