Type: | Package |
Title: | Non-Parametric Multiple Change-Point Analysis of Multivariate Data |
Version: | 3.1.6 |
Date: | 2024-8-25 |
Maintainer: | Wenyu Zhang <wz258@cornell.edu> |
Description: | Implements various procedures for finding multiple change-points from Matteson D. et al (2013) <doi:10.1080/01621459.2013.849605>, Zhang W. et al (2017) <doi:10.1109/ICDMW.2017.44>, Arlot S. et al (2019). Two methods make use of dynamic programming and pruning, with no distributional assumptions other than the existence of certain absolute moments in one method. Hierarchical and exact search methods are included. All methods return the set of estimated change-points as well as other summary information. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | R (≥ 3.00), Rcpp |
Suggests: | mvtnorm, MASS, combinat, R.rsp |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
Repository: | CRAN |
VignetteBuilder: | R.rsp |
Packaged: | 2024-08-25 17:13:46 UTC; wenyuzhang |
Author: | Nicholas A. James [aut], Wenyu Zhang [aut, cre], David S. Matteson [aut] |
Date/Publication: | 2024-08-26 05:50:02 UTC |
Bladder Tumor Micro-Array Data
Description
Micro-array data for 43 different individuals with a bladder tumor.
Usage
data(ACGH)
Format
A list with the following components.
data: The micro-array data for 43 individuals. This information is stored in a 2215 by 43 matrix.
individual: A numeric vector indicating which individuals' micro-array data are present.
Source
Bleakley K., Vert J.-P. (2011), The group fused Lasso for multiple change-point detection
N. Stransky, C. Vallot, F. Reyal, I. Bernard-Pierrot, S. G. Diez de Medina, R. Segraves, Y. de Rycke, P. Elvin, A. Cassidy, C. Spraggon, A. Graham, J. Southgate, B. Asselain, Y. Allory, C. C. Abbou, D. G. Albertson, J.-P. Thiery, D. K. Chopin, D. Pinkel, and F. Radvanyi. Regional copy number-independent deregulation of transcription in cancer. Nat. Genet., 38(12):1386-1396, Dec 2006
References
Bleakley K., Vert J.-P. (2011), The group fused Lasso for multiple change-point detection
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Examples
data(ACGH, package="ecp")
Dow Jones Industrial Average Index
Description
The weekly log returns for the Dow Jones Industrial Average index from April 1990 to January 2012.
Usage
data(DJIA)
Format
A list with the following components.
dates: A character vector of dates associated with each observation in the returns series.
index: Weekly log returns from April 1990 to January 2012 of the DOW 30 index.
market: Weekly log returns from April 1990 to January 2012, for the companies in the DOW 30 apart from Kraft.
Source
http://research.stlouisfed.org/fred2/series/DJIA/downloaddata
References
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Examples
data(DJIA, package="ecp")
ENERGY AGGLOMERATIVE
Description
An agglomerative hierarchical estimation algorithm for multiple change point analysis.
Usage
e.agglo(X, member=1:nrow(X), alpha=1, penalty=function(cps){0})
Arguments
X |
A T x d matrix containing the length T time series with d-dimensional observations. |
member |
Initial membership vector for the time series. |
alpha |
Moment index used for determining the distance between and within clusters. |
penalty |
Function used to penalize the obtained goodness-of-fit statistics. This function takes as its input a vector of change point locations (cps). |
Details
Homogeneous clusters are created based on the initial clustering provided by the member argument. In each iteration, clusters are merged so as to maximize a goodness-of-fit statistic. The computational complexity of this method is O(T^2), where T is the number of observations.
Value
Returns a list with the following components.
merged |
A (T-1) x 2 matrix indicating which segments were merged at each step of the agglomerative procedure. |
fit |
Vector showing the progression of the penalized goodness-of-fit statistic. |
progression |
A T x (T+1) matrix showing the progression of the set of change points. |
cluster |
The estimated cluster membership vector. |
estimates |
The location of the estimated change points. |
Author(s)
Nicholas A. James
References
Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
See Also
Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
Examples
set.seed(100)
mem = rep(c(1,2,3,4),times=c(10,10,10,10))
x = as.matrix(c(rnorm(10,0,1),rnorm(20,2,1),rnorm(10,-1,1)))
y = e.agglo(X=x,member=mem,alpha=1,penalty=function(cp,Xts) 0)
y$estimates
## Not run:
# Multivariate spatio-temporal example
# You will need the following packages:
# mvtnorm, combinat, and MASS
library(mvtnorm); library(combinat); library(MASS)
set.seed(2013)
lambda = 1500 #overall arrival rate per unit time
muA = c(-7,-7) ; muB = c(0,0) ; muC = c(5.5,0)
covA = 25*diag(2)
covB = matrix(c(9,0,0,1),2)
covC = matrix(c(9,.9,.9,9),2)
time.interval = matrix(c(0,1,3,4.5,1,3,4.5,7),4,2)
#mixing coefficients
mixing.coef = rbind(c(1/3,1/3,1/3),c(.2,.5,.3), c(.35,.3,.35),
c(.2,.3,.5))
stppData = NULL
for(i in 1:4){
count = rpois(1, lambda* diff(time.interval[i,]))
Z = rmultz2(n = count, p = mixing.coef[i,])
S = rbind(rmvnorm(Z[1],muA,covA), rmvnorm(Z[2],muB,covB),
rmvnorm(Z[3],muC,covC))
X = cbind(rep(i,count), runif(n = count, time.interval[i,1],
time.interval[i,2]), S)
stppData = rbind(stppData, X[order(X[,2]),])
}
member = as.numeric(cut(stppData[,2], breaks = seq(0,7,by=1/12)))
output = e.agglo(X=stppData[,3:4],member=member,alpha=1,
penalty=function(cp,Xts) 0)
## End(Not run)
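The penalty argument accepts any function of the change point vector. The following is a minimal sketch, assuming the ecp package is installed. The name count_penalty is hypothetical, and the sign convention (negative values discourage additional change points, since the goodness-of-fit statistic is maximized) is an assumption drawn from the Details section rather than a documented guarantee.

```r
library(ecp)

set.seed(100)
x <- as.matrix(c(rnorm(10, 0, 1), rnorm(20, 2, 1), rnorm(10, -1, 1)))
mem <- rep(1:4, each = 10)

# Hypothetical penalty: subtract one unit per change point location passed in.
count_penalty <- function(cps) -length(cps)

fit <- e.agglo(X = x, member = mem, alpha = 1, penalty = count_penalty)
fit$estimates   # estimated change point locations
fit$fit         # progression of the penalized goodness-of-fit statistic
```

Comparing fit$fit with the result of the unpenalized call above gives a quick feel for how strongly a given penalty shifts the chosen segmentation.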
CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)
Description
An algorithm for multiple change point analysis that uses dynamic programming and pruning. The E-statistic is used as the goodness-of-fit measure.
Usage
e.cp3o(Z, K=1, minsize=30, alpha=1, verbose=FALSE)
Arguments
Z |
A T x d matrix containing the length T time series with d-dimensional observations. |
K |
The maximum number of change points. |
minsize |
The minimum segment size. |
alpha |
The moment index used for determining the distance between and within segments. |
verbose |
A flag indicating if status updates should be printed. |
Details
Segmentations are found through the use of dynamic programming and pruning. For long time series, consider using e.cp3o_delta.
Value
The returned value is a list with the following components.
number |
The estimated number of change points. |
estimates |
The location of the change points estimated by the procedure. |
gofM |
A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on. |
cpLoc |
The list of locations of change points estimated by the procedure for different numbers of change points up to K. |
time |
The total amount of time taken to estimate the change point locations. |
Author(s)
Nicholas A. James, Wenyu Zhang
References
W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.
See Also
Rizzo M.L., Szekely G.L (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics.
Examples
set.seed(400)
x1 = matrix(c(rnorm(50),rnorm(50,3)))
y1 = e.cp3o(Z=x1, K=2, minsize=30, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates
CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)
Description
An algorithm for multiple change point analysis that uses dynamic programming and pruning. The E-statistic is used as the goodness-of-fit measure.
Usage
e.cp3o_delta(Z, K=1, delta=29, alpha=1, verbose=FALSE)
Arguments
Z |
A T x d matrix containing the length T time series with d-dimensional observations. |
K |
The maximum number of change points. |
delta |
The window size used to calculate the complete portion of our approximate test statistic. This also corresponds to one less than the minimum segment size. |
alpha |
The moment index used for determining the distance between and within segments. |
verbose |
A flag indicating if status updates should be printed. |
Details
Segmentations are found through the use of dynamic programming and pruning. Between-segment distances are calculated only using points within a window of the segmentation point. The computational complexity of this method is O(KT^2), where K is the maximum number of change points, and T is the number of observations.
Value
The returned value is a list with the following components.
number |
The estimated number of change points. |
estimates |
The location of the change points estimated by the procedure. |
gofM |
A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on. |
cpLoc |
The list of locations of change points estimated by the procedure for different numbers of change points up to K. |
time |
The total amount of time taken to estimate the change point locations. |
Author(s)
Nicholas A. James, Wenyu Zhang
References
W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.
See Also
Rizzo M.L., Szekely G.L (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics.
Examples
set.seed(400)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.cp3o_delta(Z=x1, K=7, delta=29, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates
ENERGY DIVISIVE
Description
A divisive hierarchical estimation algorithm for multiple change point analysis.
Usage
e.divisive(X, sig.lvl=.05, R=199, k=NULL, min.size=30, alpha=1)
Arguments
X |
A T x d matrix containing the length T time series with d-dimensional observations. |
sig.lvl |
The level at which to sequentially test if a proposed change point is statistically significant. |
R |
The maximum number of random permutations to use in each iteration of the permutation test. The permutation test p-value is calculated using the method outlined in Gandy (2009). |
k |
Number of change point locations to estimate, suppressing permutation based testing. If k=NULL then only the statistically significant estimated change points are returned. |
min.size |
Minimum number of observations between change points. |
alpha |
The moment index used for determining the distance between and within segments. |
Details
Segments are found through the use of a binary bisection method and a permutation test. The computational complexity of this method is O(kT^2), where k is the number of estimated change points, and T is the number of observations.
Value
The returned value is a list with the following components.
k.hat |
The number of clusters within the data created by the change points. |
order.found |
The order in which the change points were estimated. |
estimates |
Locations of the statistically significant change points. |
considered.last |
Location of the last change point that was not found to be statistically significant at the given significance level. |
permutations |
The number of permutations performed by each of the sequential permutation tests. |
cluster |
The estimated cluster membership vector. |
p.values |
Approximate p-values estimated from each permutation test. |
Author(s)
Nicholas A. James
References
Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
See Also
Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.
Rizzo M.L., Szekely G.L (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics.
Examples
## Not run:
set.seed(100)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.divisive(X=x1,sig.lvl=0.05,R=199,k=NULL,min.size=30,alpha=1)
x2 = rbind(MASS::mvrnorm(100,c(0,0),diag(2)),
MASS::mvrnorm(100,c(2,2),diag(2)))
y2 = e.divisive(X=x2,sig.lvl=0.05,R=499,k=NULL,min.size=30,alpha=1)
## End(Not run)
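The role of sig.lvl and R can be pictured with a simplified permutation test. This sketch uses base R only and is not the package's procedure: it tests a single, fixed candidate split, uses an absolute difference in means as a stand-in for the E-statistic, and always runs exactly R permutations instead of the sequential scheme of Gandy (2009). The function name perm_pvalue is hypothetical.

```r
# Fixed-R permutation test for one candidate split (illustrative only).
perm_pvalue <- function(x, split, R = 199) {
  n <- length(x)
  stopifnot(split > 1, split <= n)
  # Stand-in test statistic: absolute difference of the two segment means.
  stat <- function(v) abs(mean(v[1:(split - 1)]) - mean(v[split:n]))
  observed <- stat(x)
  # Shuffling the series destroys any ordering, mimicking "no change point".
  permuted <- replicate(R, stat(sample(x)))
  # Approximate p-value; the +1 terms count the observed statistic itself.
  (sum(permuted >= observed) + 1) / (R + 1)
}

set.seed(100)
x <- c(rnorm(50), rnorm(50, 3))
p <- perm_pvalue(x, split = 51, R = 199)
p < 0.05   # compare against the chosen significance level
```

With R = 199 the smallest attainable p-value is 1/200, which is why R must be large enough for the chosen sig.lvl to be reachable.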
ENERGY SPLIT
Description
Finds the most likely location for a change point across all current clusters.
Usage
e.split(changes, D, min.size, for.sim=FALSE, env=emptyenv())
Arguments
changes |
A vector containing the current set of change points. |
D |
An n by n distance matrix. |
min.size |
Minimum number of observations between change points. |
for.sim |
Boolean value indicating if the function is to be run on permuted data for significance testing. |
env |
Environment that contains information to help reduce computational time. |
Details
This method is called by the e.divisive method, and should not be called by the user.
Value
A list with the following components is returned.
first |
The index of the first element of the cluster to be divided. |
second |
The index of the last element of the cluster to be divided. |
third |
The new set of change points. |
fourth |
The distance between the clusters created by the newly proposed change point. |
Author(s)
Nicholas A. James
References
Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
Internal Energy Change Point Functions
Description
Internal Energy Change Point functions.
Details
These are not to be called by the user.
FIND CLOSEST CLUSTERS
Description
Determines which two segments to merge.
Usage
find.closest(K, ret)
Arguments
K |
Integer indicating the progress of the agglomerative process. |
ret |
A list with 'open', 'N', and 'right' components. |
Details
This method is called by the e.agglo method, and should not be called by the user.
Value
Returns a vector with 3 components. The first two indicate which segments are to be merged. The third is the new goodness of fit statistic.
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
GET BETWEEN DISTANCE
Description
Returns the energy distance between two sets of numerical data.
Usage
getBetween(alpha_, X_, Y_)
Arguments
alpha_ |
A weighting parameter used for calculating the energy distance. This value should be in (0,2]. |
X_ |
A n by d matrix of the n d-dimensional observations. |
Y_ |
A m by d matrix of the m d-dimensional observations. |
Details
The matrices X_ and Y_ do not need to have the same number of rows, but they do require the same number of columns.
Value
The returned value is a real number indicating the energy distance between the two data sets.
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
Examples
set.seed(100)
X = matrix(rnorm(100),ncol=2)
Y = matrix(rnorm(126,1),ncol=2)
alpha = 1
between.distance = getBetween(alpha,X,Y)
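For intuition about the quantities involved, the sample energy divergence between two data sets can be written out in base R. This is a sketch of a common U-statistic form of the divergence (twice the mean between-sample distance minus the mean within-sample distances, each raised to the power alpha). The function name energy_divergence is hypothetical, and whether getBetween and getWithin use the same normalisation is an assumption that is not confirmed here.

```r
# Sample energy divergence between the rows of X and the rows of Y.
energy_divergence <- function(X, Y, alpha = 1) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  m <- nrow(Y)
  stopifnot(ncol(X) == ncol(Y), n >= 2, m >= 2, alpha > 0, alpha <= 2)

  # Pairwise Euclidean distances among all n + m observations, to the power alpha.
  D <- as.matrix(dist(rbind(X, Y)))^alpha
  ix <- seq_len(n)
  iy <- n + seq_len(m)

  between  <- mean(D[ix, iy])                 # average over all (x_i, y_j) pairs
  within_x <- sum(D[ix, ix]) / (n * (n - 1))  # average over distinct pairs in X
  within_y <- sum(D[iy, iy]) / (m * (m - 1))  # average over distinct pairs in Y

  2 * between - within_x - within_y
}

set.seed(100)
X <- matrix(rnorm(100), ncol = 2)
Y <- matrix(rnorm(126, 1), ncol = 2)
energy_divergence(X, Y, alpha = 1)
```

As in the Details above, X and Y may have different numbers of rows but must share the same number of columns.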
GET WITHIN DISTANCE
Description
Calculate the energy distance within a data set.
Usage
getWithin(alpha_, X_)
Arguments
alpha_ |
A weighting parameter used for calculating the energy distance. This value should be in (0,2]. |
X_ |
A n by d matrix of the n d-dimensional observations. |
Value
The returned value is a real number indicating the energy distance within the given data set.
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
Examples
set.seed(100)
X = matrix(rnorm(150),ncol=2)
alpha = 1
distance = getWithin(alpha,X)
GOODNESS OF FIT UPDATE
Description
Updates the goodness of fit statistic.
Usage
gof.update(i,ret)
Arguments
i |
Index of the segment that is to be merged with the segment adjacent to its right. |
ret |
A list with 'gof', 'right', 'left', 'D', and 'size' components. |
Details
Called by the e.agglo method, and should not be called by the user.
Value
Returns a real number. This is the updated goodness of fit statistic.
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Kernel Change Point Analysis
Description
An algorithm for multiple change point analysis that uses the 'kernel trick' and dynamic programming.
Usage
kcpa(X, L, C)
Arguments
X |
A T x d matrix containing the length T time series with d-dimensional observations. |
L |
The maximum number of change points. |
C |
The constant used to penalize the inclusion of additional change points in the fitted model. |
Details
Segments are found through the use of dynamic programming and the kernel trick.
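The dynamic programming step can be sketched in base R for the univariate case. This is an illustrative sketch, not the kcpa implementation: it fixes the number of change points at K, applies no penalty constant, and uses the within-segment sum of squared deviations as the cost, which is what the kernel cost reduces to for a linear kernel. The name dp_segment, the minsize argument, and the convention of reporting a change point as the first index of the new segment are assumptions made for this example.

```r
# Exact segmentation of x into K + 1 segments by dynamic programming.
dp_segment <- function(x, K, minsize = 2) {
  n <- length(x)
  stopifnot(K >= 1, minsize >= 1, (K + 1) * minsize <= n)

  # Cumulative sums give the cost of any segment in constant time.
  cs  <- c(0, cumsum(x))
  cs2 <- c(0, cumsum(x^2))
  seg_cost <- function(i, j) {
    len <- j - i + 1
    (cs2[j + 1] - cs2[i]) - (cs[j + 1] - cs[i])^2 / len
  }

  # best[k + 1, e]: minimal cost of x[1..e] using exactly k change points.
  # back[k + 1, e]: last index of the segment preceding the final one.
  best <- matrix(Inf, K + 1, n)
  back <- matrix(NA_integer_, K + 1, n)
  for (e in minsize:n) best[1, e] <- seg_cost(1, e)

  for (k in 1:K) {
    for (e in ((k + 1) * minsize):n) {
      cand <- (k * minsize):(e - minsize)   # feasible ends of the previous segment
      vals <- best[k, cand] +
        vapply(cand, function(s) seg_cost(s + 1, e), numeric(1))
      j <- which.min(vals)
      best[k + 1, e] <- vals[j]
      back[k + 1, e] <- cand[j]
    }
  }

  # Walk the back-pointers from the end of the series to recover the splits.
  cps <- integer(K)
  last <- n
  for (k in K:1) {
    cps[k] <- back[k + 1, last]
    last <- cps[k]
  }
  list(cost = best[K + 1, n], changes = cps + 1L)
}

x <- c(rep(0, 5), rep(10, 5), rep(-3, 5))
dp_segment(x, K = 2)
```

Running the recursion for every k up to a maximum and then adding a penalty that grows with k is one way to choose the number of change points, which is the role the arguments L and C play in the usage above.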
Value
If the algorithm determines that the best fit is obtained through using k change points then the returned value is an array of length k, containing the change point locations.
Author(s)
Nicholas A. James
References
Arlot S., Celisse A., Harchaoui Z. (2019). A Kernel Multiple Change-point Algorithm via Model Selection. J. Mach. Learn. Res., 20, 162:1-162:56.
CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)
Description
An algorithm for multiple change point analysis that uses dynamic programming and pruning. The Kolmogorov-Smirnov statistic is used as the goodness-of-fit measure.
Usage
ks.cp3o(Z, K=1, minsize=30, verbose=FALSE)
Arguments
Z |
A T x d matrix containing the length T time series with d-dimensional observations. |
K |
The maximum number of change points. |
minsize |
The minimum segment size. |
verbose |
A flag indicating if status updates should be printed. |
Details
Segmentations are found through the use of dynamic programming and pruning. For long time series, consider using ks.cp3o_delta.
Value
The returned value is a list with the following components.
number |
The estimated number of change points. |
estimates |
The location of the change points estimated by the procedure. |
gofM |
A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on. |
cpLoc |
The list of locations of change points estimated by the procedure for different numbers of change points up to K. |
time |
The total amount of time taken to estimate the change point locations. |
Author(s)
Wenyu Zhang
References
W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.
See Also
Kifer D., Ben-David S., Gehrke J. (2004). Detecting change in data streams. International Conference on Very Large Data Bases.
Examples
set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates
CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)
Description
An algorithm for multiple change point analysis that uses dynamic programming and pruning. The Kolmogorov-Smirnov statistic is used as the goodness-of-fit measure.
Usage
ks.cp3o_delta(Z, K=1, minsize=30, verbose=FALSE)
Arguments
Z |
A T x d matrix containing the length T time series with d-dimensional observations. |
K |
The maximum number of change points. |
minsize |
The minimum segment size. This is also the window size used to calculate between-segment distances. |
verbose |
A flag indicating if status updates should be printed. |
Details
Segmentations are found through the use of dynamic programming and pruning. Between-segment distances are calculated only using points within a window of the segmentation point.
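The windowed comparison can be illustrated in base R for a univariate series. This is a sketch under stated assumptions, not the package's code: the helper names ks_stat and window_ks are hypothetical, the window takes w observations on each side of the candidate split, and the handling of multivariate data is not covered.

```r
# Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs.
ks_stat <- function(a, b) {
  z <- sort(c(a, b))
  max(abs(ecdf(a)(z) - ecdf(b)(z)))
}

# Compare the w observations before a candidate split with the w after it.
window_ks <- function(x, split, w) {
  stopifnot(split - w >= 1, split + w - 1 <= length(x))
  left  <- x[(split - w):(split - 1)]
  right <- x[split:(split + w - 1)]
  ks_stat(left, right)
}

set.seed(400)
x <- c(rnorm(100), rnorm(100, 3))
window_ks(x, split = 101, w = 30)  # window straddles the simulated change
window_ks(x, split = 50,  w = 30)  # window lies inside a homogeneous stretch
```

Because only 2w observations enter each comparison, the cost per candidate split does not grow with the length of the series, which is the motivation for the windowed variant.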
Value
The returned value is a list with the following components.
number |
The estimated number of change points. |
estimates |
The location of the change points estimated by the procedure. |
gofM |
A vector of goodness of fit values for differing number of change points. The first entry corresponds to when there is only a single change point, the second for when there are two, and so on. |
cpLoc |
The list of locations of change points estimated by the procedure for different numbers of change points up to K. |
time |
The total amount of time taken to estimate the change point locations. |
Author(s)
Wenyu Zhang
References
W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.
See Also
Kifer D., Ben-David S., Gehrke J. (2004). Detecting change in data streams. International Conference on Very Large Data Bases.
Examples
set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o_delta(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates
PERMUTE CLUSTERS
Description
Permutes time series observations within specified segments.
Usage
perm.cluster(D, points)
Arguments
D |
A n by n distance matrix. |
points |
The set of current change points. |
Details
Called by the e.divisive method, and should not be called by the user.
Value
Returns the n by n distance matrix for the permuted data.
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
PROCESS DATA
Description
Initializes components necessary to perform agglomerative analysis.
Usage
process.data(member, X, alpha)
Arguments
member |
Segment membership vector for the time series. |
X |
A matrix containing the time series with observations in R^d. |
alpha |
Index used for determining the distance between and within segments. |
Details
Called by the e.agglo method, and should not be called by the user.
Value
Returns a list with the following components.
gof |
Vector showing the progression of the goodness of fit statistic. |
list |
Matrix showing the progression of the set of change points. |
N |
Number of initial segments. |
sizes |
Sizes of each segment during the agglomerative process. |
right |
Vector containing indices of the right adjacent segments. |
left |
Vector containing indices of the left adjacent segments. |
open |
Vector indicating if a segment has been merged. |
D |
Matrix of distances between segments. |
lm |
Vector containing indices of the starting point of a segment. |
Author(s)
Nicholas A. James
SIGNIFICANCE TEST
Description
Performs a permutation test.
Usage
sig.test(D, R, changes, min.size, obs, env=emptyenv())
Arguments
D |
A n by n distance matrix. |
R |
The number of permutations to use in the permutation test. |
changes |
The set of current change points. |
min.size |
Minimum number of observations between change points. |
obs |
Test statistic value for non-permuted data. |
env |
Environment with information used to reduce computational time. |
Details
Called by the e.divisive method, and should not be called by the user.
Value
The returned value is the approximate p-value obtained by the permutation test. The permutation test is performed using the method outlined in Gandy (2009).
Author(s)
Nicholas A. James
References
Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.
Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
SPLIT POINT
Description
Finds the most likely location of a change point within a given segment.
Usage
splitPoint(start, end, D, min.size)
Arguments
start |
The index of the first observation in a segment. |
end |
The index of the last observation in a segment. |
D |
A n by n distance matrix. |
min.size |
Minimum number of observations between change points. |
Details
Called by the e.divisive method, and should not be called by the user.
Value
The returned value is a vector. The first component is the most likely position of a change point. The second component is the distance between the segments created by this proposed change point.
Author(s)
Nicholas A. James
References
Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
SPLIT POINT-C
Description
C++ function that is called by splitPoint() to perform calculations.
Usage
splitPointC(s_, e_, D_, min_size_)
Arguments
s_ |
Index of the first observation in a segment. |
e_ |
Index of the last observation in a segment. |
D_ |
A distance matrix. |
min_size_ |
The minimum segment size. |
Details
As with the splitPoint method, this method should not be called by the user.
Value
Returns a vector. The first component is the most likely position of a change point. The second component is the distance between the segments created by this proposed change point.
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.
Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.
UPDATE DISTANCE
Description
Update the distance between the newly created segment and the other segments.
Usage
updateDistance(i,j,K,ret)
Arguments
i |
The segment that makes up the left portion of the new segment. |
j |
The segment that makes up the right portion of the new segment. |
K |
Integer indicating the progress of the agglomerative process. |
ret |
A list with 'gof', 'list', 'N', 'sizes', 'right', 'left', 'open', 'D', and 'lm' components. |
Details
This method is called by the e.agglo method, and should not be called by the user.
Value
Returns a list with the following components.
gof |
Vector showing the progression of the goodness of fit statistic. |
list |
Matrix showing the progression of the set of change points. |
N |
Number of initial segments. |
sizes |
Sizes of each segment during the agglomerative process. |
right |
Vector containing indices of the right adjacent segments. |
left |
Vector containing indices of the left adjacent segments. |
open |
Vector indicating if a segment has been merged. |
D |
Matrix of distances between segments. |
lm |
Vector containing indices of the starting point of a segment. |
Author(s)
Nicholas A. James
References
James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.
Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"
Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.