Type: Package
Title: Non-Parametric Multiple Change-Point Analysis of Multivariate Data
Version: 3.1.6
Date: 2024-8-25
Maintainer: Wenyu Zhang <wz258@cornell.edu>
Description: Implements various procedures for finding multiple change-points from Matteson D. et al (2013) <doi:10.1080/01621459.2013.849605>, Zhang W. et al (2017) <doi:10.1109/ICDMW.2017.44>, and Arlot S. et al (2019). Two methods make use of dynamic programming and pruning, with no distributional assumptions other than the existence of certain absolute moments in one method. Hierarchical and exact search methods are included. All methods return the set of estimated change-points as well as other summary information.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Depends: R (≥ 3.00), Rcpp
Suggests: mvtnorm, MASS, combinat, R.rsp
LinkingTo: Rcpp
NeedsCompilation: yes
Repository: CRAN
VignetteBuilder: R.rsp
Packaged: 2024-08-25 17:13:46 UTC; wenyuzhang
Author: Nicholas A. James [aut], Wenyu Zhang [aut, cre], David S. Matteson [aut]
Date/Publication: 2024-08-26 05:50:02 UTC

Bladder Tumor Micro-Array Data

Description

Micro-array data for 43 different individuals with a bladder tumor.

Usage

data(ACGH)

Format

A list with the following components.

data: The micro-array data for 43 individuals. This information is stored in a 2215 by 43 matrix.

individual: A numeric vector indicating which individuals' micro-array data are present.

Source

Bleakley K., Vert J.-P. (2011), The group fused Lasso for multiple change-point detection

N. Stransky, C. Vallot, F. Reyal, I. Bernard-Pierrot, S.G. Diez de Mediana, R. Segraves, Y. de Rycke, P. Elvin, A. Cassidy, C. Sparaggon, A. Graham, J. Southgate, B. Asselain, Y. Allory, C. C. Addou, D. G. Albertson, J.-P. Thiery, D. K. Chopin, D. Pinkel, and F. Radvanyi. Regional copy number-independent deregulation of transcription in cancer. Nat. Genet., 38(12):1386-1396, Dec 2006

References

Bleakley K., Vert J.-P. (2011), The group fused Lasso for multiple change-point detection

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Examples

data(ACGH, package="ecp")
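
The components listed under Format can be inspected once the data are loaded. This is a minimal sketch, assuming the ecp package is installed; the dimensions mentioned in the comments are the ones stated in the Format section.

```r
library(ecp)
data(ACGH, package = "ecp")

# Dimensions of the micro-array matrix (documented as 2215 by 43)
dim(ACGH$data)

# Numeric vector identifying the individuals whose data are present
ACGH$individual
```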

Dow Jones Industrial Average Index

Description

The weekly log returns for the Dow Jones Industrial Average index from April 1990 to January 2012.

Usage

data(DJIA)

Format

A list with the following components.

dates: A character vector of dates associated with each observation in the returns series.

index: Weekly log returns from April 1990 to January 2012 of the DOW 30 index.

market: Weekly log returns from April 1990 to January 2012, for the companies in the DOW 30 apart from Kraft.

Source

http://research.stlouisfed.org/fred2/series/DJIA/downloaddata

References

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Examples

data(DJIA, package="ecp")

ENERGY AGGLOMERATIVE

Description

An agglomerative hierarchical estimation algorithm for multiple change point analysis.

Usage

e.agglo(X, member=1:nrow(X), alpha=1, penalty=function(cps){0})

Arguments

X

A T x d matrix containing the length T time series with d-dimensional observations.

member

Initial membership vector for the time series.

alpha

Moment index used for determining the distance between and within clusters.

penalty

Function used to penalize the obtained goodness-of-fit statistics. This function takes as its input a vector of change point locations (cps).

Details

Homogeneous clusters are created based on the initial clustering provided by the member argument. In each iteration, clusters are merged so as to maximize a goodness-of-fit statistic. The computational complexity of this method is O(T^2), where T is the number of observations.

Value

Returns a list with the following components.

merged

A (T-1) x 2 matrix indicating which segments were merged at each step of the agglomerative procedure.

fit

Vector showing the progression of the penalized goodness-of-fit statistic.

progression

A T x (T+1) matrix showing the progression of the set of change points.

cluster

The estimated cluster membership vector.

estimates

The location of the estimated change points.

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

See Also

e.divisive

Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

Examples

set.seed(100)
mem = rep(c(1,2,3,4),times=c(10,10,10,10))
x = as.matrix(c(rnorm(10,0,1),rnorm(20,2,1),rnorm(10,-1,1)))
y = e.agglo(X=x,member=mem,alpha=1,penalty=function(cps) 0)
y$estimates


## Not run: 
# Multivariate spatio-temporal example
# You will need the following packages:
#	mvtnorm, combinat, and MASS
library(mvtnorm); library(combinat); library(MASS)
set.seed(2013)
lambda = 1500 #overall arrival rate per unit time
muA = c(-7,-7) ; muB = c(0,0) ; muC = c(5.5,0)
covA = 25*diag(2)
covB = matrix(c(9,0,0,1),2)
covC = matrix(c(9,.9,.9,9),2)
time.interval = matrix(c(0,1,3,4.5,1,3,4.5,7),4,2)
#mixing coefficients
mixing.coef = rbind(c(1/3,1/3,1/3),c(.2,.5,.3), c(.35,.3,.35), 
	c(.2,.3,.5))
stppData = NULL
for(i in 1:4){
	count = rpois(1, lambda* diff(time.interval[i,]))
	Z = rmultz2(n = count, p = mixing.coef[i,])
	S = rbind(rmvnorm(Z[1],muA,covA), rmvnorm(Z[2],muB,covB),
		rmvnorm(Z[3],muC,covC))
	X = cbind(rep(i,count), runif(n = count, time.interval[i,1],
		time.interval[i,2]), S)
	stppData = rbind(stppData, X[order(X[,2]),])
}
member = as.numeric(cut(stppData[,2], breaks = seq(0,7,by=1/12)))
output = e.agglo(X=stppData[,3:4],member=member,alpha=1,
	penalty=function(cps) 0)

## End(Not run)
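
The penalty argument takes a vector of change point locations and returns a single number. The sketch below compares the default (no penalty) with a hypothetical penalty, pen_count, that charges one unit per change point. It assumes the penalty value is added to the goodness-of-fit statistic being maximized, so that negative values discourage additional change points; pen_count is illustrative and not part of ecp.

```r
library(ecp)

set.seed(100)
mem <- rep(1:4, each = 10)
x <- as.matrix(c(rnorm(10, 0, 1), rnorm(20, 2, 1), rnorm(10, -1, 1)))

# Hypothetical penalty: minus the number of change points in the candidate set
pen_count <- function(cps) -length(cps)

y0 <- e.agglo(X = x, member = mem, alpha = 1)                       # default penalty
y1 <- e.agglo(X = x, member = mem, alpha = 1, penalty = pen_count)  # penalized fit

y0$estimates
y1$estimates

# Progression of the penalized goodness-of-fit statistic
y1$fit
```

Comparing y0$estimates with y1$estimates shows whether the penalty changes the selected segmentation for this series.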

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The E-statistic is used as the goodness-of-fit measure.

Usage

e.cp3o(Z, K=1, minsize=30, alpha=1, verbose=FALSE)

Arguments

Z

A T x d matrix containing the length T time series with d-dimensional observations.

K

The maximum number of change points.

minsize

The minimum segment size.

alpha

The moment index used for determining the distance between and within segments.

verbose

A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. For long time series, consider using e.cp3o_delta.

Value

The returned value is a list with the following components.

number

The estimated number of change points.

estimates

The location of the change points estimated by the procedure.

gofM

A vector of goodness-of-fit values for different numbers of change points. The first entry corresponds to a single change point, the second to two change points, and so on.

cpLoc

The list of locations of change points estimated by the procedure for different numbers of change points up to K.

time

The total amount of time taken to estimate the change point locations.

Author(s)

Nicholas A. James, Wenyu Zhang

References

W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.

See Also

Rizzo M.L., Szekely G.L (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics.

Examples

set.seed(400)
x1 = matrix(c(rnorm(50),rnorm(50,3)))
y1 = e.cp3o(Z=x1, K=2, minsize=30, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates
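
Beyond estimates, the other components described under Value show how the fit varies with the number of change points. A minimal sketch, assuming the ecp package is installed and reusing the data from the example above:

```r
library(ecp)

set.seed(400)
x1 <- matrix(c(rnorm(50), rnorm(50, 3)))
y1 <- e.cp3o(Z = x1, K = 2, minsize = 30, alpha = 1, verbose = FALSE)

y1$number   # estimated number of change points
y1$gofM     # goodness-of-fit values for 1, 2, ... change points
y1$cpLoc    # change point locations for each number of change points up to K
```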

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA E-STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The E-statistic is used as the goodness-of-fit measure.

Usage

e.cp3o_delta(Z, K=1, delta=29, alpha=1, verbose=FALSE)

Arguments

Z

A T x d matrix containing the length T time series with d-dimensional observations.

K

The maximum number of change points.

delta

The window size used to calculate the complete portion of the approximate test statistic. This also corresponds to one less than the minimum segment size.

alpha

The moment index used for determining the distance between and within segments.

verbose

A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. Between-segment distances are calculated only using points within a window of the segmentation point. The computational complexity of this method is O(KT^2), where K is the maximum number of change points, and T is the number of observations.

Value

The returned value is a list with the following components.

number

The estimated number of change points.

estimates

The location of the change points estimated by the procedure.

gofM

A vector of goodness-of-fit values for different numbers of change points. The first entry corresponds to a single change point, the second to two change points, and so on.

cpLoc

The list of locations of change points estimated by the procedure for different numbers of change points up to K.

time

The total amount of time taken to estimate the change point locations.

Author(s)

Nicholas A. James, Wenyu Zhang

References

W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.

See Also

Rizzo M.L., Szekely G.L (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics.

Examples

set.seed(400)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.cp3o_delta(Z=x1, K=7, delta=29, alpha=1, verbose=FALSE)
#View estimated change point locations
y1$estimates
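
Because delta is documented as one less than the minimum segment size, delta=29 pairs naturally with minsize=30 in e.cp3o. The sketch below runs both functions on the same series so that the estimates and running times can be compared. It assumes the ecp package is installed; the variable names full and windowed are illustrative.

```r
library(ecp)

set.seed(400)
x1 <- matrix(c(rnorm(100), rnorm(100, 3), rnorm(100, 0, 2)))

# All observations are used for between-segment distances
full <- e.cp3o(Z = x1, K = 7, minsize = 30, alpha = 1)

# Only points within a window of the segmentation point are used
windowed <- e.cp3o_delta(Z = x1, K = 7, delta = 29, alpha = 1)

full$estimates
windowed$estimates

# Running times reported by each procedure
full$time
windowed$time
```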

ENERGY DIVISIVE

Description

A divisive hierarchical estimation algorithm for multiple change point analysis.

Usage

e.divisive(X, sig.lvl=.05, R=199, k=NULL, min.size=30, alpha=1)

Arguments

X

A T x d matrix containing the length T time series with d-dimensional observations.

sig.lvl

The level at which to sequentially test if a proposed change point is statistically significant.

R

The maximum number of random permutations to use in each iteration of the permutation test. The permutation test p-value is calculated using the method outlined in Gandy (2009).

k

Number of change point locations to estimate, suppressing permutation based testing. If k=NULL then only the statistically significant estimated change points are returned.

min.size

Minimum number of observations between change points.

alpha

The moment index used for determining the distance between and within segments.

Details

Segments are found through the use of a binary bisection method and a permutation test. The computational complexity of this method is O(kT^2), where k is the number of estimated change points, and T is the number of observations.

Value

The returned value is a list with the following components.

k.hat

The number of clusters within the data created by the change points.

order.found

The order in which the change points were estimated.

estimates

Locations of the statistically significant change points.

considered.last

Location of the last change point that was not found to be statistically significant at the given significance level.

permutations

The number of permutations performed by each of the sequential permutation tests.

cluster

The estimated cluster membership vector.

p.values

Approximate p-values estimated from each permutation test.

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

See Also

e.agglo

Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.

Rizzo M.L., Szekely G.L (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics.

Examples

## Not run: 
set.seed(100)
x1 = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y1 = e.divisive(X=x1,sig.lvl=0.05,R=199,k=NULL,min.size=30,alpha=1)
x2 = rbind(MASS::mvrnorm(100,c(0,0),diag(2)),
	MASS::mvrnorm(100,c(2,2),diag(2)))
y2 = e.divisive(X=x2,sig.lvl=0.05,R=499,k=NULL,min.size=30,alpha=1)

## End(Not run)
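
When k is supplied, permutation-based testing is suppressed and the requested number of change points is estimated directly. A minimal sketch, assuming the ecp package is installed; the choice k=2 is arbitrary and matches the two changes built into the simulated series.

```r
library(ecp)

set.seed(100)
x1 <- matrix(c(rnorm(100), rnorm(100, 3), rnorm(100, 0, 2)))

# Request two change points; no permutation test is performed
y3 <- e.divisive(X = x1, k = 2, min.size = 30, alpha = 1)

y3$estimates        # estimated change point locations
y3$order.found      # order in which the change points were estimated
table(y3$cluster)   # segment sizes implied by the membership vector
```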

ENERGY SPLIT

Description

Finds the most likely location for a change point across all current clusters.

Usage

e.split(changes, D, min.size, for.sim=FALSE, env=emptyenv())

Arguments

changes

A vector containing the current set of change points.

D

An n by n distance matrix.

min.size

Minimum number of observations between change points.

for.sim

Boolean value indicating if the function is to be run on permuted data for significance testing.

env

Environment that contains information to help reduce computational time.

Details

This method is called by the e.divisive method, and should not be called by the user.

Value

A list with the following components is returned.

first

The index of the first element of the cluster to be divided.

second

The index of the last element of the cluster to be divided.

third

The new set of change points.

fourth

The distance between the clusters created by the newly proposed change point.

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

e.divisive


Internal Energy Change Point Functions

Description

Internal Energy Change Point functions.

Details

These are not to be called by the user.


FIND CLOSEST CLUSTERS

Description

Determines which two segments to merge.

Usage

find.closest(K, ret)

Arguments

K

Integer indicating the progress of the agglomerative process.

ret

A list with 'open', 'N', and 'right' components

Details

This method is called by the e.agglo method, and should not be called by the user.

Value

Returns a vector with 3 components. The first two indicate which segments are to be merged. The third is the new goodness-of-fit statistic.

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

See Also

e.agglo


GET BETWEEN DISTANCE

Description

Returns the energy distance between two sets of numerical data.

Usage

getBetween(alpha_, X_, Y_)

Arguments

alpha_

A weighting parameter used for calculating the energy distance. This value should be in (0,2].

X_

An n by d matrix of the n d-dimensional observations.

Y_

An m by d matrix of the m d-dimensional observations.

Details

The matrices X_ and Y_ do not need to have the same number of rows, but they do require the same number of columns.

Value

The returned value is a real number indicating the energy distance between the two data sets.

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

e.agglo, e.divisive

Examples

set.seed(100)
X = matrix(rnorm(100),ncol=2)
Y = matrix(rnorm(126,1),ncol=2)
alpha = 1
between.distance = getBetween(alpha,X,Y)
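
For orientation, the sample energy statistic from the Rizzo and Szekely references combines a between-sample term with two within-sample terms: twice the mean of |x - y|^alpha across samples, minus the mean of |x - x'|^alpha within each sample. The base R sketch below computes it from that standard definition. The helper names mean_pair_dist and energy_stat are hypothetical, and the scaling used internally by getBetween and getWithin is not stated here, so the values are not claimed to match those functions.

```r
# Hypothetical helpers; not part of ecp
mean_pair_dist <- function(A, B, alpha = 1) {
  n <- nrow(A)
  m <- nrow(B)
  # Euclidean distances between every row of A and every row of B
  D <- as.matrix(dist(rbind(A, B)))[seq_len(n), n + seq_len(m), drop = FALSE]
  mean(D^alpha)
}

energy_stat <- function(X, Y, alpha = 1) {
  2 * mean_pair_dist(X, Y, alpha) -
    mean_pair_dist(X, X, alpha) -
    mean_pair_dist(Y, Y, alpha)
}

set.seed(100)
X <- matrix(rnorm(100), ncol = 2)
Y <- matrix(rnorm(126, 1), ncol = 2)

energy_stat(X, Y)   # positive when the two samples differ
energy_stat(X, X)   # zero when the two samples are identical
```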

GET WITHIN DISTANCE

Description

Calculate the energy distance within a data set.

Usage

getWithin(alpha_, X_)

Arguments

alpha_

A weighting parameter used for calculating the energy distance. This value should be in (0,2].

X_

An n by d matrix of the n d-dimensional observations.

Value

The returned value is a real number indicating the energy distance within the given data set.

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

e.agglo, e.divisive

Examples

set.seed(100)
X = matrix(rnorm(150),ncol=2)
alpha = 1
distance = getWithin(alpha,X)

GOODNESS OF FIT UPDATE

Description

Updates the goodness of fit statistic.

Usage

gof.update(i,ret)

Arguments

i

The segment that is to be merged with the segment adjacent to its right.

ret

A list with 'gof', 'right', 'left', 'D', and 'size' components.

Details

Called by the e.agglo method, and should not be called by the user.

Value

Returns a real number. This is the updated goodness of fit statistic.

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

See Also

e.agglo


Kernel Change Point Analysis

Description

An algorithm for multiple change point analysis that uses the 'kernel trick' and dynamic programming.

Usage

kcpa(X, L, C)

Arguments

X

A T x d matrix containing the length T time series with d-dimensional observations.

L

The maximum number of change points.

C

The constant used to penalize the inclusion of additional change points in the fitted model.

Details

Segments are found through the use of dynamic programming and the kernel trick.

Value

If the algorithm determines that the best fit is obtained through using k change points then the returned value is an array of length k, containing the change point locations.

Author(s)

Nicholas A. James

References

Arlot S., Celisse A., Harchaoui Z. (2019). A Kernel Multiple Change-point Algorithm via Model Selection. J. Mach. Learn. Res., 20, 162:1-162:56.
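
Examples

A minimal usage sketch, assuming the ecp package is installed. The simulated series has one mean shift, and the values L=3 and C=2 are arbitrary choices for illustration rather than recommended settings.

```r
library(ecp)

set.seed(400)
x <- matrix(c(rnorm(100), rnorm(100, 3)))

# Allow at most L = 3 change points, with penalty constant C = 2
cps <- kcpa(X = x, L = 3, C = 2)

# View estimated change point locations
cps
```

Larger values of C penalize additional change points more heavily, so rerunning with several values of C is one way to see how sensitive the segmentation is to the penalty.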


CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The Kolmogorov-Smirnov statistic is used as the goodness-of-fit measure.

Usage

ks.cp3o(Z, K=1, minsize=30, verbose=FALSE)

Arguments

Z

A T x d matrix containing the length T time series with d-dimensional observations.

K

The maximum number of change points.

minsize

The minimum segment size.

verbose

A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. For long time series, consider using ks.cp3o_delta.

Value

The returned value is a list with the following components.

number

The estimated number of change points.

estimates

The location of the change points estimated by the procedure.

gofM

A vector of goodness-of-fit values for different numbers of change points. The first entry corresponds to a single change point, the second to two change points, and so on.

cpLoc

The list of locations of change points estimated by the procedure for different numbers of change points up to K.

time

The total amount of time taken to estimate the change point locations.

Author(s)

Wenyu Zhang

References

W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.

See Also

Kifer D., Ben-David S., Gehrke J. (2004). Detecting change in data streams. International Conference on Very Large Data Bases.

Examples

set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates

CHANGE POINTS ESTIMATION BY PRUNED OBJECTIVE (VIA KOLMOGOROV-SMIRNOV STATISTIC)

Description

An algorithm for multiple change point analysis that uses dynamic programming and pruning. The Kolmogorov-Smirnov statistic is used as the goodness-of-fit measure.

Usage

ks.cp3o_delta(Z, K=1, minsize=30, verbose=FALSE)

Arguments

Z

A T x d matrix containing the length T time series with d-dimensional observations.

K

The maximum number of change points.

minsize

The minimum segment size. This is also the window size used to calculate between-segment distances.

verbose

A flag indicating if status updates should be printed.

Details

Segmentations are found through the use of dynamic programming and pruning. Between-segment distances are calculated only using points within a window of the segmentation point.

Value

The returned value is a list with the following components.

number

The estimated number of change points.

estimates

The location of the change points estimated by the procedure.

gofM

A vector of goodness-of-fit values for different numbers of change points. The first entry corresponds to a single change point, the second to two change points, and so on.

cpLoc

The list of locations of change points estimated by the procedure for different numbers of change points up to K.

time

The total amount of time taken to estimate the change point locations.

Author(s)

Wenyu Zhang

References

W. Zhang, N. A. James and D. S. Matteson, "Pruning and Nonparametric Multiple Change Point Detection," 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, 2017, pp. 288-295.

See Also

Kifer D., Ben-David S., Gehrke J. (2004). Detecting change in data streams. International Conference on Very Large Data Bases.

Examples

set.seed(400)
x = matrix(c(rnorm(100),rnorm(100,3),rnorm(100,0,2)))
y = ks.cp3o_delta(Z=x, K=7, minsize=30, verbose=FALSE)
#View estimated change point locations
y$estimates

PERMUTE CLUSTERS

Description

Permutes time series observations within specified segments.

Usage

perm.cluster(D, points)

Arguments

D

An n by n distance matrix.

points

The set of current change points.

Details

Called by the e.divisive method, and should not be called by the user.

Value

Returns the n by n distance matrix for the permuted data.
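
The idea of permuting observations within segments and returning the matching distance matrix can be sketched in base R. The helper permute_within below is hypothetical and not part of ecp. Because the convention used for the points argument is not restated here, the sketch takes the first index of each segment instead.

```r
# Hypothetical helper; not part of ecp.
# `starts` holds the first index of each segment, in increasing order.
permute_within <- function(D, starts) {
  n <- nrow(D)
  ends <- c(starts[-1] - 1, n)
  idx <- unlist(lapply(seq_along(starts), function(s) {
    seg <- starts[s]:ends[s]
    seg[sample.int(length(seg))]   # shuffle indices inside the segment
  }))
  D[idx, idx, drop = FALSE]        # apply the same permutation to rows and columns
}

set.seed(1)
x <- c(rnorm(5), rnorm(5, 3))
D <- as.matrix(dist(x))

# Two segments: observations 1-5 and 6-10
Dp <- permute_within(D, starts = c(1, 6))
dim(Dp)
```

Reindexing the distance matrix avoids recomputing pairwise distances for every permutation, since the distances themselves do not change when observations are reordered.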

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

e.divisive


PROCESS DATA

Description

Initializes components necessary to perform agglomerative analysis.

Usage

process.data(member, X, alpha)

Arguments

member

Segment membership vector for the time series.

X

A matrix containing the time series with observations in R^d.

alpha

Index used for determining the distance between and within segments.

Details

Called by the e.agglo method, and should not be called by the user.

Value

Returns a list with the following components.

gof

Vector showing the progression of the goodness of fit statistic.

list

Matrix showing the progression of the set of change points.

N

Number of initial segments.

sizes

Sizes of each segment during the agglomerative process.

right

Vector containing indices of the right adjacent segments.

left

Vector containing indices of the left adjacent segments.

open

Vector indicating if a segment has been merged.

D

Matrix of distances between segments.

lm

Vector containing indices of the starting point of a segment.

Author(s)

Nicholas A. James

See Also

e.agglo


SIGNIFICANCE TEST

Description

Performs a permutation test.

Usage

sig.test(D, R, changes, min.size, obs, env=emptyenv())

Arguments

D

An n by n distance matrix.

R

The number of permutations to use in the permutation test.

changes

The set of current change points.

min.size

Minimum number of observations between change points.

obs

Test statistic value for non-permuted data.

env

Environment with information used to reduce computational time.

Details

Called by the e.divisive method, and should not be called by the user.

Value

The returned value is the approximate p-value obtained by the permutation test. The permutation test is performed using the method outlined in Gandy (2009).
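
As a point of reference, the plain fixed-R permutation p-value can be written in a few lines of base R. This is a simplified stand-in and not the sequential procedure of Gandy (2009); the helper perm_p_value is hypothetical and not part of ecp.

```r
# Hypothetical helper; not part of ecp.
# obs: statistic on the original data
# perm_stats: statistics computed on the permuted data
perm_p_value <- function(obs, perm_stats) {
  (1 + sum(perm_stats >= obs)) / (1 + length(perm_stats))
}

set.seed(1)
perm_stats <- rnorm(199)   # stand-in for statistics from 199 permutations

perm_p_value(obs = 10, perm_stats)   # 0.005: no permuted value reaches 10
```

Adding one to both the numerator and the denominator keeps the estimate strictly positive, so with R permutations the smallest attainable value is 1/(R+1).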

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.

Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

e.divisive


SPLIT POINT

Description

Finds the most likely location of a change point within a given segment.

Usage

splitPoint(start, end, D, min.size)

Arguments

start

The index of the first observation in a segment.

end

The index of the last observation in a segment.

D

An n by n distance matrix.

min.size

Minimum number of observations between change points.

Details

Called by the e.divisive method, and should not be called by the user.

Value

The returned value is a vector. The first component is the most likely position of a change point. The second component is the distance between the segments created by this proposed change point.

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo M.L., Szekely G.L. (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo M.L., Szekely G.L. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

e.divisive


SPLIT POINT-C

Description

C++ function that is called by splitPoint() to perform calculations.

Usage

splitPointC(s_, e_, D_, min_size_)

Arguments

s_

Index of the first observation in a segment.

e_

Index of the last observation in a segment.

D_

A distance matrix.

min_size_

The minimum segment size.

Details

As with the splitPoint method, this method should not be called by the user.

Value

Returns a vector. The first component is the most likely position of a change point. The second component is the distance between the segments created by this proposed change point.

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

Rizzo ML, Szekely GL (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034 - 1055.

See Also

splitPoint


UPDATE DISTANCE

Description

Update the distance between the newly created segment and the other segments.

Usage

updateDistance(i,j,K,ret)

Arguments

i

The segment that makes up the left portion of the new segment.

j

The segment that makes up the right portion of the new segment.

K

Integer indicating the progress of the agglomerative process.

ret

A list with 'gof', 'list', 'N', 'sizes', 'right', 'left', 'open', 'D', and 'lm' components.

Details

This method is called by the e.agglo method, and should not be called by the user.

Value

Returns a list with the following components.

gof

Vector showing the progression of the goodness of fit statistic.

list

Matrix showing the progression of the set of change points.

N

Number of initial segments.

sizes

Sizes of each segment during the agglomerative process.

right

Vector containing indices of the right adjacent segments.

left

Vector containing indices of the left adjacent segments.

open

Vector indicating if a segment has been merged.

D

Matrix of distances between segments.

lm

Vector containing indices of the starting point of a segment.

Author(s)

Nicholas A. James

References

James NA, Matteson DS (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

Nicholas A. James, David S. Matteson (2014). "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data.", "Journal of Statistical Software, 62(7), 1-25", URL "http://www.jstatsoft.org/v62/i07/"

Rizzo ML, Szekely GL (2005). Hierarchical clustering via joint between-within distances: Extending ward's minimum variance method. Journal of Classification. pp. 151 - 183.

See Also

e.agglo