Type: Package
Title: Tools for the Continuous Convolution Trick in Nonparametric Estimation
Version: 0.1.3
Description: Implements the uniform scaled beta distribution and the continuous convolution kernel density estimator.
License: GPL-3
Encoding: UTF-8
Imports: stats, Rcpp (≥ 0.12.5), qrng
LinkingTo: Rcpp, RcppArmadillo
RoxygenNote: 7.3.2
Suggests: testthat
NeedsCompilation: yes
Packaged: 2025-03-24 18:13:36 UTC; n5
Author: Thomas Nagler [aut, cre]
Maintainer: Thomas Nagler <mail@tnagler.com>
Repository: CRAN
Date/Publication: 2025-03-24 19:30:05 UTC
Tools for the continuous convolution trick in nonparametric estimation
Description
Implements the uniform scaled beta distribution dusb(), a generic function
for continuous convolution cont_conv(), and the continuous convolution
kernel density estimator cckde().
Author(s)
Thomas Nagler
References
Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457
Continuous convolution density estimator
Description
The continuous convolution kernel density estimator is defined as the
classical kernel density estimator based on continuously convoluted data (see
cont_conv()). cckde() fits the estimator (including bandwidth selection);
dcckde() and predict.cckde() can be used to evaluate the estimator.
Usage
cckde(x, bw = NULL, mult = 1, theta = 0, nu = 5, ...)
dcckde(x, object)
## S3 method for class 'cckde'
predict(object, newdata, ...)
Arguments
x |
a matrix or data frame containing the data (or evaluation points). |
bw |
vector of bandwidth parameter; if |
mult |
bandwidth multiplier; either a positive number or a vector of such. Each bandwidth parameter is multiplied with the corresponding multiplier. |
theta |
scale parameter of the USB distribution (see, |
nu |
smoothness parameter of the USB distribution (see, |
... |
unused. |
object |
|
newdata |
matrix or data frame containing evaluation points. |
Details
If a variable should be treated as ordered discrete, declare it as
ordered(). Factors are expanded into discrete dummy codings.
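The idea behind the estimator can be illustrated without the package. The following is a minimal base R sketch, not the implementation used by cckde(): it assumes a single integer-valued variable, uniform noise (the USB distribution with theta = 0), a Gaussian kernel, and a bandwidth h fixed by hand rather than selected from the data. The helper kde() is hypothetical.

```r
set.seed(42)
n <- 500
z <- rpois(n, 2)                    # integer-valued sample
z_cc <- z + runif(n, -0.5, 0.5)     # continuously convoluted data (uniform noise)
h <- 0.2                            # assumed bandwidth, chosen by hand

# classical Gaussian kernel density estimator built on the jittered data
kde <- function(x) {
  vapply(x, function(x0) mean(dnorm(x0 - z_cc, sd = h)), numeric(1))
}

# evaluated at the integers, the estimate approximates the probability mass
k <- 0:10
est <- kde(k)
round(cbind(k, estimate = est, truth = dpois(k, 2)), 3)
```

With uniform noise, the density of the jittered variable at an integer equals the probability mass of the original variable at that integer, which is why the two columns should be close for a reasonable sample size.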
References
Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457
Examples
# dummy data with discrete variables
dat <- data.frame(
F1 = factor(rbinom(10, 4, 0.1), 0:4),
Z1 = ordered(rbinom(10, 5, 0.5), 0:5),
Z2 = ordered(rpois(10, 1), 0:10),
X1 = rnorm(10),
X2 = rexp(10)
)
fit <- cckde(dat) # fit estimator
dcckde(dat, fit) # evaluate density
predict(fit, dat) # equivalent
Continuous convolution
Description
Applies the continuous convolution trick, i.e. adding continuous noise to all
discrete variables. If a variable should be treated as discrete, declare it
as ordered() (passed to expand_as_numeric()).
Usage
cont_conv(x, theta = 0, nu = 5, quasi = TRUE)
Arguments
x |
data; numeric matrix or data frame. |
theta |
scale parameter of the USB distribution (see, |
nu |
smoothness parameter of the USB distribution (see, |
quasi |
logical indicating whether quasi random numbers should be used
( |
Details
The USB distribution (dusb()) is used as the noise distribution.
Discrete variables are assumed to be integer-valued.
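As a rough illustration of what adding USB noise to integer-valued data looks like, here is a base R sketch. The helper add_usb_noise() is hypothetical and not part of the package; it uses ordinary pseudo random numbers only and builds the noise directly from the definition of the USB distribution (a uniform variable on [-0.5, 0.5] plus a scaled, centered Beta(nu, nu) variable).

```r
# Hypothetical helper (not part of the package): adds USB noise to
# integer-valued data using base R pseudo random numbers.
add_usb_noise <- function(z, theta = 0, nu = 5) {
  n <- length(z)
  u <- stats::runif(n, -0.5, 0.5)
  # for theta = 0 the beta component vanishes and the noise is uniform
  b <- if (theta > 0) stats::rbeta(n, nu, nu) else rep(0.5, n)
  z + u + theta * (b - 0.5)
}

set.seed(1)
z <- rpois(1000, 2)

# the noise is bounded by (1 + theta) / 2 in absolute value
z_cc <- add_usb_noise(z, theta = 0.25, nu = 5)
range(z_cc - z)

# with theta = 0, rounding recovers the original integers
z_unif <- add_usb_noise(z)
all(round(z_unif) == z)
```

The second check shows why integer-valued input matters: with uniform noise the original value stays identifiable from the jittered one.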
Value
A data frame with noise added to each discrete variable (ordered columns).
References
Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457
Examples
# dummy data with discrete variables
dat <- data.frame(
F1 = factor(rbinom(10, 4, 0.1), 0:4),
Z1 = ordered(rbinom(10, 5, 0.5), 0:5),
Z2 = ordered(rpois(10, 1), 0:10),
X1 = rnorm(10),
X2 = rexp(10)
)
pairs(dat)
pairs(expand_as_numeric(dat)) # expanded variables without noise
pairs(cont_conv(dat)) # continuously convoluted data
Uniform scaled beta distribution
Description
The uniform scaled beta (USB) distribution describes the distribution of the random variable
U_{\theta, \nu} = U + \theta(B - 0.5),
where U is a U[-0.5, 0.5] random variable, B is a Beta(\nu, \nu) random
variable, and \theta > 0, \nu \ge 1.
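The density implied by this definition can be worked out by conditioning on B. The derivation below is a sketch that follows from the definition alone, assuming U and B are independent; it is not taken from the package source. The symbols f_{\theta,\nu} and F_B are introduced here for illustration.

```latex
% Density of U + \theta (B - 0.5), with U ~ U[-0.5, 0.5] independent of B ~ Beta(\nu, \nu), \theta > 0.
% The density of U is the indicator of [-0.5, 0.5], so conditioning on B gives
\begin{align*}
f_{\theta,\nu}(x)
  &= \mathrm{E}\bigl[\mathbf{1}\{ |x - \theta(B - 0.5)| \le 0.5 \}\bigr] \\
  &= \Pr\!\left( \frac{x - 0.5}{\theta} + 0.5 \;\le\; B \;\le\; \frac{x + 0.5}{\theta} + 0.5 \right) \\
  &= F_B\!\left( \frac{x + 0.5}{\theta} + 0.5 \right) - F_B\!\left( \frac{x - 0.5}{\theta} + 0.5 \right),
\end{align*}
% where F_B is the Beta(\nu, \nu) distribution function, equal to 0 below 0 and 1 above 1.
% The density is supported on [-(1 + \theta)/2, (1 + \theta)/2].
```

As \theta approaches 0, this expression tends to the uniform density on [-0.5, 0.5].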
Usage
dusb(x, theta = 0, nu = 5)
rusb(n, theta = 0, nu = 5, quasi = FALSE)
Arguments
x |
vector of quantiles. |
theta |
scale parameter of the USB distribution. |
nu |
smoothness parameter of the USB distribution. |
n |
number of observations. |
quasi |
logical indicating whether quasi random numbers
( |
References
Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457
Examples
# plot distribution
sq <- seq(-0.8, 0.8, by = 0.01)
plot(sq, dusb(sq), type = "l")
lines(sq, dusb(sq, theta = 0.25), col = 2)
lines(sq, dusb(sq, theta = 0.25, nu = 10), col = 3)
# simulate from the distribution
x <- rusb(100, theta = 0.3, nu = 5)
Numeric model matrix for continuous convolution
Description
Turns ordered variables into integers and expands factors as binary dummy
codes. cont_conv() additionally adds noise to discrete variables, but this is
only useful for estimation. cc_prepare() can be used to evaluate an already
fitted estimate.
Usage
expand_as_numeric(x)
Arguments
x |
a vector or data frame with numeric, ordered, or factor columns. |
Value
A numeric matrix containing the expanded variables. It has additional
type expanded_as_numeric, and attr(, "i_disc") contains the indices of
discrete variables.
Examples
# dummy data with discrete variables
dat <- data.frame(
F1 = factor(rbinom(100, 4, 0.1), 0:4),
Z1 = as.ordered(rbinom(100, 5, 0.5)),
Z2 = as.ordered(rpois(100, 1)),
X1 = rnorm(100),
X2 = rexp(100)
)
pairs(dat)
pairs(expand_as_numeric(dat)) # expanded variables without noise
pairs(cont_conv(dat)) # continuously convoluted data
Expands names for expand_as_numeric
Description
Expands each element according to the factor expansions of columns in
expand_as_numeric().
Usage
expand_names(x)
Arguments
x |
as in |
Value
A vector of size ncol(expand_as_numeric(x)).
Expand a vector like expand_as_numeric
Description
Expands each element according to the factor expansions of columns in
expand_as_numeric().
Usage
expand_vec(y, x)
Arguments
y |
a vector of length 1 or |
x |
as in |
Value
A vector of size ncol(expand_as_numeric(x)).