---
title: "Other Basis Expansions and Embeddings"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Other Basis Expansions and Embeddings}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Other R packages may be useful for generating basis expansions for certain kinds of data.
This page lists several options.

## Categorical predictors

- The [`embed`](https://embed.tidymodels.org/) package provides several methods for encoding categorical predictors based on their relationship with an outcome variable.
  Users should note that this creates a feedback effect, where the outcome variable is used to define the predictors. This may cause problems in certain statistical workflows, for example by overstating predictive performance when the encoding is estimated on the same data used for evaluation.
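
To make the feedback effect concrete, here is a minimal base R sketch of one simple outcome-based encoding (replacing each level with its mean outcome). It is not the `embed` API; the data, the `encode()` helper, and the fallback rule for unseen levels are all hypothetical. The point it illustrates is that the lookup table is estimated from the training rows only and then applied unchanged to new data.

```r
# Hypothetical toy data: a 3-level categorical predictor and a binary outcome.
set.seed(1)
train <- data.frame(
  group = sample(c("a", "b", "c"), 200, replace = TRUE),
  y = rbinom(200, 1, 0.5)
)

# Per-level outcome means, estimated on the training rows only.
level_means <- vapply(split(train$y, train$group), mean, numeric(1))
overall_mean <- mean(train$y)

# Hypothetical helper: encode by lookup, falling back to the overall
# training mean for levels that were not seen in training.
encode <- function(x, means, default) {
  out <- unname(means[as.character(x)])
  out[is.na(out)] <- default
  out
}

train$group_enc <- encode(train$group, level_means, overall_mean)

# New data reuses the training lookup; level "d" was never seen.
new_data <- data.frame(group = c("a", "c", "d"))
new_data$group_enc <- encode(new_data$group, level_means, overall_mean)
```

Here `y` is pure noise, so any association between `group_enc` and `y` in the training rows arises only because `y` was used to build the encoding.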

## Continuous predictors

- The [`embed`](https://embed.tidymodels.org/) package can discretize continuous variables based on their relationship with an outcome variable (using CART) and can extract PCA components from a set of numerical predictors.
  Users should note that `ridge()` regression automatically shrinks lower-variance PCA components more than higher-variance components, so providing normalized PCA components to a predictive model removes this differential shrinkage and may lead to unintuitive results in some cases.
  Selecting a sparse subset of principal components may be useful, however.
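
The shrinkage behavior described above can be checked numerically. The sketch below uses base R only and the standard ridge shrinkage factor $d_j^2 / (d_j^2 + \lambda)$ for a component with singular value $d_j$; it does not call this package's `ridge()`, and the simulated data and the value of `lambda` are hypothetical.

```r
set.seed(1)
n <- 100

# Hypothetical predictors with very different variances, then centered.
X <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 1), rnorm(n, sd = 0.3))
X <- scale(X, center = TRUE, scale = FALSE)

pca <- prcomp(X, center = FALSE)
lambda <- 10  # hypothetical ridge penalty

# Squared singular values of X, recovered from the component variances.
d2 <- pca$sdev^2 * (n - 1)

# Ridge multiplies the least-squares coefficient on component j by this
# factor: close to 1 for high-variance components, smaller for low-variance.
shrink_raw <- d2 / (d2 + lambda)

# Normalizing the scores to unit variance makes every squared singular
# value equal to n - 1, so all components are shrunk by the same amount.
Z <- scale(pca$x, center = FALSE, scale = pca$sdev)
d2_norm <- colSums(Z^2)
shrink_norm <- d2_norm / (d2_norm + lambda)

round(rbind(raw = shrink_raw, normalized = shrink_norm), 3)
```

With raw scores the factors decrease from the first component to the last; with normalized scores they are identical, which is one way the results can differ from what a user expects.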
  
## Text data

- The [`conText`](https://github.com/prodriguezsosa/conText) package estimates context-specific word and document embeddings.

- The [`text2vec`](https://text2vec.org/) package provides a number of tools to convert text to numeric vectors, including fitting custom GloVe models and topic modeling, and is designed to handle large-scale data.

- The [`text2map`](https://CRAN.R-project.org/package=text2map) package and its accompanying [`text2map.pretrained`](https://culturalcartography.gitlab.io/text2map.pretrained/) package (not on CRAN) provide access to a number of pre-trained word embeddings.
  
## Images and audio data 

- The [`torchvision`](https://mlverse.github.io/torchvision/) package provides functions for various transformations of image data.
  It also provides access to pre-trained models from which image embeddings may be extracted.
  
- The [`torchaudio`](https://github.com/mlverse/torchaudio) package provides functions for various transformations of audio data, including a number of spectral transformations.