Type: | Package |
Title: | A 'Shiny' App for Exploration of Text Collections |
Version: | 0.9.0 |
Description: | Facilitates dynamic exploration of text collections through an intuitive graphical user interface and the power of regular expressions. The package contains 1) a helper function to convert a data frame to a 'corporaexplorerobject' and 2) a 'Shiny' app for fast and flexible exploration of a 'corporaexplorerobject'. The package also includes demo apps with which one can explore Jane Austen's novels and the State of the Union Addresses (data from the 'janeaustenr' and 'sotu' packages respectively). |
Depends: | R (≥ 3.0.0) |
Imports: | data.table, dplyr, ggplot2, lubridate, magrittr, padr, plyr, RColorBrewer, re2, rlang, rmarkdown, scales, shiny, shinydashboard, shinyjs, shinyWidgets, stringi, stringr, tibble, tidyr |
Suggests: | janeaustenr, shinytest2, sotu, testthat (≥ 3.0.0) |
License: | GPL-3 | file LICENSE |
Date/Publication: | 2024-09-02 14:50:02 UTC |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
URL: | https://kgjerde.github.io/corporaexplorer/, https://github.com/kgjerde/corporaexplorer |
BugReports: | https://github.com/kgjerde/corporaexplorer/issues |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-09-02 14:23:02 UTC; kristianlg |
Author: | Kristian Lundby Gjerde |
Maintainer: | Kristian Lundby Gjerde <kristian.gjerde@gmail.com> |
Repository: | CRAN |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
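As a brief illustration of the re-exported pipe, the sketch below compares a piped call chain with the equivalent nested call. It assumes the 'magrittr' package is installed (it is listed under Imports) and uses only base R functions on either side of the pipe; the variable names are illustrative.

```r
# Minimal sketch: lhs %>% rhs passes lhs as the first argument of rhs.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`

  # Piped form: take square roots, then sum them.
  piped <- c(1, 4, 9) %>% sqrt() %>% sum()

  # Equivalent nested form.
  nested <- sum(sqrt(c(1, 4, 9)))

  print(all.equal(piped, nested))
}
```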
Deprecated functions in package corporaexplorer
Description
The functions listed below are deprecated and will be defunct in the near future.
Usage
run_corpus_explorer(...)
Arguments
... |
For |
run_corpus_explorer
For run_corpus_explorer, use explore.
Create a data frame with State of the Union texts and metadata
Description
From the "sotu" package.
Usage
create_sotu_df()
Value
data frame
Create test_data
Description
Create test_data
Usage
create_test_data()
Value
A corporaexplorerobject used for testing
Demo app: Jane Austen's novels
Description
run_janeausten_app() is a convenience function to directly run the demo app without first creating a corporaexplorerobject. It is equivalent to explore(create_janeausten_app()).
Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
Usage
run_janeausten_app(...)
create_janeausten_app()
Arguments
... |
Arguments passed to |
Details
The demo app's data are Jane Austen's six novels, retrieved through the "janeaustenr" package (https://github.com/juliasilge/janeaustenr) – which must be installed for these functions to work – and converted to a corporaexplorerobject as shown at https://kgjerde.github.io/corporaexplorer/articles/jane_austen.html.
Value
run_janeausten_app() launches a Shiny app. create_janeausten_app() returns a corporaexplorerobject.
Examples
## Create corporaexplorerobject for demo app:
jane_austen <- create_janeausten_app()
if(interactive()){
## Run the corporaexplorerobject:
explore(jane_austen)
## Or create and run the demo app in one step:
run_janeausten_app()
}
Demo apps: State of the Union addresses
Description
Two demo apps exploring the United States Presidential State of the Union addresses. The data are provided by the sotu package, and include all addresses through 2016. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
Usage
run_sotu_app(...)
create_sotu_app()
run_sotu_decade_app(...)
create_sotu_decade_app()
Arguments
... |
Arguments passed to |
Details
For details, see https://kgjerde.github.io/corporaexplorer/articles/sotu.html.
Value
The run_sotu_* functions launch a Shiny app.
The create_sotu_* functions return a corporaexplorerobject.
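This help page has no Examples section, so here is a minimal sketch of how the functions listed under Usage might be combined, mirroring the Jane Austen demo. It assumes that both 'corporaexplorer' and the suggested 'sotu' package are installed; creating the object can take some time because the whole collection is processed.

```r
# Sketch, assuming 'corporaexplorer' and 'sotu' are installed.
if (requireNamespace("corporaexplorer", quietly = TRUE) &&
    requireNamespace("sotu", quietly = TRUE)) {

  ## Create corporaexplorerobject for the demo app:
  sotu_corpus <- corporaexplorer::create_sotu_app()

  if (interactive()) {
    ## Run the corporaexplorerobject:
    corporaexplorer::explore(sotu_corpus)

    ## Or create and run the decade-based demo app in one step:
    corporaexplorer::run_sotu_decade_app()
  }
}
```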
Launch Shiny app for exploration of text collection
Description
Launch Shiny app for exploration of text collection. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
explore() explores a 'corporaexplorerobject' created with the prepare_data() function. App settings can optionally be specified in the arguments to explore().
explore0() is a convenience function to directly explore a data frame or character vector without first creating a corporaexplorerobject using prepare_data(), instead creating one on the fly as the app launches. It is functionally equivalent to explore(prepare_data(dataset, use_matrix = FALSE)).
Usage
explore(
corpus_object,
search_options = list(),
ui_options = list(),
search_input = list(),
plot_options = list(),
...
)
explore0(
dataset,
arguments_prepare_data = list(use_matrix = FALSE),
arguments_explore = list()
)
Arguments
corpus_object |
A corporaexplorerobject created by
|
search_options |
List. Specify how search operations in the app are carried out. Available options:
|
ui_options |
List. Specify custom app settings (see example below). Currently available:
|
search_input |
List. Gives the opportunity to pre-populate the following sidebar fields (see example below):
|
plot_options |
List. Specify custom plot settings (see example below). Currently available:
|
... |
Other arguments passed to |
dataset |
Data frame or character vector as specified in |
arguments_prepare_data |
List. Arguments to be passed to
|
arguments_explore |
List. Arguments to be passed to
|
Details
For explore0(): by default, no document term matrix will be generated, meaning that the data will be prepared for exploration faster than with the default settings in prepare_data(), but also that searches in the app are likely to be slower.
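The trade-off described above can be inspected with a rough timing sketch like the one below. It assumes 'corporaexplorer' is installed; the synthetic texts and variable names are made up for illustration, and actual timings depend on the machine and the corpus, so no particular result is implied.

```r
# Sketch: time data preparation with and without a document term matrix.
if (requireNamespace("corporaexplorer", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical corpus: 200 short synthetic documents.
  texts <- vapply(
    1:200,
    function(i) paste(sample(month.name, 50, replace = TRUE), collapse = " "),
    character(1)
  )

  time_without <- system.time(
    corpus_without <- corporaexplorer::prepare_data(texts, use_matrix = FALSE)
  )
  time_with <- system.time(
    corpus_with <- corporaexplorer::prepare_data(texts, use_matrix = TRUE)
  )

  # Elapsed seconds for each variant.
  print(c(without_matrix = time_without[["elapsed"]],
          with_matrix = time_with[["elapsed"]]))

  if (interactive()) {
    # These two calls should launch functionally equivalent apps:
    corporaexplorer::explore0(texts)
    corporaexplorer::explore(corpus_without)
  }
}
```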
Value
Launches a Shiny app.
Examples
# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
"This is a document about ", month.name[1:10], ". ",
"This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)
# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")
if(interactive()){
# Running exploration app:
explore(corpus)
explore(corpus,
search_options = list(optional_info = TRUE),
ui_options = list(font_size = "10px"),
search_input = list(search_terms = c("Tottenham", "Spurs")),
plot_options = list(max_docs_in_wall_view = 12001,
colours = c("gray", "green")))
# Running app to extract documents:
run_document_extractor(corpus)
}
if (interactive()) {
explore0(rep(sample(LETTERS), 10))
explore0(rep(sample(LETTERS), 10),
arguments_explore = list(search_input = list(search_terms = "Z"))
)
}
Retrieve the document data frame from a corporaexplorerobject
Description
Retrieve the document data frame from a corporaexplorerobject
Usage
get_df(x, make_normal = TRUE)
Arguments
x |
corporaexplorerobject |
Value
data.frame
Split up returned list from matrix_via_r()
Description
Split up returned list from matrix_via_r()
Usage
get_matrix(returned_list)
Arguments
returned_list |
Returned list from matrix_via_r() |
Value
Document term matrix (data.table).
Split up returned list from matrix_via_r()
Description
Split up returned list from matrix_via_r()
Usage
get_term_vector(returned_list)
Arguments
returned_list |
Returned list from matrix_via_r() |
Value
Word vector (character vector).
Values for custom UI sidebar checkbox filtering
Description
Values for custom UI sidebar checkbox filtering
Usage
include_columns_for_ui_checkboxes(new_df, columns_for_ui_checkboxes = NULL)
Arguments
new_df |
A "data_dok" tibble produced by |
columns_for_ui_checkboxes |
Character. Character or factor column(s) in dataset. Include sets of checkboxes in the app sidebar for convenient filtering of corpus. Typically useful for columns with a small set of unique (and short) values.
Checkboxes will be arranged by |
Value
List: column_names; values. Or NULL.
Create document term matrix for fast search of single words
Description
The characters removed
Usage
matrix_via_r(df, matrix_without_punctuation = TRUE)
Arguments
df |
A "data_dok" tibble |
matrix_without_punctuation |
Should punctuation and digits be stripped from the text before constructing the document term matrix? If |
Value
List: 1) Document term matrix (data.table), 2) word vector (character vector).
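To illustrate the general idea of a document term matrix and why it helps single-word searches, here is a conceptual base R sketch. It is not the package's implementation: matrix_via_r() returns a data.table, whereas this example builds a plain integer matrix, and the tokenisation rule (lower-casing and splitting on anything that is not a letter a-z) is an assumption chosen for brevity.

```r
# Conceptual sketch only; not how matrix_via_r() is implemented.
docs <- c("The cat sat.", "The dog sat down, 2 times!")

# Lower-case, then split on runs of non-letters (this drops punctuation and digits).
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Word vector: the sorted set of unique terms.
terms <- sort(unique(unlist(tokens)))

# Document term matrix: one row per document, one column per term.
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
))
colnames(dtm) <- terms

# A single-word search becomes a column lookup instead of a scan of every text.
docs_with_dog <- which(dtm[, "dog"] > 0)
print(dtm)
print(docs_with_dog)
```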
Prepare data for corpus exploration
Description
Convert data frame or character vector to a ‘corporaexplorerobject’ for subsequent exploration.
Usage
prepare_data(dataset, ...)
## S3 method for class 'data.frame'
prepare_data(
dataset,
date_based_corpus = TRUE,
text_column = "Text",
grouping_variable = NULL,
within_group_identifier = "sequential",
columns_doc_info = c("Date", "Title", "URL"),
corpus_name = NULL,
use_matrix = TRUE,
matrix_without_punctuation = TRUE,
tile_length_range = c(1, 10),
columns_for_ui_checkboxes = NULL,
...
)
## S3 method for class 'character'
prepare_data(
dataset,
corpus_name = NULL,
use_matrix = TRUE,
matrix_without_punctuation = TRUE,
...
)
Arguments
dataset |
Object to convert to corporaexplorerobject:
|
... |
Other arguments to be passed to |
date_based_corpus |
Logical. Set to |
text_column |
Character. Default: "Text".
The column in |
grouping_variable |
Character string indicating column name in dataset. If date_based_corpus is TRUE, this argument is ignored. If date_based_corpus is FALSE, this argument is used to group the documents, e.g., if dataset is organised by chapters belonging to different books. The order of groups in the app is determined as follows:
|
within_group_identifier |
Character string indicating column name in |
columns_doc_info |
Character vector. The columns from |
corpus_name |
Character string with name of corpus. |
use_matrix |
Logical. Should the function create a document term matrix
for fast searching? If |
matrix_without_punctuation |
Should punctuation and digits be stripped from the text before constructing the document term matrix? If |
tile_length_range |
Numeric vector of length two.
Fine-tune the tile lengths in document wall
and day corpus view. Tile length is calculated by
|
columns_for_ui_checkboxes |
Character. Character or factor column(s) in dataset. Include sets of checkboxes in the app sidebar for convenient filtering of corpus. Typically useful for columns with a small set of unique (and short) values.
Checkboxes will be arranged by |
Details
For data.frame: Each row in dataset is treated as a base differentiating unit in the corpus, typically chapters in books, or a single document in document collections.
The following column names are reserved and cannot be used in dataset:
"Date_",
"cx_ID",
"Text_original_case",
"Text_column_",
"Tile_length",
"Year_",
"cx_Seq",
"Weekday_n",
"Day_without_docs",
"Invisible_fake_date".
A character vector will be converted to a simple corporaexplorerobject with no metadata.
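Before calling prepare_data() on a data frame, it can be handy to check for the reserved column names listed above. The helper below is a hypothetical base R sketch, not part of the package; the function name find_reserved_columns and the example data frames are made up for illustration.

```r
# Reserved names copied from the list above.
reserved_names <- c(
  "Date_", "cx_ID", "Text_original_case", "Text_column_", "Tile_length",
  "Year_", "cx_Seq", "Weekday_n", "Day_without_docs", "Invisible_fake_date"
)

# Hypothetical helper: return any column names that clash with reserved names.
find_reserved_columns <- function(df) {
  intersect(names(df), reserved_names)
}

ok_df <- data.frame(
  Date = as.Date("2020-01-01"),
  Text = "Some text",
  stringsAsFactors = FALSE
)
bad_df <- data.frame(
  Text = "Some text",
  cx_ID = 1L,
  stringsAsFactors = FALSE
)

clashes_ok <- find_reserved_columns(ok_df)    # no clashes expected
clashes_bad <- find_reserved_columns(bad_df)  # "cx_ID" clashes

if (length(clashes_bad) > 0) {
  message("Rename before prepare_data(): ", paste(clashes_bad, collapse = ", "))
}
```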
Value
A corporaexplorer object to be passed as argument to explore and run_document_extractor.
Examples
## From data.frame
# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
"This is a document about ", month.name[1:10], ". ",
"This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)
# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")
if(interactive()){
# Running exploration app:
explore(corpus)
# Running app to extract documents:
run_document_extractor(corpus)
}
## From character vector
alphabet_corpus <- prepare_data(LETTERS)
if(interactive()){
# Running exploration app:
explore(alphabet_corpus)
}
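The examples above cover date-based corpora and character vectors. As a complement, the following sketch shows a corpus that is not date based, grouped by a column as described for grouping_variable. It assumes 'corporaexplorer' is installed; the data frame and its column names ("Book", "Title") are made up for illustration.

```r
# Hypothetical data: three chapters in each of two books.
chapters_df <- data.frame(
  Book = rep(c("Book A", "Book B"), each = 3),
  Title = rep(paste("Chapter", 1:3), times = 2),
  Text = paste("This is the text of chapter number", 1:6),
  stringsAsFactors = FALSE
)

if (requireNamespace("corporaexplorer", quietly = TRUE)) {
  # Group documents by book instead of arranging them by date.
  book_corpus <- corporaexplorer::prepare_data(
    chapters_df,
    date_based_corpus = FALSE,
    grouping_variable = "Book",
    columns_doc_info = c("Book", "Title"),
    corpus_name = "Two books"
  )

  if (interactive()) {
    # Running exploration app:
    corporaexplorer::explore(book_corpus)
  }
}
```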
Print corporaexplorerobject
Description
Print corporaexplorerobject
Usage
## S3 method for class 'corporaexplorerobject'
print(x, ...)
Arguments
x |
A corporaexplorerobject |
Value
Console-friendly output
Deprecated: run_corpus_explorer()
Description
Deprecated. Use explore() instead.
See Also
Launch Shiny app for retrieval of documents from text collection
Description
This function will be removed in a future version of corporaexplorer.
Usage
run_document_extractor(corpus_object, max_html_docs = 400, ...)
Arguments
corpus_object |
A |
max_html_docs |
The maximum number of documents allowed in one HTML report. |
... |
Other arguments passed to |
Details
Shiny app for simple retrieval/extraction of documents from a "corporaexplorerobject" in a reading-friendly format. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
Examples
# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
"This is a document about ", month.name[1:10], ". ",
"This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)
# Converting to corporaexplorer object:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")
if(interactive()){
# Running exploration app:
explore(corpus)
# Running app to extract documents:
run_document_extractor(corpus)
}
A tiny test dataset to test basic functionality
Description
Created by corporaexplorer:::create_test_data().
Usage
test_data
Format
A corporaexplorerobject.
Convert "data_dok" tibble to "data_365" tibble
Description
Convert "data_dok" tibble to "data_365" tibble
Usage
transform_365(new_df)
Arguments
new_df |
A "data_dok" tibble produced by |
Value
A "data_365" tibble.
Adjusts data frame to corporaexplorer format
Description
Adjusts data frame to corporaexplorer format
Usage
transform_regular(df, tile_length_range = c(1, 10))
Arguments
df |
Data frame with text column (character), Date column (Date) (if date based corpus), and optionally other columns. |
tile_length_range |
Numeric vector of length two.
Fine-tune the tile lengths in document wall
and day corpus view. Tile length is calculated by
|
Value
A tibble ("data_dok")