Type: | Package |
Title: | Fluid Data Transformations |
Version: | 1.2.1 |
Date: | 2023-08-19 |
URL: | https://github.com/WinVector/cdata/, https://winvector.github.io/cdata/ |
Maintainer: | John Mount <jmount@win-vector.com> |
BugReports: | https://github.com/WinVector/cdata/issues |
Description: | Supplies higher-order coordinatized data specification and fluid transform operators that include pivot and anti-pivot as special cases. The methodology is describe in 'Zumel', 2018, "Fluid data reshaping with 'cdata'", https://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html , <doi:10.5281/zenodo.1173299> . This package introduces the idea of explicit control table specification of data transforms. Works on in-memory data or on remote data using 'rquery' and 'SQL' database interfaces. |
License: | GPL-2 | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 3.4.0), wrapr (≥ 2.0.9) |
Imports: | rquery (≥ 1.4.9), rqdatatable (≥ 1.3.2), methods, stats |
Suggests: | DBI, RSQLite, knitr, rmarkdown, yaml, tinytest |
VignetteBuilder: | knitr |
ByteCompile: | true |
NeedsCompilation: | no |
Packaged: | 2023-08-19 23:37:26 UTC; johnmount |
Author: | John Mount [aut, cre], Nina Zumel [aut], Win-Vector LLC [cph] |
Repository: | CRAN |
Date/Publication: | 2023-08-20 00:02:32 UTC |
cdata
: Fluid Data Transformations.
Description
Supplies implementations of higher order "fluid data" transforms. These transforms move data between rows and columns, are controlled by a graphical transformation specification, and have pivot and un-pivot as special cases. Large scale implementation is based on 'rquery', so should be usable on 'SQL' compliant data sources (include large systems such as 'PostgreSQL' and 'Spark'). This package introduces the idea of control table specification of data transforms (later aslo adapted from 'cdata' by 'tidyr'). A theory of fluid data transforms can be found in the following articles: https://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html, https://github.com/WinVector/cdata and https://winvector.github.io/FluidData/FluidData.html.
Author(s)
Maintainer: John Mount jmount@win-vector.com
Authors:
Nina Zumel nzumel@win-vector.com
Other contributors:
Win-Vector LLC [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/WinVector/cdata/issues
Multiply/join row records into block records.
Description
Call rowrecs_to_blocks()
.
Usage
table %**% transform
Arguments
table |
data (data.frame or relop). |
transform |
a rowrecs_to_blocks_spec. |
Value
rowrecs_to_blocks() result.
Examples
d <- wrapr::build_frame(
"id", "AUC", "R2" |
1 , 0.7 , 0.4 |
2 , 0.8 , 0.5 )
transform <- rowrecs_to_blocks_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
d %**% transform
# identity (in structure)
d %**% transform %//% t(transform)
Factor-out (aggregate/project) block records into row records.
Description
Call blocks_to_rowrecs()
.
Usage
table %//% transform
Arguments
table |
data (data.frame or relop). |
transform |
a rowrecs_to_blocks_spec. |
Value
blocks_to_rowrecs() result.
Examples
d <- wrapr::build_frame(
"id", "measure", "value" |
1 , "AUC" , 0.7 |
1 , "R2" , 0.4 |
2 , "AUC" , 0.8 |
2 , "R2" , 0.5 )
transform <- blocks_to_rowrecs_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
d %//% transform
# identity (in structure)
d %//% transform %**% t(transform)
Map data records from block records to row records.
Description
Map data records from block records (which each record may be more than one row) to row records (where each record is a single row).
Usage
blocks_to_rowrecs(
tallTable,
keyColumns,
controlTable,
...,
columnsToCopy = NULL,
checkNames = TRUE,
checkKeys = TRUE,
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
tmp_name_source = wrapr::mk_tmp_name_source("bltrr"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
## Default S3 method:
blocks_to_rowrecs(
tallTable,
keyColumns,
controlTable,
...,
columnsToCopy = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
tmp_name_source = wrapr::mk_tmp_name_source("btrd"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
## S3 method for class 'relop'
blocks_to_rowrecs(
tallTable,
keyColumns,
controlTable,
...,
columnsToCopy = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
tmp_name_source = wrapr::mk_tmp_name_source("bltrr"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
Arguments
tallTable |
data.frame containing data to be mapped (in-memory data.frame). |
keyColumns |
character vector of column defining row groups |
controlTable |
table specifying mapping (local data frame) |
... |
force later arguments to be by name. |
columnsToCopy |
character, extra columns to copy. |
checkNames |
logical, if TRUE check names. |
checkKeys |
logical, if TRUE check keyColumns uniquely identify blocks (required). |
strict |
logical, if TRUE check control table name forms |
controlTableKeys |
character, which column names of the control table are considered to be keys. |
tmp_name_source |
a tempNameGenerator from cdata::mk_tmp_name_source() |
temporary |
logical, if TRUE use temporary tables |
allow_rqdatatable |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Details
The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.
Value
wide table built by mapping key-grouped tallTable rows to one row per group
See Also
build_pivot_control
, rowrecs_to_blocks
Examples
# pivot example
d <- data.frame(meas = c('AUC', 'R2'),
val = c(0.6, 0.2))
cT <- build_pivot_control(d,
columnToTakeKeysFrom= 'meas',
columnToTakeValuesFrom= 'val')
blocks_to_rowrecs(d,
keyColumns = NULL,
controlTable = cT)
d <- data.frame(meas = c('AUC', 'R2'),
val = c(0.6, 0.2))
cT <- build_pivot_control(
d,
columnToTakeKeysFrom= 'meas',
columnToTakeValuesFrom= 'val')
ops <- rquery::local_td(d) %.>%
blocks_to_rowrecs(.,
keyColumns = NULL,
controlTable = cT)
cat(format(ops))
if(requireNamespace("rqdatatable", quietly = TRUE)) {
library("rqdatatable")
d %.>%
ops %.>%
print(.)
}
if(requireNamespace("RSQLite", quietly = TRUE)) {
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(db,
'd',
d,
overwrite = TRUE,
temporary = TRUE)
db %.>%
ops %.>%
print(.)
DBI::dbDisconnect(db)
}
Map sets rows to columns (query based, take name of table).
Description
Transform data facts from rows into additional columns using SQL and controlTable.
Usage
blocks_to_rowrecs_q(
tallTable,
keyColumns,
controlTable,
my_db,
...,
columnsToCopy = NULL,
tempNameGenerator = mk_tmp_name_source("mvtcq"),
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
checkNames = TRUE,
checkKeys = FALSE,
showQuery = FALSE,
defaultValue = NULL,
dropDups = TRUE,
temporary = FALSE,
resultName = NULL,
incoming_qualifiers = NULL,
outgoing_qualifiers = NULL,
executeQuery = TRUE
)
Arguments
tallTable |
name of table containing data to be mapped (db/Spark data) |
keyColumns |
character list of column defining row groups |
controlTable |
table specifying mapping (local data frame) |
my_db |
db handle |
... |
force later arguments to be by name. |
columnsToCopy |
character list of column names to copy |
tempNameGenerator |
a tempNameGenerator from cdata::mk_tmp_name_source() |
strict |
logical, if TRUE check control table name forms |
controlTableKeys |
character, which column names of the control table are considered to be keys. |
checkNames |
logical, if TRUE check names |
checkKeys |
logical, if TRUE check keying of tallTable |
showQuery |
if TRUE print query |
defaultValue |
if not NULL literal to use for non-match values. |
dropDups |
logical if TRUE suppress duplicate columns (duplicate determined by name, not content). |
temporary |
logical, if TRUE make result temporary. |
resultName |
character, name for result table. |
incoming_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
outgoing_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
executeQuery |
logical, if TRUE execute the query and return result. |
Details
This is using the theory of "fluid data"n (https://github.com/WinVector/cdata), which includes the principle that each data cell has coordinates independent of the storage details and storage detail dependent coordinates (usually row-id, column-id, and group-id) can be re-derived at will (the other principle is that there may not be "one true preferred data shape" and many re-shapings of data may be needed to match data to different algorithms and methods).
The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.
Value
wide table built by mapping key-grouped tallTable rows to one row per group
See Also
build_pivot_control_q
, blocks_to_rowrecs
Examples
if (requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)) {
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# pivot example
d <- data.frame(meas = c('AUC', 'R2'), val = c(0.6, 0.2))
rquery::rq_copy_to(my_db,
'd',
d,
temporary = TRUE)
cT <- build_pivot_control_q('d',
columnToTakeKeysFrom= 'meas',
columnToTakeValuesFrom= 'val',
my_db = my_db)
tab <- blocks_to_rowrecs_q('d',
keyColumns = NULL,
controlTable = cT,
my_db = my_db)
qlook(my_db, tab)
DBI::dbDisconnect(my_db)
}
Create a block records to row records transform specification.
Description
Create a block records to row records transform specification object that holds the pivot control table, specification of extra row keys, and control table keys.
Usage
blocks_to_rowrecs_spec(
controlTable,
...,
recordKeys = character(0),
controlTableKeys = colnames(controlTable)[[1]],
checkNames = TRUE,
checkKeys = TRUE,
strict = FALSE,
allow_rqdatatable = FALSE
)
Arguments
controlTable |
an all character data frame or cdata pivot control. |
... |
not used, force later arguments to bind by name. |
recordKeys |
vector of columns identifying records. |
controlTableKeys |
vector of keying columns of the controlTable. |
checkNames |
passed to blocks_to_rowrecs. |
checkKeys |
passed to blocks_to_rowrecs. |
strict |
passed to blocks_to_rowrecs. |
allow_rqdatatable |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Value
a record specification object
Examples
d <- wrapr::build_frame(
"id", "measure", "value" |
1 , "AUC" , 0.7 |
1 , "R2" , 0.4 |
2 , "AUC" , 0.8 |
2 , "R2" , 0.5 )
transform <- blocks_to_rowrecs_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
print(transform)
d %.>% transform
inv_transform <- t(transform)
print(inv_transform)
# identity (in structure)
d %.>% transform %.>% inv_transform
# identity again (using .() "immediate" notation)
d %.>% transform %.>% .(t(transform))
Build a blocks_to_rowrecs()/rowrecs_to_blocks() control table that specifies a pivot from a data.frame
.
Description
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html.
Usage
build_pivot_control(
table,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
...,
prefix = columnToTakeKeysFrom,
sep = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("bpc"),
temporary = FALSE
)
## Default S3 method:
build_pivot_control(
table,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
...,
prefix = columnToTakeKeysFrom,
sep = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("bpcd"),
temporary = TRUE
)
## S3 method for class 'relop'
build_pivot_control(
table,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
...,
prefix = columnToTakeKeysFrom,
sep = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("bpc"),
temporary = FALSE
)
Arguments
table |
data.frame to scan for new column names (in-memory data.frame). |
columnToTakeKeysFrom |
character name of column build new column names from. |
columnToTakeValuesFrom |
character name of column to get values from. |
... |
not used, force later args to be by name |
prefix |
column name prefix (only used when sep is not NULL) |
sep |
separator to build complex column names. |
tmp_name_source |
a tempNameGenerator from cdata::mk_tmp_name_source() |
temporary |
logical, if TRUE use temporary tables |
Value
control table
See Also
Examples
d <- data.frame(measType = c("wt", "ht"),
measValue = c(150, 6),
stringsAsFactors = FALSE)
build_pivot_control(d,
'measType', 'measValue',
sep = '_')
d <- data.frame(measType = c("wt", "ht"),
measValue = c(150, 6),
stringsAsFactors = FALSE)
ops <- rquery::local_td(d) %.>%
build_pivot_control(.,
'measType', 'measValue',
sep = '_')
cat(format(ops))
if(requireNamespace("rqdatatable", quietly = TRUE)) {
library("rqdatatable")
d %.>%
ops %.>%
print(.)
}
if(requireNamespace("RSQLite", quietly = TRUE)) {
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(db,
'd',
d,
overwrite = TRUE,
temporary = TRUE)
db %.>%
ops %.>%
print(.)
DBI::dbDisconnect(db)
}
Build a blocks_to_rowrecs_q() control table that specifies a pivot (query based, takes name of table).
Description
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html.
Usage
build_pivot_control_q(
tableName,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
my_db,
...,
prefix = columnToTakeKeysFrom,
sep = NULL,
qualifiers = NULL
)
Arguments
tableName |
Name of table to scan for new column names. |
columnToTakeKeysFrom |
character name of column build new column names from. |
columnToTakeValuesFrom |
character name of column to get values from. |
my_db |
db handle |
... |
not used, force later args to be by name |
prefix |
column name prefix (only used when sep is not NULL) |
sep |
separator to build complex column names. |
qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
Value
control table
See Also
blocks_to_rowrecs_q
, build_pivot_control
Examples
if (requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)) {
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
d <- data.frame(measType = c("wt", "ht"),
measValue = c(150, 6),
stringsAsFactors = FALSE)
rquery::rq_copy_to(my_db,
'd',
d,
overwrite = TRUE,
temporary = TRUE)
build_pivot_control_q('d', 'measType', 'measValue',
my_db = my_db,
sep = '_') %.>%
print(.)
DBI::dbDisconnect(my_db)
}
Build a rowrecs_to_blocks() control table that specifies a un-pivot (or "shred").
Description
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.
Usage
build_unpivot_control(
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
...
)
Arguments
nameForNewKeyColumn |
character name of column to write new keys in. |
nameForNewValueColumn |
character name of column to write new values in. |
columnsToTakeFrom |
character array names of columns to take values from. |
... |
not used, force later args to be by name |
Value
control table
See Also
Examples
build_unpivot_control("measurmentType", "measurmentValue", c("c1", "c2"))
Check columns form unique keys
Description
Check columns form unique keys
Usage
check_cols_form_unique_keys(data, keyColNames)
Arguments
data |
data.frame to check |
keyColNames |
character, names of columns to consider as keys |
Value
logical TRUE if rows are uniquely keyed by named columns
Convert a layout_specification, blocks_to_rowrecs_spec, or rowrecs_to_blocks_spec to a simple object.
Description
Convert a layout_specification, blocks_to_rowrecs_spec, or rowrecs_to_blocks_spec to a simple object.
Usage
convert_cdata_spec_to_yaml(spec)
Arguments
spec |
a layout_specification, blocks_to_rowrecs_spec, or rowrecs_to_blocks_spec |
Value
a simple object suitable for YAML serialization
General transform from arbitrary record shape to arbitrary record shape.
Description
General transform from arbitrary record shape to arbitrary record shape.
Usage
convert_records(
table,
incoming_shape = NULL,
outgoing_shape = NULL,
...,
keyColumns = NULL,
columnsToCopy_in = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
incoming_controlTableKeys = colnames(incoming_shape)[[1]],
outgoing_controlTableKeys = colnames(outgoing_shape)[[1]],
tmp_name_source = wrapr::mk_tmp_name_source("crec"),
temporary = TRUE,
allow_rqdatatable_in = FALSE,
allow_rqdatatable_out = FALSE
)
Arguments
table |
data.frame or relop. |
incoming_shape |
data.frame, definition of incoming record shape. |
outgoing_shape |
data.frame, defintion of outgoing record shape. |
... |
force later arguments to bind by name. |
keyColumns |
character vector of column defining incoming row groups |
columnsToCopy_in |
character array of incoming column names to copy. |
checkNames |
logical, if TRUE check names. |
checkKeys |
logical, if TRUE check columnsToCopy form row keys (not a requirement, unless you want to be able to invert the operation). |
strict |
logical, if TRUE check control table name forms. |
incoming_controlTableKeys |
character, which column names of the incoming control table are considered to be keys. |
outgoing_controlTableKeys |
character, which column names of the outgoing control table are considered to be keys. |
tmp_name_source |
a tempNameGenerator from cdata::mk_tmp_name_source() |
temporary |
logical, if TRUE use temporary tables |
allow_rqdatatable_in |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
allow_rqdatatable_out |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Value
processing pipeline or transformed table
Examples
incoming_shape <- qchar_frame(
"row", "col1", "col2", "col3" |
"row1", v11, v12, v13 |
"row2", v21, v22, v23 |
"row3", v31, v32, v33 )
outgoing_shape <- qchar_frame(
"column", "row1", "row2", "row3" |
"col1", v11, v21 , v31 |
"col2", v12, v22 , v32 |
"col3", v13, v23 , v33 )
data <- build_frame(
'record_id', 'row', 'col1', 'col2', 'col3' |
1, 'row1', 1, 2, 3 |
1, 'row2', 4, 5, 6 |
1, 'row3', 7, 8, 9 |
2, 'row1', 11, 12, 13 |
2, 'row2', 14, 15, 16 |
2, 'row3', 17, 18, 19 )
print(data)
convert_records(
data,
keyColumns = 'record_id',
incoming_shape = incoming_shape,
outgoing_shape = outgoing_shape)
td <- rquery::local_td(data)
ops <- convert_records(
td,
keyColumns = 'record_id',
incoming_shape = incoming_shape,
outgoing_shape = outgoing_shape)
cat(format(ops))
Read a cdata record transform from a simple object (such as is imported from YAML).
Description
Read a cdata record transform from a simple object (such as is imported from YAML).
Usage
convert_yaml_to_cdata_spec(obj)
Arguments
obj |
object to convert |
Value
cdata transform specification
Upack details of a cdata record transform.
Description
Unpack details, especially: generate data frames representing both sides of a transform.
Usage
get_transform_details(x)
Arguments
x |
blocks_to_rowrecs_spec or rowrecs_to_blocks_spec |
Value
detailed fields
Use transform spec to layout data.
Description
Use transform spec to layout data.
Usage
layout_by(transform, table)
Arguments
transform |
object of class rowrecs_to_blocks_spec |
table |
data.frame or relop. |
Value
re-arranged data or data reference (relop).
Examples
d <- wrapr::build_frame(
"id" , "AUC", "R2" |
1 , 0.7 , 0.4 |
2 , 0.8 , 0.5 )
transform <- rowrecs_to_blocks_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
print(transform)
layout_by(transform, d)
d <- wrapr::build_frame(
"id", "measure", "value" |
1 , "AUC" , 0.7 |
1 , "R2" , 0.4 |
2 , "AUC" , 0.8 |
2 , "R2" , 0.5 )
transform <- blocks_to_rowrecs_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
print(transform)
layout_by(transform, d)
Use transform spec to layout data.
Description
Use transform spec to layout data.
Usage
## S3 method for class 'blocks_to_rowrecs_spec'
layout_by(transform, table)
Arguments
transform |
object of class blocks_to_rowrecs_spec. |
table |
data.frame or relop. |
Value
re-arranged data or data reference (relop).
Examples
d <- wrapr::build_frame(
"id", "measure", "value" |
1 , "AUC" , 0.7 |
1 , "R2" , 0.4 |
2 , "AUC" , 0.8 |
2 , "R2" , 0.5 )
transform <- blocks_to_rowrecs_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
print(transform)
layout_by(transform, d)
Use transform spec to layout data.
Description
Use transform spec to layout data.
Usage
## S3 method for class 'cdata_general_transform_spec'
layout_by(transform, table)
Arguments
transform |
object of class blocks_to_rowrecs_spec. |
table |
data.frame or relop. |
Value
re-arranged data or data reference (relop).
Use transform spec to layout data.
Description
Use transform spec to layout data.
Usage
## S3 method for class 'rowrecs_to_blocks_spec'
layout_by(transform, table)
Arguments
transform |
object of class rowrecs_to_blocks_spec |
table |
data.frame or relop. |
Value
re-arranged data or data reference (relop).
Examples
d <- wrapr::build_frame(
"id" , "AUC", "R2" |
1 , 0.7 , 0.4 |
2 , 0.8 , 0.5 )
transform <- rowrecs_to_blocks_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
print(transform)
layout_by(transform, d)
Create a record to record spec.
Description
Create a general record to record transform specification.
Usage
layout_specification(
incoming_shape = NULL,
outgoing_shape = NULL,
...,
recordKeys = character(0),
incoming_controlTableKeys = colnames(incoming_shape)[[1]],
outgoing_controlTableKeys = colnames(outgoing_shape)[[1]],
checkNames = TRUE,
checkKeys = TRUE,
strict = FALSE,
allow_rqdatatable_in = FALSE,
allow_rqdatatable_out = FALSE
)
Arguments
incoming_shape |
data.frame, definition of incoming record shape. |
outgoing_shape |
data.frame, defintion of outgoing record shape. |
... |
not used, force later arguments to bind by name. |
recordKeys |
vector of columns identifying records. |
incoming_controlTableKeys |
character, which column names of the incoming control table are considered to be keys. |
outgoing_controlTableKeys |
character, which column names of the outgoing control table are considered to be keys. |
checkNames |
passed to rowrecs_to_blocks. |
checkKeys |
passed to rowrecs_to_blocks. |
strict |
passed to rowrecs_to_blocks. |
allow_rqdatatable_in |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
allow_rqdatatable_out |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Value
a record specification object
Examples
incoming_shape <- qchar_frame(
"row", "col1", "col2", "col3" |
"row1", v11, v12, v13 |
"row2", v21, v22, v23 |
"row3", v31, v32, v33 )
outgoing_shape <- qchar_frame(
"column", "row1", "row2", "row3" |
"col1", v11, v21 , v31 |
"col2", v12, v22 , v32 |
"col3", v13, v23 , v33 )
data <- build_frame(
'record_id', 'row', 'col1', 'col2', 'col3' |
1, 'row1', 1, 2, 3 |
1, 'row2', 4, 5, 6 |
1, 'row3', 7, 8, 9 |
2, 'row1', 11, 12, 13 |
2, 'row2', 14, 15, 16 |
2, 'row3', 17, 18, 19 )
print(data)
layout <- layout_specification(
incoming_shape = incoming_shape,
outgoing_shape = outgoing_shape,
recordKeys = 'record_id')
print(layout)
data %.>% layout
data %.>% layout %.>% .(t(layout))
Map field values from one column into new derived columns (takes a data.frame
).
Description
Map field values from one column into new derived columns (takes a data.frame
).
Usage
map_fields(d, cname, m)
Arguments
d |
name of table to re-map. |
cname |
name of column to re-map. |
m |
name of table of data describing the mapping (cname column is source, derived columns are destinations). |
Value
re-mapped table
Examples
d <- data.frame(what = c("acc", "loss",
"val_acc", "val_loss"),
score = c(0.8, 1.2,
0.7, 1.7),
stringsAsFactors = FALSE)
m <- data.frame(what = c("acc", "loss",
"val_acc", "val_loss"),
measure = c("accuracy", "log-loss",
"accuracy", "log-loss"),
dataset = c("train", "train", "validation", "validation"),
stringsAsFactors = FALSE)
map_fields(d, 'what', m)
Map field values from one column into new derived columns (query based, takes name of table).
Description
Map field values from one column into new derived columns (query based, takes name of table).
Usage
map_fields_q(
dname,
cname,
mname,
my_db,
rname,
...,
d_qualifiers = NULL,
m_qualifiers = NULL
)
Arguments
dname |
name of table to re-map. |
cname |
name of column to re-map. |
mname |
name of table of data describing the mapping (cname column is source, derived columns are destinations). |
my_db |
database handle. |
rname |
name of result table. |
... |
force later arguments to be by name. |
d_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
m_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
Value
re-mapped table
Examples
if (requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)) {
my_db <- DBI::dbConnect(RSQLite::SQLite(),
":memory:")
DBI::dbWriteTable(
my_db,
'd',
data.frame(what = c("acc", "loss",
"val_acc", "val_loss"),
score = c(0.8, 1.2,
0.7, 1.7),
stringsAsFactors = FALSE),
overwrite = TRUE,
temporary = TRUE)
DBI::dbWriteTable(
my_db,
'm',
data.frame(what = c("acc", "loss",
"val_acc", "val_loss"),
measure = c("accuracy", "log-loss",
"accuracy", "log-loss"),
dataset = c("train", "train", "validation", "validation"),
stringsAsFactors = FALSE),
overwrite = TRUE,
temporary = TRUE)
map_fields_q('d', 'what', 'm', my_db, "dm")
cdata::qlook(my_db, 'dm')
DBI::dbDisconnect(my_db)
}
Map data records from block records that have one row per measurement value to row records.
Description
Map data records from block records (where each record may be more than one row) to row records (where each record is a single row). Values specified in rowKeyColumns determine which sets of rows build up records and are copied into the result.
Usage
pivot_to_rowrecs(
data,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
rowKeyColumns,
...,
sep = NULL,
checkNames = TRUE,
checkKeys = TRUE,
strict = FALSE,
allow_rqdatatable = FALSE
)
layout_to_rowrecs(
data,
columnToTakeKeysFrom,
columnToTakeValuesFrom,
rowKeyColumns,
...,
sep = NULL,
checkNames = TRUE,
checkKeys = TRUE,
strict = FALSE,
allow_rqdatatable = FALSE
)
Arguments
data |
data.frame to work with (must be local, for remote please try |
columnToTakeKeysFrom |
character name of column build new column names from. |
columnToTakeValuesFrom |
character name of column to get values from. |
rowKeyColumns |
character array names columns that should be table keys. |
... |
force later arguments to bind by name. |
sep |
character if not null build more detailed column names. |
checkNames |
logical, if TRUE check names. |
checkKeys |
logical, if TRUE check keyColumns uniquely identify blocks (required). |
strict |
logical, if TRUE check control table name forms |
allow_rqdatatable |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Value
new data.frame with values moved to columns.
See Also
unpivot_to_blocks
, blocks_to_rowrecs
Examples
d <- data.frame(model_id = c("m1", "m1"), meas = c('AUC', 'R2'), val= c(0.6, 0.2))
pivot_to_rowrecs(d,
columnToTakeKeysFrom= 'meas',
columnToTakeValuesFrom= 'val',
rowKeyColumns= "model_id") %.>%
print(.)
Quick look at remote data
Description
Quick look at remote data
Usage
qlook(
my_db,
tableName,
...,
displayRows = 10,
countRows = TRUE,
qualifiers = NULL
)
Arguments
my_db |
database handle |
tableName |
name of table to look at |
... |
force later arguments to be by name. |
displayRows |
number of rows to sample |
countRows |
logical, if TRUE return row count. |
qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
Value
str view of data
Examples
if ( requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)) {
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
rquery::rq_copy_to(my_db,
'd',
data.frame(AUC = 0.6, R2 = 0.2),
overwrite = TRUE,
temporary = TRUE)
qlook(my_db, 'd')
DBI::dbDisconnect(my_db)
}
Map a data records from row records to block records.
Description
Map a data records from row records (records that are exactly single rows) to block records (records that may be more than one row).
Usage
rowrecs_to_blocks(
wideTable,
controlTable,
...,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
columnsToCopy = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("rrtbl"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
## Default S3 method:
rowrecs_to_blocks(
wideTable,
controlTable,
...,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
columnsToCopy = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("rrtobd"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
## S3 method for class 'relop'
rowrecs_to_blocks(
wideTable,
controlTable,
...,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
columnsToCopy = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("rrtbl"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
Arguments
wideTable |
data.frame containing data to be mapped (in-memory data.frame). |
controlTable |
table specifying mapping (local data frame). |
... |
force later arguments to be by name. |
checkNames |
logical, if TRUE check names. |
checkKeys |
logical, if TRUE check columnsToCopy form row keys (not a requirement, unless you want to be able to invert the operation). |
strict |
logical, if TRUE check control table name forms. |
controlTableKeys |
character, which column names of the control table are considered to be keys. |
columnsToCopy |
character array of column names to copy. |
tmp_name_source |
a tempNameGenerator from cdata::mk_tmp_name_source() |
temporary |
logical, if TRUE use temporary tables |
allow_rqdatatable |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Details
The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.
rowrecs_to_blocks.default
will change some factor columns to character, and there
are issues with time columns with different time zones.
Value
long table built by mapping wideTable to one row per group
See Also
build_unpivot_control
, blocks_to_rowrecs
Examples
# un-pivot example
d <- data.frame(AUC = 0.6, R2 = 0.2)
cT <- build_unpivot_control(nameForNewKeyColumn= 'meas',
nameForNewValueColumn= 'val',
columnsToTakeFrom= c('AUC', 'R2'))
rowrecs_to_blocks(d, cT)
d <- data.frame(AUC = 0.6, R2 = 0.2)
cT <- build_unpivot_control(
nameForNewKeyColumn= 'meas',
nameForNewValueColumn= 'val',
columnsToTakeFrom= c('AUC', 'R2'))
ops <- rquery::local_td(d) %.>%
rowrecs_to_blocks(., cT)
cat(format(ops))
if(requireNamespace("rqdatatable", quietly = TRUE)) {
library("rqdatatable")
d %.>%
ops %.>%
print(.)
}
if(requireNamespace("RSQLite", quietly = TRUE)) {
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(db,
'd',
d,
overwrite = TRUE,
temporary = TRUE)
db %.>%
ops %.>%
print(.)
DBI::dbDisconnect(db)
}
Map a set of columns to rows (query based, take name of table).
Description
Transform data facts from columns into additional rows using SQL and controlTable.
Usage
rowrecs_to_blocks_q(
wideTable,
controlTable,
my_db,
...,
columnsToCopy = NULL,
tempNameGenerator = mk_tmp_name_source("mvtrq"),
strict = FALSE,
controlTableKeys = colnames(controlTable)[[1]],
checkNames = TRUE,
checkKeys = FALSE,
showQuery = FALSE,
defaultValue = NULL,
temporary = FALSE,
resultName = NULL,
incoming_qualifiers = NULL,
outgoing_qualifiers = NULL,
temp_qualifiers = NULL,
executeQuery = TRUE
)
Arguments
wideTable |
name of table containing data to be mapped (db/Spark data) |
controlTable |
table specifying mapping (local data frame) |
my_db |
db handle |
... |
force later arguments to be by name. |
columnsToCopy |
character array of column names to copy |
tempNameGenerator |
a tempNameGenerator from cdata::mk_tmp_name_source() |
strict |
logical, if TRUE check control table name forms |
controlTableKeys |
character, which column names of the control table are considered to be keys. |
checkNames |
logical, if TRUE check names |
checkKeys |
logical, if TRUE check wideTable keys |
showQuery |
if TRUE print query |
defaultValue |
if not NULL literal to use for non-match values. |
temporary |
logical, if TRUE make result temporary. |
resultName |
character, name for result table. |
incoming_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
outgoing_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
temp_qualifiers |
optional named ordered vector of strings carrying additional db hierarchy terms, such as schema. |
executeQuery |
logical, if TRUE execute the query and return result. |
Details
This is using the theory of "fluid data"n (https://github.com/WinVector/cdata), which includes the principle that each data cell has coordinates independent of the storage details and storage detail dependent coordinates (usually row-id, column-id, and group-id) can be re-derived at will (the other principle is that there may not be "one true preferred data shape" and many re-shapings of data may be needed to match data to different algorithms and methods).
The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.
Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.
Value
long table built by mapping wideTable to one row per group (or query if executeQuery is FALSE)
See Also
build_unpivot_control
, rowrecs_to_blocks
Examples
if (requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)) {
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# un-pivot example
d <- data.frame(AUC = 0.6, R2 = 0.2)
rquery::rq_copy_to(my_db,
'd',
d,
overwrite = TRUE,
temporary = TRUE)
cT <- build_unpivot_control(nameForNewKeyColumn= 'meas',
nameForNewValueColumn= 'val',
columnsToTakeFrom= c('AUC', 'R2'))
tab <- rowrecs_to_blocks_q('d', cT, my_db = my_db)
qlook(my_db, tab)
DBI::dbDisconnect(my_db)
}
Create a row records to block records transform specification.
Description
Create a row records to block records transform specification object that holds the pivot control table, specification of extra row keys, and control table keys.
Usage
rowrecs_to_blocks_spec(
controlTable,
...,
recordKeys = character(0),
controlTableKeys = colnames(controlTable)[[1]],
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
allow_rqdatatable = FALSE
)
Arguments
controlTable |
an all character data frame or cdata pivot control. |
... |
not used, force later arguments to bind by name. |
recordKeys |
vector of columns identifying records. |
controlTableKeys |
vector of keying columns of the controlTable. |
checkNames |
passed to rowrecs_to_blocks. |
checkKeys |
passed to rowrecs_to_blocks. |
strict |
passed to rowrecs_to_blocks. |
allow_rqdatatable |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Value
a record specification object
Examples
d <- wrapr::build_frame(
"id" , "AUC", "R2" |
1 , 0.7 , 0.4 |
2 , 0.8 , 0.5 )
transform <- rowrecs_to_blocks_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "id")
print(transform)
d %.>% transform
inv_transform <- t(transform)
print(inv_transform)
# identity (in structure)
d %.>% transform %.>% inv_transform
# identity again (using .() "immediate" notation)
d %.>% transform %.>% .(t(transform))
Check if table rows are uniquely keyed by keyset.
Description
Return TRUE if table rows are uniquely keyed by key_columns.
Usage
rows_are_uniquely_keyed(table_rep, key_columns, db)
Arguments
table_rep |
rquery op_tree |
key_columns |
character vector names of key columns |
db |
database handle |
Value
TRUE if table rows are uniquely keyed by key columns
Map a data records from row records to block records with one record row per columnsToTakeFrom value.
Description
Map a data records from row records (records that are exactly single rows) to block records (records that may be more than one row). All columns not named in columnsToTakeFrom are copied to each record row in the result.
Usage
unpivot_to_blocks(
data,
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
...,
nameForNewClassColumn = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
tmp_name_source = wrapr::mk_tmp_name_source("upb"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
layout_to_blocks(
data,
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
...,
nameForNewClassColumn = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
tmp_name_source = wrapr::mk_tmp_name_source("upb"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
pivot_to_blocks(
data,
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
...,
nameForNewClassColumn = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
tmp_name_source = wrapr::mk_tmp_name_source("upb"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
## Default S3 method:
unpivot_to_blocks(
data,
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
...,
nameForNewClassColumn = NULL,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
allow_rqdatatable = FALSE
)
## S3 method for class 'relop'
unpivot_to_blocks(
data,
nameForNewKeyColumn,
nameForNewValueColumn,
columnsToTakeFrom,
...,
checkNames = TRUE,
checkKeys = FALSE,
strict = FALSE,
nameForNewClassColumn = NULL,
tmp_name_source = wrapr::mk_tmp_name_source("upb"),
temporary = TRUE,
allow_rqdatatable = FALSE
)
Arguments
data |
data.frame to work with. |
nameForNewKeyColumn |
character name of column to write new keys in. |
nameForNewValueColumn |
character name of column to write new values in. |
columnsToTakeFrom |
character array names of columns to take values from. |
... |
force later arguments to bind by name. |
nameForNewClassColumn |
optional name to land original cell classes to. |
checkNames |
logical, if TRUE check names. |
checkKeys |
logical, if TRUE check columnsToCopy form row keys (not a requirement, unless you want to be able to invert the operation). |
strict |
logical, if TRUE check control table name forms. |
tmp_name_source |
a tempNameGenerator from cdata::mk_tmp_name_source() |
temporary |
logical, if TRUE make result temporary. |
allow_rqdatatable |
logical, if TRUE allow rqdatatable shortcutting on simple conversions. |
Value
new data.frame with values moved to rows.
See Also
pivot_to_rowrecs
, rowrecs_to_blocks
Examples
d <- data.frame(model_name = "m1", AUC = 0.6, R2 = 0.2)
unpivot_to_blocks(d,
nameForNewKeyColumn= 'meas',
nameForNewValueColumn= 'val',
columnsToTakeFrom= c('AUC', 'R2')) %.>%
print(.)
d <- data.frame(AUC= 0.6, R2= 0.2)
ops <- rquery::local_td(d) %.>%
unpivot_to_blocks(
.,
nameForNewKeyColumn= 'meas',
nameForNewValueColumn= 'val',
columnsToTakeFrom= c('AUC', 'R2'))
cat(format(ops))
if(requireNamespace("rqdatatable", quietly = TRUE)) {
library("rqdatatable")
d %.>%
ops %.>%
print(.)
}
if(requireNamespace("RSQLite", quietly = TRUE)) {
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(db,
'd',
d,
overwrite = TRUE,
temporary = TRUE)
db %.>%
ops %.>%
print(.)
DBI::dbDisconnect(db)
}