Version: 1.0.0
Title: Multiset Intersection Generator
Date: 2019-03-07
Author: Ivan Tomic [aut, cre, cph], Adriana Tomic [aut, ctb]
Description: Computes efficient data distributions from highly inconsistent datasets with many missing values using multi-set intersections. Based upon hash functions, 'mulset' can quickly identify intersections from very large matrices of input vectors across columns and rows, and thus provides a scalable solution for dealing with missing values. Tomic et al. (2019) <doi:10.1101/545186>.
Maintainer: Ivan Tomic <info@ivantomic.com>
Packaged: 2019-03-07 20:24:54 UTC; login
Imports: gtools, digest, stats
Depends: R (≥ 3.4.0)
URL: https://github.com/LogIN-/mulset
BugReports: https://github.com/LogIN-/mulset/issues
License: EUPL (≥ 1.2)
Encoding: UTF-8
LazyLoad: yes
LazyData: yes
RoxygenNote: 6.1.1.9000
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2019-03-08 16:50:03 UTC

An intersection function

Description

intersection() returns all intersections it finds, that is, the values common to the supplied vectors.

Usage

intersection(...)

Arguments

...

Vector with master values to check and vector to compare values against

Value

Character vector of all common attributes

Examples

input1 <- seq(50, 100, by=10)
input2 <- seq(70, 130, by=10)
intersection(input1, input2)
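
For comparison, the same inputs can be intersected with base R alone. This is only a sketch of the underlying set operation, not the package's own code: intersect() and Reduce() are base R functions, input3 is a made-up extra vector, and the type of the result may differ from what intersection() returns (the Value section above describes a character vector).

```r
# Base R comparison; this does not call the mulset package.
input1 <- seq(50, 100, by = 10)   # 50 60 70 80 90 100
input2 <- seq(70, 130, by = 10)   # 70 80 90 100 110 120 130

# Values present in both vectors.
common <- intersect(input1, input2)
print(common)

# Reduce() applies the same pairwise operation across any number of vectors.
# input3 is a hypothetical third vector added for illustration.
input3 <- seq(80, 100, by = 5)    # 80 85 90 95 100
common_all <- Reduce(intersect, list(input1, input2, input3))
print(common_all)
```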

A mulset function

Description

mulset() returns all multi-set intersections.

Usage

mulset(data, exclude = NULL, include = c("samples", "samples_count",
  "datapoints"), maxIntersections = NULL, hashMethod = "md5")

Arguments

data

Data frame containing your data

exclude

Vector containing one or more variable names from names(data)

include

List of attributes to show in the results. Possible values are: c("samples", "samples_count", "datapoints"). If this parameter is set to NULL, only c("features", "feature_count") will be returned.

maxIntersections

Maximum number of unique datasets to generate. If NULL, all datasets will be generated.

hashMethod

Hashing method used to identify unique sets. Available choices: md5 (default), sha1, crc32, sha256, sha512, xxhash32, xxhash64, murmur32

Details

This function allows you to generate a specific type of multi-set intersection. It searches for multi-set intersections between row and column identifiers. If no NA values are present, only one dataset is returned, as expected.
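
The idea in the paragraph above can be illustrated with a small base R sketch. It is an assumption-laden approximation, not the package's algorithm: the data frames toy and full, the helper sketch_sets(), and the plain string key (standing in for the hash the package is described as using) are all hypothetical. Each row is keyed by the names of its non-missing columns, and every distinct key becomes a candidate feature set together with the rows that are complete on it.

```r
# Hypothetical helper: one candidate feature set per distinct pattern of
# non-missing columns, plus the rows that are complete on that set.
sketch_sets <- function(df) {
  # Key each row by the names of its non-NA columns
  # (a plain string key stands in for a hash here).
  keys <- apply(df, 1, function(row) paste(names(df)[!is.na(row)], collapse = ","))
  lapply(unique(keys), function(key) {
    features <- strsplit(key, ",", fixed = TRUE)[[1]]
    complete <- rownames(df)[stats::complete.cases(df[, features, drop = FALSE])]
    list(features = features, feature_count = length(features),
         samples = complete, samples_count = length(complete))
  })
}

# Hypothetical toy data: 4 samples, 3 features, several missing values.
toy <- data.frame(
  f1 = c(1, 2, NA, 4),
  f2 = c(1, NA, NA, 4),
  f3 = c(1, 2, 3, NA),
  row.names = c("s1", "s2", "s3", "s4")
)
toy_sets <- sketch_sets(toy)
str(toy_sets)

# Without any NA values every row shares one key, so one set comes back.
full <- data.frame(f1 = 1:3, f2 = 4:6)
full_sets <- sketch_sets(full)
print(length(full_sets))
```

In this sketch, rows with more missing values yield smaller feature sets that more samples can satisfy, which is the trade-off the multi-set intersections expose.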

Value

If any intersections are found, it returns a list containing all available multi-set intersections. You can convert this list to a data frame by following the example provided, or use it as is.

Examples

data(mulsetDemo)
print(head(mulsetDemo))
resamples <- mulset(mulsetDemo, exclude = c("outcome", "age", "gender"), maxIntersections = 250)
## Loop through returned list or convert it to data-frame
## resamplesFrame <- as.data.frame(t(sapply(resamples,c)))
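
The two options mentioned in the comment above, looping and converting, can be tried without the package on a stand-in list. In the sketch below, resamples_mock and all of its values are made up for illustration. Only the field names are taken from the include argument's documentation, and the real return structure of mulset() may differ.

```r
# Hypothetical stand-in for the list returned by mulset(); values are invented.
resamples_mock <- list(
  list(features = c("f1", "f2"), feature_count = 2,
       samples = c("s1", "s4"), samples_count = 2, datapoints = 4),
  list(features = "f3", feature_count = 1,
       samples = c("s1", "s2", "s3"), samples_count = 3, datapoints = 3)
)

# Option 1: loop through the list and report each set.
for (set in resamples_mock) {
  cat(set$feature_count, "feature(s),", set$samples_count, "sample(s):",
      paste(set$features, collapse = ", "), "\n")
}

# Option 2: convert to a data frame, one row per set.
# Columns are list columns because the fields hold vectors of varying length.
resamples_frame <- as.data.frame(t(sapply(resamples_mock, c)))
print(dim(resamples_frame))
print(names(resamples_frame))
```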

Demo data set from the mulset package. This data is used in the package's examples. It consists of a 4x4 feature matrix plus additional dummy columns that can be used for testing.

Description

Demo data set from the mulset package. This data is used in the package's examples. It consists of a 4x4 feature matrix plus additional dummy columns that can be used for testing.

Usage

data(mulsetDemo)

Format

An object of class data.frame with 4 rows and 7 columns.

Examples

data(mulsetDemo)
print(head(mulsetDemo))
resamples <- mulset(mulsetDemo, exclude = c("outcome", "age", "gender"))