Help for package parseRPDR

Type:

Package

Title:

Parse and Manipulate Research Patient Data Registry ('RPDR') Text Queries

Version:

1.1.2

Date:

2025-01-19

Maintainer:

Marton Kolossvary <mkolossvary@mgh.harvard.edu>

Description:

Functions to load Research Patient Data Registry ('RPDR') text queries from Partners Healthcare institutions into R. The package also provides helper functions to manipulate data and execute common procedures such as finding the closest radiological exams considering a given timepoint, or creating a DICOM header database from the downloaded images. All functionalities are parallelized for fast and efficient analyses.

License:

AGPL (≥ 3)

Depends:

R (≥ 4.0)

Imports:

data.table (≥ 1.14.1), stringr (≥ 1.4.0), readr (≥ 1.4.0), parallelly (≥ 1.36.0), foreach (≥ 1.5.1), future (≥ 1.33.1), doFuture (≥ 1.0.1), progressr (≥ 0.14.0)

RoxygenNote:

7.3.2

NeedsCompilation:

Suggests:

testthat (≥ 3.0.0), reticulate (≥ 1.20), knitr, rmarkdown, covr

Encoding:

UTF-8

URL:

https://github.com/martonkolossvary/parseRPDR

BugReports:

https://github.com/martonkolossvary/parseRPDR/issues

Config/testthat/edition:

Config/testthat/parallel:

false

Config/testthat/start-first:

load_all_data, create_img_db, find_exam, load_*, convert_*

Packaged:

2025-01-19 17:55:35 UTC; mjk2

Author:

Marton Kolossvary [aut, cre]

Repository:

CRAN

Date/Publication:

2025-01-19 18:10:02 UTC

Legacy function to create a vector of all possible IDs for mi2b2 workbench

Description

Legacy function to gather all possible MGH and BWH IDs from mrn.txt and con.txt input sources to provide a vector of all possible MGH or BWH IDs to be used as a data request for mi2b2 workbench.

Usage

all_ids_mi2b2(type = "MGH", d_mrn, d_con)

Arguments

type

string, either "MGH" or "BWH" specifying which IDs to use.

d_mrn

data.table, parsed mrn dataset using the load_mrn function.

d_con

data.table, parsed con dataset using the load_con function.

Value

vector, with all MGH or BWH IDs that occur in the con and mrn data sources for all patients. Previously, this was required for mi2b2 workbenches to allow access to all possible images of the patients, even if the MGH or BWH ID changed over time.

Examples

## Not run: 
all_MGH_mrn <- all_ids_mi2b2(type = "MGH", d_mrn = data_mrn, d_con = data_con)

## End(Not run)

Searches diagnosis columns for given diseases.

Description

Analyzes diagnosis data loaded using load_dia. Searches diagnosis columns for a specified set of diseases. By default, the data.table is returned with new boolean columns indicating whether the given groups of diagnoses are present among the diagnoses. If collapse is given, then the information is aggregated based on the collapse column and the earliest or latest time of the given diagnosis is provided.

Usage

convert_dia(
  d,
  code = "dia_code",
  code_type = "dia_code_type",
  codes_to_find = NULL,
  collapse = NULL,
  code_time = "time_dia",
  aggr_type = "earliest",
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing diagnosis information data loaded using the load_dia function.

code

string, column name of the diagnosis code column. Defaults to dia_code.

code_type

string, column name of the code_type column. Defaults to dia_code_type.

codes_to_find

list, a list of string arrays corresponding to sets of code types and codes separated by :, e.g. "ICD9:250.00". The function searches for the given disease code type and code pairs and adds new boolean columns with the name of each list element. These columns indicate whether any of the disease code type and code pairs occur in the set of codes.

collapse

string, a column name on which to collapse the data.table. Used in case we wish to assess whether given disease codes are present within all the same instances of collapse. See vignette for details.

code_time

string, column name of the time column. Defaults to time_dia. Used in case collapse is present to provide the earliest or latest instance of diagnosing the given disease.

aggr_type

string, if multiple diagnoses are present within the same case of collapse, which timepoint to return. Supported are: "earliest" or "latest". Defaults to earliest.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with indicator columns showing whether any of the given diagnoses are reported. If collapse is present, then only the unique ID and the summary columns are returned.

Examples

## Not run: 
#Search for Hypertension and Stroke ICD codes
diseases <- list(HT = c("ICD10:I10"), Stroke = c("ICD9:434.91", "ICD10:I63.50"))
data_dia_parse <- convert_dia(d = data_dia, codes_to_find = diseases, nThread = 2)

#Search for Hypertension and Stroke ICD codes and summarize per patient providing earliest time
diseases <- list(HT = c("ICD10:I10"), Stroke = c("ICD9:434.91", "ICD10:I63.50"))
data_dia_disease <-  convert_dia(d = data_dia, codes_to_find = diseases, nThread = 2,
collapse = "ID_MERGE", aggr_type = "earliest")

## End(Not run)
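
The matching and collapsing described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy table toy_dia and the helper flag_codes are hypothetical, and only the column names (ID_MERGE, dia_code, dia_code_type, time_dia) follow the defaults documented above.

```r
# Toy diagnosis table; layout is assumed to resemble load_dia output
toy_dia <- data.frame(
  ID_MERGE      = c("1", "1", "2", "2"),
  dia_code_type = c("ICD10", "ICD9", "ICD10", "ICD10"),
  dia_code      = c("I10", "434.91", "I10", "I10"),
  time_dia      = as.POSIXct(c("2015-03-01", "2016-07-15",
                               "2014-01-10", "2013-05-20"), tz = "UTC"),
  stringsAsFactors = FALSE
)

diseases <- list(HT = c("ICD10:I10"), Stroke = c("ICD9:434.91", "ICD10:I63.50"))

# Hypothetical helper: TRUE where the "type:code" pair of a row is requested
flag_codes <- function(d, codes) {
  paste(d$dia_code_type, d$dia_code, sep = ":") %in% codes
}

# One boolean indicator column per list element
for (nm in names(diseases)) {
  toy_dia[[nm]] <- flag_codes(toy_dia, diseases[[nm]])
}

# Collapse per patient: keep the earliest time at which HT was recorded
ht_rows  <- toy_dia[toy_dia$HT, c("ID_MERGE", "time_dia")]
ht_rows  <- ht_rows[order(ht_rows$ID_MERGE, ht_rows$time_dia), ]
ht_first <- ht_rows[!duplicated(ht_rows$ID_MERGE), ]
```

Sorting by time in decreasing order before removing duplicates would give the "latest" variant instead.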

Searches columns for given diseases defined by ICD codes.

Description

Analyzes encounter data loaded using load_enc. Converts columns with ICD codes and text to simple ICD codes. If requested, the data.table is returned with new boolean columns indicating whether the given groups of diagnoses are present in the given columns. If collapse is given, then the information is aggregated based on the collapse column and the earliest or latest time of the given diagnosis is provided.

Usage

convert_enc(
  d,
  code = c("enc_diag_admit", "enc_diag_princ", paste0("enc_diag_", 1:10)),
  keep = FALSE,
  codes_to_find = NULL,
  collapse = NULL,
  code_time = "time_enc_admit",
  aggr_type = "earliest",
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing encounter information data loaded using the load_enc function.

code

string vector, an array of column names to convert to simple ICD codes. The new column names will be the old one with ICD_ added to the beginning of it.

keep

boolean, whether to keep original columns that were converted. Defaults to FALSE.

codes_to_find

list, a list of arrays corresponding to sets of ICD codes. The function searches the columns in code and new boolean columns with the name of each list element will be created. These columns are indicators whether the given disease is present in the set of ICD codes or not.

collapse

string, a column name on which to collapse the data.table. Used in case we wish to assess whether given diagnoses are present within all the same instances of collapse. See vignette for details.

code_time

string, column name of the time column. Defaults to time_enc_admit. Used in case collapse is present to provide the earliest or latest instance of diagnosing the given disease.

aggr_type

string, if multiple diagnoses are present within the same case of collapse, which timepoint to return. Supported are: "earliest" or "latest". Defaults to earliest.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with formatted ICD code columns and possibly indicator columns if provided. If collapse is present, then only unique ID and the summary columns are returned.

Examples

## Not run: 
#Parse encounter ICD columns and keep original ones as well
data_enc_parse <- convert_enc(d = data_enc, keep = TRUE, nThread = 2)

#Parse encounter ICD columns and discard original ones,
#and create indicator variable for the following diseases
diseases <- list(HT = c("I10"), Stroke = c("434.91", "I63.50"))
data_enc_disease <-  convert_enc(d = data_enc, keep = FALSE,
codes_to_find = diseases, nThread = 2)

#Parse encounter ICD columns and discard original ones
#and create indicator variables for the following diseases and summarize per patient,
#whether there are any encounters where the given diseases were registered
diseases <- list(HT = c("I10"), Stroke = c("434.91", "I63.50"))
data_enc_disease <-  convert_enc(d = data_enc, keep = FALSE,
codes_to_find = diseases, nThread = 2, collapse = "ID_MERGE")

## End(Not run)

Converts lab results to normal/abnormal based-on reference values.

Description

Analyzes laboratory data loaded using load_lab. Converts laboratory results to values without ">" or "<" by creating a column where these characters are removed. Furthermore, adds two indicator columns where, based on the reference ranges or the Abnormal_Flag column in RPDR (lab_result_abn using load_lab), the value is considered normal or abnormal.

Usage

convert_lab(
  d,
  code_results = "lab_result",
  code_reference = "lab_result_range",
  code_flag = "lab_result_abn"
)

Arguments

d

data.table, database containing laboratory results data loaded using the load_lab function.

code_results

string vector, column name containing the results. Defaults to: "lab_result".

code_reference

string vector, column name containing the reference ranges. Defaults to: "lab_result_range".

code_flag

string vector, column name containing the abnormal flags. Defaults to: "lab_result_abn".

Value

data.table, with three additional columns: "lab_result_pretty" containing numerical results. In case of ">" or "<" notation, the numeric value is returned, as we only have information that it is at least as much or not larger than a given value. The other column: "lab_result_abn_pretty" can take values: NORMAL/ABNORMAL, depending on whether the value is within the reference range. Please be aware that there can be very different representations of values, and in some cases this will result in misclassification of values. The third column: "lab_result_abn_flag_pretty" gives abnormal if the original Abnormal_Flag column contains any information. Borderline values are considered NORMAL.

Examples

## Not run: 
#Convert loaded lab results
data_lab_pretty <- convert_lab(d = data_lab)
data_lab_pretty[, c("lab_result", "lab_result_pretty", "lab_result_range",
"lab_result_abn_pretty", "lab_result_abn_flag_pretty")]

## End(Not run)
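
The conversion described above can be sketched in base R as follows. This is only an illustration, not the package's code: the example values, the "low-high" reference range format, and the choice to count boundary values as NORMAL are assumptions of this sketch.

```r
# Hypothetical raw results and reference ranges ("low-high" format is assumed)
lab_result       <- c("4.2", ">10.0", "<0.5", "7.1")
lab_result_range <- c("3.5-5.0", "3.5-5.0", "0.5-1.2", "3.5-5.0")

# Remove ">" and "<", then convert to numeric
result_num <- as.numeric(gsub("[<>]", "", lab_result))

# Split each range into its lower and upper bound
bounds <- strsplit(lab_result_range, "-", fixed = TRUE)
lower  <- as.numeric(vapply(bounds, `[`, character(1), 1))
upper  <- as.numeric(vapply(bounds, `[`, character(1), 2))

# Values on the boundary count as NORMAL in this sketch
status <- ifelse(result_num >= lower & result_num <= upper, "NORMAL", "ABNORMAL")
```

Real RPDR ranges come in many more representations (for example one-sided or text ranges), which is why the documentation above warns about possible misclassification.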

Adds boolean columns indicating whether a given group of medications is present in each row.

Description

Analyzes medication data loaded using load_med. By default, the data.table is returned with new boolean columns indicating whether the given groups of medications are present. If collapse is given, then the information is aggregated based on the collapse column and the earliest or latest time of the given medication is provided.

Usage

convert_med(
  d,
  code = "med",
  codes_to_find = NULL,
  collapse = NULL,
  code_time = "time_med",
  aggr_type = "earliest",
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing medication data loaded using the load_med function.

code

string, column name of the medication column. Defaults to med.

codes_to_find

list, a list of arrays corresponding to sets of medication names. New boolean columns with the name of each list element will be created. These columns are indicators whether the given medication is present in the set of medication names or not.

collapse

string, a column name on which to collapse the data.table. Used in case we wish to assess whether given medications are present within all the same instances of collapse. See vignette for details.

code_time

string, column name of the time column. Defaults to time_med. Used in case collapse is present to provide the earliest or latest instance of the given medication.

aggr_type

string, if multiple occurrences of the medications are present within the same case of collapse, which timepoint to return. Supported are: "earliest" or "latest". Defaults to earliest.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with indicator columns whether given group of codes_to_find is present or not. If collapse is present, then only unique ID and the summary columns are returned.

Examples

## Not run: 
#Define medication group and add an indicator column whether
#the given medication group was administered
meds <- list(statin = c("Simvastatin", "Atorvastatin"),
             NSAID  = c("Acetaminophen", "Paracetamol"))

data_med_indic <- convert_med(d = data_med, codes_to_find = meds, nThread = 1)

#Summarize per patient if they ever had the given medication groups registered
data_med_indic_any <- convert_med(d = data_med,
codes_to_find = meds, collapse = "ID_MERGE", nThread = 2)

## End(Not run)

Extracts information from notes free text.

Description

Analyzes notes loaded using load_notes or load_lno. Extracts information from the free text present in abc_rep_txt, where abc stands for the three letter abbreviation of the given type of note. An array of strings is provided using the anchors argument. The function returns as many columns as there are anchor points. Each column contains the text between the given anchor point and the next anchor point. This way the free text report is split into corresponding smaller texts. By default, these are the common standard elements of given note types. Potential anchor points for the given types of notes are provided here:

Cardiology: c("Report Number:", "Report Status:", "Type:", "Date:", "Ordering Provider:", "SYSTOLIC BLOOD PRESSURE", "DIASTOLIC BLOOD PRESSURE", "VENTRICULAR RATE EKG/MIN", "ATRIAL RATE", "PR INTERVAL", "QRS DURATION", "QT INTERVAL", "QTC INTERVAL", "P AXIS", "R AXIS", "T WAVE AXIS", "LOC", "DX:", "REF:", "Electronically Signed", "report_end")
Discharge: c("***This text report", "Patient Information", "Physician Discharge Summary", "Surgeries this Admission", "Items for Post-Hospitalization Follow-Up:", "Pending Results", "Hospital Course", "ED Course:", "Diagnosis", "Prescriptions prior to admission", "Family History:", "Physical Exam on Admission:", "Discharge Exam", "report_end")
Endoscopy: c("NAME:", "DATE:", "Patient Information", "report_end")
History & Physical: c("***This text report", "Patient Information", "H&P by", "Author:", "Service:", "Author Type:", "Filed:", "Note Time:", "Status:", "Editor:", "report_end")
Operative: c("NAME:", "UNIT NO:", "DATE:", "SURGEON:", "ASST:", "PREOPERATIVE DIAGNOSIS:", "POSTOPERATIVE DIAGNOSIS:", "NAME OF OPERATION:", "ANESTHESIA:", "INDICATIONS", "OPERATIVE FINDINGS:", "DESCRIPTION OF PROCEDURE:", "Electronically Signed", "report_end")
Pathology: c("Accession Number:", "Report Status:", "Type:", "Report:", "CASE:", "PATIENT:", "Date", "Source Care Unit:", "Path Subspecialty Service:", "Results To:", "Signed Out by:", "CLINICAL DATA:", "FINAL DIAGNOSIS:", "GROSS DESCRIPTION:", "report_end")
Progress: c("***This text report", "Patient Information", "History", "Overview", "Progress Notes", "Medications", "Relevant Orders", "Level of Service", "report_end")
Pulmonary: c("The Pulmonary document", "Name:", "Unit #:", "Date:", "Location:", "Smoking Status:", "Pack Years:", "SPIROMETRY:", "LUNG VOLUMES:", "DIFFUSION:", "PLETHYSMOGRAPHY:", "Pulmonary Function Test Interpretation", "Spirometry", "report_end")
Radiology: c("Exam Code", "Ordering Provider", "HISTORY", "Associated Reports", "Report Below", "REASON", "REPORT", "TECHNIQUE", "COMPARISON", "FINDINGS", "IMPRESSION", "RECOMMENDATION", "SIGNATURES", "report_end")
Visit: c("***This text report", "Reason for Visit", "Reason for Visit", "Vital Signs", "Chief Complaint", "History", "Overview", "Medications", "Relevant Orders", "Level of Service", "report_end")
LMR: c("Subject", "Patient Name:", "Reason for visit", "report_end")

However, these may be modified and extended to include sections of interest. For example, if a given score is reported in a standard fashion, then adding its phrase (e.g. "CAD-RADS") would create a column containing the text that follows this phrase. The resulting columns can then be cleaned up if needed. Always include "report_end" in the anchors array to provide the function with the last occurring statement in the report.

Usage

convert_notes(
  d,
  code = NULL,
  anchors = NULL,
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing notes loaded using the load_notes function.

code

string vector, column name containing the results, which should be "abc_rep_txt", where abc stands for the three letter abbreviation of the given type of note.

anchors

string array, elements to search for in the text report.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with new columns corresponding to elements in anchors.

Examples

## Not run: 
#Create columns with specific parts of the radiological report defined by anchors
data_rad_parsed <- convert_notes(d = data_rad, code = "rad_rep_txt",
anchors = c("Exam Code", "Ordering Provider", "HISTORY", "Associated Reports",
"Report Below", "REASON", "REPORT", "TECHNIQUE", "COMPARISON", "FINDINGS",
"IMPRESSION", "RECOMMENDATION", "SIGNATURES", "report_end"), nThread = 2)

## End(Not run)
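
The anchor-based splitting described above can be sketched in base R as follows. The helper split_by_anchors and the toy report are hypothetical and only illustrate the idea; the package's own implementation may handle matching, repeated anchors and whitespace differently.

```r
# Hypothetical helper: return the text that follows each anchor, up to the next anchor
split_by_anchors <- function(txt, anchors) {
  # Position of the first occurrence of each anchor (-1 if absent)
  pos <- vapply(anchors,
                function(a) as.integer(regexpr(a, txt, fixed = TRUE)),
                integer(1))
  out   <- setNames(rep(NA_character_, length(anchors)), anchors)
  found <- sort(pos[pos > 0])  # anchors present in the text, in order of appearance
  for (i in seq_along(found)) {
    start <- found[i] + nchar(names(found)[i])
    end   <- if (i < length(found)) found[i + 1] - 1 else nchar(txt)
    out[names(found)[i]] <- trimws(substr(txt, start, end))
  }
  out
}

# Toy report text; real reports are longer and less regular
report <- "HISTORY chest pain FINDINGS no stenosis IMPRESSION normal study report_end"
parts  <- split_by_anchors(report, c("HISTORY", "FINDINGS", "IMPRESSION", "report_end"))
```

In this sketch, anchors that do not occur in the text yield NA, and the final "report_end" anchor yields an empty string because nothing follows it.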

Searches health history data for given codes

Description

Analyzes health history data loaded using load_phy. Searches health history columns for a specified set of codes. By default, the data.table is returned with new boolean columns indicating whether the given groups of health history data are present within the respective columns. If collapse is given, then the information is aggregated based on the collapse column and the earliest or latest time of the given health history entry is provided.

Usage

convert_phy(
  d,
  code = "phy_code",
  code_type = "phy_code_type",
  codes_to_find = NULL,
  collapse = NULL,
  code_time = "time_phy",
  aggr_type = "earliest",
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing health history information data loaded using the load_phy function.

code

string, column name of the health history code column. Defaults to phy_code.

code_type

string, column name of the code_type column. Defaults to phy_code_type.

codes_to_find

list, a list of string arrays corresponding to sets of code types and codes separated by :, e.g. "LMR:3688". The function searches for the given health history code type and code pairs and adds new boolean columns with the name of each list element. These columns indicate whether any of the health history code type and code pairs occur in the set of codes.

collapse

string, a column name on which to collapse the data.table. Used in case we wish to assess whether multiple health history codes are present within all the same instances of collapse. See vignette for details.

code_time

string, column name of the time column. Defaults to time_phy. Used in case collapse is present to provide the earliest or latest instance of health history information.

aggr_type

string, if multiple health histories are present within the same case of collapse, which timepoint to return. Supported are: "earliest" or "latest". Defaults to earliest.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with indicator columns showing whether any of the given health histories are reported. If collapse is present, then only the unique ID and the summary columns are returned.

Examples

## Not run: 
#Search for Height and Weight codes
anthropometrics <- list(Weight = c("LMR:3688", "EPIC:WGT"), Height = c("LMR:3771", "EPIC:HGT"))
data_phy_parse <- convert_phy(d = data_phy, codes_to_find = anthropometrics, nThread = 2)

#Search for Height and Weight codes and summarize per patient providing earliest time
anthropometrics <- list(Weight = c("LMR:3688", "EPIC:WGT"), Height = c("LMR:3771", "EPIC:HGT"))
data_phy_parse <- convert_phy(d = data_phy, codes_to_find = anthropometrics, nThread = 2,
collapse = "ID_MERGE", aggr_type = "earliest")

## End(Not run)

Searches procedures columns for given procedures.

Description

Analyzes procedure data loaded using load_prc. Searches procedure columns for a specified set of procedures. By default, the data.table is returned with new boolean columns indicating whether the given groups of procedures are present in the given procedure. If collapse is given, then the information is aggregated based on the collapse column and the earliest or latest time of the given procedure is provided.

Usage

convert_prc(
  d,
  code = "prc_code",
  code_type = "prc_code_type",
  codes_to_find = NULL,
  collapse = NULL,
  code_time = "time_prc",
  aggr_type = "earliest",
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing procedure information data loaded using the load_prc function.

code

string, column name of the procedure code column. Defaults to prc_code.

code_type

string, column name of the code_type column. Defaults to prc_code_type.

codes_to_find

list, a list of string arrays corresponding to sets of code types and codes separated by :, e.g. "CPT:00104". The function searches for the given procedure code type and code pairs and adds new boolean columns with the name of each list element. These columns indicate whether any of the procedure code type and code pairs occur in the set of codes.

collapse

string, a column name on which to collapse the data.table. Used in case we wish to assess whether multiple procedure codes are present within all the same instances of collapse. See vignette for details.

code_time

string, column name of the time column. Defaults to time_prc. Used in case collapse is present to provide the earliest or latest instance of the given procedure.

aggr_type

string, if multiple procedures are present within the same case of collapse, which timepoint to return. Supported are: "earliest" or "latest". Defaults to earliest.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with indicator columns showing whether any of the given procedures are reported. If collapse is present, then only the unique ID and the summary columns are returned.

Examples

## Not run: 
#Search for Anesthesia CPT codes
procedures <- list(Anesthesia = c("CPT:00410", "CPT:00104"))
data_prc_parse <- convert_prc(d = data_prc, codes_to_find = procedures, nThread = 2)

#Search for Anesthesia CPT codes and summarize per patient providing earliest time
procedures <- list(Anesthesia = c("CPT:00410", "CPT:00104"))
data_prc_procedures <- convert_prc(d = data_prc, codes_to_find = procedures,
nThread = 2, collapse = "ID_MERGE", aggr_type = "earliest")

## End(Not run)

Searches columns for given reason for visit defined by ERFV codes.

Description

Analyzes reason for visit data loaded using load_rfv. If requested, the data.table is returned with new boolean columns indicating whether the given groups of ERFV codes are present in the given columns. If collapse is given, then the information is aggregated based on the collapse column and the earliest or latest time of the given reason for visit is provided.

Usage

convert_rfv(
  d,
  code = "rfv_concept_id",
  codes_to_find = NULL,
  collapse = NULL,
  code_time = "time_rfv_start",
  aggr_type = "earliest",
  nThread = parallel::detectCores() - 1
)

Arguments

d

data.table, database containing reason for visit information data loaded using the load_rfv function.

code

string vector, an array of column names to search.

codes_to_find

list, a list of arrays corresponding to sets of ERFV codes. The function searches the columns in code, and new boolean columns with the name of each list element will be created. These columns indicate whether the given reason for visit is present in the set of ERFV codes or not.

collapse

string, a column name on which to collapse the data.table. Used in case we wish to assess whether multiple ERFV are present within the same instances of collapse. See vignette for details.

code_time

string, column name of the time column. Defaults to time_rfv_start. Used in case collapse is present to provide the earliest or latest instance of reason for visit.

aggr_type

string, if multiple reasons for visit are present within the same case of collapse, which timepoint to return. Supported are: "earliest" or "latest". Defaults to earliest.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data.table, with indicator columns if provided. If collapse is present, then only unique ID and the summary columns are returned.

Examples

## Not run: 
#Parse reason for visit columns
#and create indicator variables for the following reasons and summarize per patient,
#whether there are any encounters where the given reasons were registered
reasons <- list(Pain = c("ERFV:160357", "ERFV:140012"), Visit = c("ERFV:501"))
data_rfv_disease <- convert_rfv(d = data_rfv,
codes_to_find = reasons, nThread = 2, collapse = "ID_MERGE")

## End(Not run)

Create a database of DICOM headers.

Description

The function creates a database of DICOM headers present in a folder structure. Each series should be in its own folder, but the folders can be nested. Files that have folders next to them at the same level will not be parsed; that is, the folder structure needs to comply with the DICOM standard. Be aware that the function requires python and pydicom to be installed! The function recursively cycles through all folders and subfolders in the provided path and extracts the DICOM header information from the files using the dcmread function of the pydicom package. The extensions of the files can be provided with the ext argument, as DICOM files may have extensions other than .dcm. Using the all boolean argument, you can specify whether the function provides output for each file or only for the first file of a series, which is beneficial when analyzing multi-slice series, as all instances share almost all of their header information. Furthermore, using the keywords argument, you can manually specify which DICOM keywords you wish to extract. These need to be valid keywords specified in the DICOM standard.

Usage

create_img_db(
  path,
  ext = c(".dcm", ".dicom", ".ima", ".tmp", ""),
  all = TRUE,
  keywords = c("StudyDate", "StudyTime", "SeriesDate", "SeriesTime", "AcquisitionDate",
    "AcquisitionTime", "ConversionType", "Manufacturer", "InstitutionName",
    "InstitutionalDepartmentName", "ReferringPhysicianName", "Modality",
    "ManufacturerModelName", "StudyDescription", "SeriesDescription", "StudyComments",
    "ProtocolName", "RequestedProcedureID", "ViewPosition", "StudyInstanceUID",
    "SeriesInstanceUID", "SOPInstanceUID", "AccessionNumber", "PatientName", "PatientID",
    "IssuerOfPatientID", "PatientBirthDate", "PatientSex", "PatientAge",
    "PatientSize", "PatientWeight", "StudyID", "SeriesNumber", "AcquisitionNumber",
    "InstanceNumber", "BodyPartExamined", "SliceThickness", "SpacingBetweenSlices",
    "PixelSpacing", "PixelAspectRatio", "Rows", "Columns", "FieldOfViewDimensions",
    "RescaleIntercept", "RescaleSlope", "WindowCenter", "WindowWidth", "BitsAllocated",
    "BitsStored", "PhotometricInterpretation", "KVP", "ExposureTime", "XRayTubeCurrent",
    "ExposureInuAs", "ImageAndFluoroscopyAreaDoseProduct", "FilterType",
    "ConvolutionKernel", "CTDIvol", "ReconstructionFieldOfView"),
  nThread = parallel::detectCores() - 1,
  na = TRUE,
  identical = TRUE
)

Arguments

path

string vector, full folder path to folder that contains the images.

ext

string array, possible file extensions to parse. It is advised to add . before the extensions as the given character patterns may be present elsewhere in the file names. Furthermore, if DICOM files without an extension should also be parsed, then add "" to the extensions as then the script will try to read all files without an extension. Also, the file names and the extensions are converted to lower case before matching to avoid mismatches due to capitals.

all

boolean, whether all files in a series should be parsed, or only the first one.

keywords

string array, of valid DICOM keywords.

nThread

integer, number of threads to use for parsing data.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

Value

data.table, with DICOM header information returned unchanged. In addition, the function provides new columns that help further data manipulation:

time_study: POSIXct, StudyDate and StudyTime concatenated together to POSIXct.
time_series: POSIXct, SeriesDate and SeriesTime concatenated together to POSIXct.
time_acquisition: POSIXct, AcquisitionDate and AcquisitionTime concatenated together to POSIXct.
name_img: string, PatientName with special characters removed.
time_date_of_birth_img: POSIXct, PatientBirthDate as POSIXct.
img_pixel_spacing: numeric, PixelSpacing value of the first element in the array returned as numerical value.

Examples

## Not run: 
#Create a database with DICOM header information
all_dicom_headers <- create_img_db(path = "/Users/Test/Data/DICOM/")
all_dicom_headers <- create_img_db(path = "/Users/Test/Data/DICOM/", ext = c(".dcm", ".DICOM"))
#Create a database with DICOM header information for only IDs and accession numbers
all_dicom_headers <- create_img_db(path = "/Users/Test/Data/DICOM/",
keywords = c("PatientID", "AccessionNumber"))

## End(Not run)
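
As an illustration of how a derived column such as time_study can be built, the base R sketch below joins a date and a time string into POSIXct. The example values and the UTC time zone are assumptions of this sketch; it relies only on the DICOM DA (YYYYMMDD) and TM (HHMMSS.FFFFFF) string formats and is not the package's own code.

```r
# Hypothetical header values in DICOM DA (YYYYMMDD) and TM (HHMMSS.FFFFFF) formats
StudyDate <- c("20200115", "20211231")
StudyTime <- c("083015.250000", "235959")

# Drop fractional seconds, then parse date and time together
# (the UTC time zone is an assumption; DICOM headers usually carry local time)
time_study <- as.POSIXct(
  paste0(StudyDate, sub("\\..*$", "", StudyTime)),
  format = "%Y%m%d%H%M%S",
  tz = "UTC"
)
```

The same pattern would apply to the series and acquisition date and time pairs.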

Internal function to create a database of DICOM headers.

Description

Usage

dcm_db(path, ext, all, keywords, nThread, pydicom)

Arguments

path

string vector, full folder path to folder that contains the images.

ext

string array, possible file extensions to parse.

all

boolean, whether all files in a series should be parsed, or only the first one.

keywords

string array, of valid DICOM keywords.

nThread

integer, number of threads to use for parsing data.

pydicom

package, pydicom package initiated from parent environment.

Value

data.table, with DICOM header information. This is then used by create_img_db which formats the output.

Exports free text notes to individual text files.

Description

Exports the contents of a given cell per row into individual text files. Can be used to export reports into individual text files for further analyses.

Usage

export_notes(d, folder, code, name1 = "ID_MERGE", name2)

Arguments

d

data.table, database containing notes loaded using the load_notes function. Theoretically any other data.table can be given and the contents of the specified cell will be exported into the corresponding files. In case of notes, it is advised to load them with format_orig = TRUE, as then the output will retain the original format of the report making it easier to read.

folder

string, full folder path to folder where the files should be exported. If folder does not exist, the function stops.

code

string vector, column name containing the data that should be exported. Generally should be "abc_rep_txt", where abc stands for the three letter abbreviation of the given type of note.

name1

string, the first part of the file names. Defaults to ID_MERGE.

name2

string, the second part of the file names. name1 and name2 will be separated using "_". Generally should be "abc_rep_num", where abc stands for the three letter abbreviation of the given type of note.

Value

NULL, files are exported to given folder.

Examples

## Not run: 
#Output all cardiology notes to given folder
d <- load_notes("Car.txt", type = "car", nThread = 2, format_orig = TRUE)
export_notes(d, folder = "/Users/Test/Notes/", code = "car_rep_txt",
name1 = "ID_MERGE", name2 = "car_rep_num")

## End(Not run)
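
The export step described above can be approximated in base R. The following is a minimal sketch and not the package's implementation: the helper name export_cells_sketch, the toy data and the .txt file extension are assumptions made for illustration.

```r
# Hypothetical helper, not part of parseRPDR: writes one text file per row.
export_cells_sketch <- function(d, folder, code, name1 = "ID_MERGE", name2) {
  # Mirror the documented behaviour: stop if the folder does not exist
  if (!dir.exists(folder)) stop("Folder does not exist: ", folder)
  # File names are built as <name1>_<name2>.txt (the extension is an assumption)
  files <- file.path(folder, paste0(d[[name1]], "_", d[[name2]], ".txt"))
  for (i in seq_len(nrow(d))) {
    writeLines(as.character(d[[code]][i]), con = files[i])
  }
  invisible(NULL)
}

# Toy data standing in for the output of load_notes()
notes <- data.frame(
  ID_MERGE    = c("100", "101"),
  car_rep_num = c("A1", "B2"),
  car_rep_txt = c("First report text", "Second report text"),
  stringsAsFactors = FALSE
)

out_dir <- file.path(tempdir(), "notes_sketch")
dir.create(out_dir, showWarnings = FALSE)
export_cells_sketch(notes, folder = out_dir, code = "car_rep_txt",
                    name1 = "ID_MERGE", name2 = "car_rep_num")
exported <- sort(list.files(out_dir))
```

Each row produces one file, so exported lists one file name per note.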

Find exam data within a given timeframe using parallel CPU computing.

Description

Finds all, the earliest, or the closest examinations to given timepoints using parallel computing. A progress bar is also reported in the terminal to show the progress of the computation.

Usage

find_exam(
  d_from,
  d_to,
  d_from_ID = "ID_MERGE",
  d_to_ID = "ID_MERGE",
  d_from_time = "time_rad_exam",
  d_to_time = "time_enc_admit",
  time_diff_name = "timediff_exam_to_db",
  before = TRUE,
  after = TRUE,
  time = 1,
  time_unit = "days",
  multiple = "closest",
  add_column = NULL,
  keep_data = FALSE,
  nThread = parallel::detectCores() - 1,
  shared_RAM = FALSE
)

Arguments

d_from

data table, the database which is searched to find examinations within the timeframe.

d_to

data table, the database to which we wish to find examinations within the timeframe.

d_from_ID

string, column name of the patient ID column in d_from. Defaults to ID_MERGE.

d_to_ID

string, column name of the patient ID column in d_to. Defaults to ID_MERGE.

d_from_time

string, column name of the time variable column in d_from. Defaults to time_rad_exam.

d_to_time

string, column name of the time variable column in d_to. Defaults to time_enc_admit.

time_diff_name

string, column name of the new column created which holds the time difference between the exam and the time provided by d_to. Defaults to timediff_exam_to_db.

before

boolean, should times before the given time be considered. Defaults to TRUE.

after

boolean, should times after the given time be considered. Defaults to TRUE.

time

integer, the timeframe considered between the exam and the d_to timepoints. Defaults to 1.

time_unit

string, the unit of time used. Time variables in d_to and d_from are truncated to the supplied time unit. For example: "2005-09-18 08:15:01 PDT" would be truncated to "2005-09-18 PDT" if time_unit is set to days. The time difference is then calculated using difftime, passing the argument to units. The following time units are supported: "secs", "mins", "hours", "days", "months" and "years". Defaults to days.

multiple

string, which exams to give back. closest gives back the exam closest to the time provided by d_to. all gives back all occurrences within the timeframe. earliest gives back the earliest exam within the timeframe. In case of ties for closest or earliest, all are returned. Defaults to closest.

add_column

string, a column name in d_to to add to the output. Defaults to NULL.

keep_data

boolean, whether to include empty rows with only the d_from_ID column filled out for cases that have data in d_from, but not within the time range. Defaults to FALSE.

nThread

integer, number of threads to use for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

shared_RAM

boolean, deprecated from version 1.1.0 onwards and only kept for compatibility, as the bigmemory package has issues running on different operating systems. All computations are now run using the memory usage specifications of the parallelization strategy.

Value

data table, with d_from filtered to only those rows within the timeframe. The columns of d_from are returned with the corresponding time column in d_to, where the rows are instances which comply with the time constraints specified by the function. An additional column specified in time_diff_name is also returned, which shows the time difference between the time column in d_from and d_to for that given case. The time column from d_to specified by d_to_time is also returned under the name of time_to_db. An additional column specified in add_column may be added from d_to to the data table.

Examples

## Not run: 
#Filter encounters for first emergency visits at one of MGH's ED departments
data_enc_ED <- data_enc[enc_clinic == "MGH EMERGENCY 10020010608"]
data_enc_ED <- data_enc_ED[!duplicated(data_enc_ED$ID_MERGE)]

#Find all radiological examinations within 3 days of the ED registration
rdt_ED <- find_exam(d_from = data_rdt, d_to = data_enc_ED,
d_from_ID = "ID_MERGE", d_to_ID = "ID_MERGE",
d_from_time = "time_rdt_exam", d_to_time = "time_enc_admit", time_diff_name = "time_diff_ED_rdt",
before = TRUE, after = TRUE, time = 3, time_unit = "days", multiple = "all",
nThread = 2)

#Find earliest radiological examinations within 3 days of the ED registration
rdt_ED <- find_exam(d_from = data_rdt, d_to = data_enc_ED,
d_from_ID = "ID_MERGE", d_to_ID = "ID_MERGE",
d_from_time = "time_rdt_exam", d_to_time = "time_enc_admit", time_diff_name = "time_diff_ED_rdt",
before = TRUE, after = TRUE, time = 3, time_unit = "days", multiple = "earliest",
nThread = 2)

#Find closest radiological examinations on or after 1 day of the ED registration
#and add primary diagnosis column from encounters
rdt_ED <- find_exam(d_from = data_rdt, d_to = data_enc_ED,
d_from_ID = "ID_MERGE", d_to_ID = "ID_MERGE",
d_from_time = "time_rdt_exam", d_to_time = "time_enc_admit", time_diff_name = "time_diff_ED_rdt",
before = FALSE, after = TRUE, time = 1, time_unit = "days", multiple = "earliest",
add_column = "enc_diag_princ", nThread = 2)

#Find closest radiological examinations on or after 1 day of the ED registration
#but also provide empty rows for patients with exam data but not within the timeframe
rdt_ED <- find_exam(d_from = data_rdt, d_to = data_enc_ED,
d_from_ID = "ID_MERGE", d_to_ID = "ID_MERGE",
d_from_time = "time_rdt_exam", d_to_time = "time_enc_admit", time_diff_name = "time_diff_ED_rdt",
before = FALSE, after = TRUE, time = 1, time_unit = "days", multiple = "earliest",
add_column = "enc_diag_princ", keep_data = TRUE, nThread = 2)

## End(Not run)
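
The matching logic described in the arguments can be illustrated for a single patient in base R. This is a simplified sketch and not the package's implementation: the sign of the time difference (exam time minus d_to time), the inclusive window boundaries, and the variable names are assumptions.

```r
# One reference timepoint (as in d_to) and three exam times (as in d_from)
t_to   <- as.POSIXct("2005-09-18 08:15:01", tz = "UTC")
t_from <- as.POSIXct(c("2005-09-17 23:50:00", "2005-09-18 20:00:00",
                       "2005-09-25 10:00:00"), tz = "UTC")

# Truncate both sides to the time unit ("days"), then take the difference
# in the same unit using difftime
to_day    <- as.POSIXct(trunc(t_to, units = "days"))
from_day  <- as.POSIXct(trunc(t_from, units = "days"))
diff_days <- as.numeric(difftime(from_day, to_day, units = "days"))

# Keep exams within the window, honouring the before/after switches
# (inclusive boundaries are an assumption of this sketch)
time_window <- 1
before <- TRUE
after  <- TRUE
in_window <- abs(diff_days) <= time_window &
  ((before & diff_days <= 0) | (after & diff_days >= 0))

# multiple = "all", "earliest" or "closest"; ties would all be kept
idx_all      <- which(in_window)
idx_earliest <- idx_all[diff_days[idx_all] == min(diff_days[idx_all])]
idx_closest  <- idx_all[abs(diff_days[idx_all]) == min(abs(diff_days[idx_all]))]
```

In this toy case the first two exams fall within one day of the reference time, the first exam is the earliest, and the second is the closest because it falls on the same truncated day.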

Find exam data within a given timeframe using parallel CPU computing without shared RAM management.

Description

Finds all, the earliest, or the closest examinations to given timepoints using parallel computing. A progress bar is also reported in the terminal to show the progress of the computation.

Usage

find_exam_ram(
  d_from,
  d_to,
  d_from_ID = "ID_MERGE",
  d_to_ID = "ID_MERGE",
  d_from_time = "time_rad_exam",
  d_to_time = "time_enc_admit",
  time_diff_name = "timediff_exam_to_db",
  before = TRUE,
  after = TRUE,
  time = 1,
  time_unit = "days",
  multiple = "closest",
  add_column = NULL,
  keep_data = FALSE,
  nThread = parallel::detectCores() - 1
)

Loads allergy data information into R.

Description

Loads allergy information into the R environment.

Usage

load_all(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to All.txt.

merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with allergy information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_all_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from all datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_all_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from all datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_all_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_all: POSIXct, Date when the allergy was first noted, corresponds to Noted_Date in RPDR. Converted to POSIXct format.
all_all: string, Name of the allergen, corresponds to Allergen in RPDR.
all_all_code: string, Epic internal identifier for the specific allergen, corresponds to Allergen_Code in RPDR.
all_all_type: string, Hierarchy for the type of allergy noted. Denotes known level of specificity of allergen, corresponds to Allergen_Type in RPDR.
all_reac: string, Noted reactions to the allergen, corresponds to Reactions in RPDR.
all_reac_type: string, Category of reaction to the allergen, corresponds to Reaction_Type in RPDR.
all_severity: string, Degree of severity of noted reactions, corresponds to Severity in RPDR.
all_status: string, Last known status of allergen, either active or deleted from the patient's allergy record, corresponds to Status in RPDR.
all_system: string, The source system where the data was collected, corresponds to System in RPDR.
all_comment: string, Free-text information about the allergen, corresponds to Comments in RPDR.
all_del_reason: string, Free-text information about why the allergen was removed from the patient's allergy list, corresponds to Deleted_Reason in RPDR.

Examples

## Not run: 
#Using defaults
d_all <- load_all(file = "test_All.txt")

#Use sequential processing
d_all <- load_all(file = "test_All.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_all <- load_all(file = "test_All.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
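
The effect of id_length = "standard" together with sep can be pictured with a small base R sketch. The helper fix_length_sketch and the target length of 7 characters are assumptions made for illustration; the package itself formats IDs with pretty_mrn(), whose exact rules may differ.

```r
# Hypothetical helper, not part of parseRPDR
fix_length_sketch <- function(id, len = 7) {
  id <- as.character(id)
  n  <- nchar(id)
  # Remove numerals from the beginning of IDs that are too long
  too_long <- n > len
  id[too_long] <- substr(id[too_long], n[too_long] - len + 1, n[too_long])
  # Add zeros to the beginning of IDs that are too short
  pad <- strrep("0", pmax(len - nchar(id), 0))
  paste0(pad, id)
}

ids <- fix_length_sketch(c("12345", "1234567", "991234567"))

# Hospital ID and MRN joined by the divider given in sep (":" by default)
ids_prefixed <- paste("MGH", ids, sep = ":")
```

With id_length = "asis" no such correction would be applied and the IDs would keep their original lengths.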

Loads all RPDR text outputs into R.

Description

Loads all RPDR text outputs into R and returns a list of the processed data tables. If multiple text files of the same type are available (which happens if the query is larger than 25000 patients), then add a "_" and a number to the file names; files of the same datasource are then merged into a single output in the order of the provided numbers.

Usage

load_all_data(
  folder,
  which_data = c("mrn", "con", "dem", "all", "bib", "dia", "enc", "lab", "lno", "mcm",
    "med", "mic", "phy", "prc", "prv", "ptd", "rdt", "rfv", "trn", "car", "dis", "end",
    "hnp", "opn", "pat", "prg", "pul", "rad", "vis"),
  old_dem = FALSE,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  many_sources = TRUE,
  load_report = TRUE,
  format_orig = FALSE
)

Arguments

folder

string, full folder path to RPDR text files.

which_data

string vector, an array of abbreviations corresponding to the datasources to load.

old_dem

boolean, should the old load_dem function (load_dem_old) be used for loading demographic data. Defaults to FALSE, should be set to TRUE for Dem.txt datasets prior to 2022.

merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system. In case of mrn dataset, leave at EMPI, as it is automatically converted to: "Enterprise_Master_Patient_Index".

sep

string, divider between hospital ID and MRN. Defaults to :.

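id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.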

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use for parallelization.

many_sources

boolean, if TRUE, then parallelization is done on the level of the datasources. If FALSE, then parallelization is done within the datasources. If there are many datasources, then it is advised to set this TRUE, as then each different datasource will be processed in parallel. However, if there are only a few datasources selected to load, but many files per datasource (result of large queries), then it may be faster to parallelize within each datasource and therefore should be set to FALSE. If there are only a few sources each with one file then set to TRUE.

load_report

boolean, should the report text be returned for notes. Defaults to TRUE.

format_orig

boolean, should report be returned in its original formatting or should white spaces used for formatting be removed. Defaults to FALSE.

Value

list of parsed data tables containing the information.

Examples

## Not run: 
#Load all Con, Dem and Mrn datasets processing all files within given datasource in parallel
load_all_data(folder = folder_rpdr, which_data = c("con", "dem", "mrn"),
nThread = 2, many_sources = FALSE)

#Load all supported file types parallelizing on the level of datasources
load_all_data(folder = folder_rpdr, nThread = 2, many_sources = TRUE,
format_orig = TRUE)

## End(Not run)
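
How numbered files of the same datasource can be merged in order is sketched below in base R. The file names, the toy contents and the use of read.delim are assumptions for illustration only; the "|" delimiter follows the sep_load default of load_base.

```r
# Create three toy pipe-delimited files standing in for a split Dem query
dir_sketch <- file.path(tempdir(), "rpdr_sketch")
dir.create(dir_sketch, showWarnings = FALSE)
for (i in c(2, 10, 1)) {
  writeLines(c("EMPI|Age", paste0(i, "|", 40 + i)),
             file.path(dir_sketch, paste0("Dem_", i, ".txt")))
}

# Order files by the number after "_" (numerically, so 10 comes after 2)
files <- list.files(dir_sketch, pattern = "^Dem_[0-9]+\\.txt$")
num   <- as.integer(sub("^Dem_([0-9]+)\\.txt$", "\\1", files))
files <- files[order(num)]

# Read each file and stack them in the order of the provided numbers
parts <- lapply(file.path(dir_sketch, files), read.delim, sep = "|",
                colClasses = "character")
d_dem <- do.call(rbind, parts)
```

Sorting on the extracted number rather than on the file name avoids the lexicographic order in which "Dem_10.txt" would come before "Dem_2.txt".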

Helper function for loading RPDR data into R.

Description

Helper function to load different datasources from RPDR. Should not be used on its own.

Usage

load_base(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE,
  src = "mrn",
  fill = FALSE,
  sep_load = "|"
)

Arguments

file

string, full file path to given RPDR txt.

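merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system.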

sep

string, divider between hospital ID and MRN. Defaults to :.

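id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.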

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

src

string, what is the three letter source ID of the file, such as dem.

Value

data table, with minimally parsed data and the raw data.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_src_EMPI: string, EMPI IDs from src datasource, if the datasource is not mrn. Data is formatted using pretty_mrn().
ID_src_PMRN: string, PMRN IDs from src datasource, if the datasource is not mrn. Data is formatted using pretty_mrn().
ID_src_loc: string, from datasource src, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().

Loads BiobankFile data into R.

Description

Loads Biobank file data into the R environment.

Usage

load_bib(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Bib.txt.

merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

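id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.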

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept. Not used for loading mrn data.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with BiobankFile data.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_bib_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information, corresponds to Enterprise_Master_Patient_Index in RPDR. Data is formatted using pretty_mrn().
ID_bib_MGH: string, Unique Medical Record Number for Mass General Hospital, corresponds to MGH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_BWH: string, Unique Medical Record Number for Brigham and Women's Hospital, corresponds to BWH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_FH: string, Unique Medical Record Number for Faulkner Hospital, corresponds to FH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_SRH: string, Unique Medical Record Number for Spaulding Rehabilitation Hospital, corresponds to SRH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_NWH: string, Unique Medical Record Number for Newton-Wellesley Hospital, corresponds to NWH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_NSMC: string, Unique Medical Record Number for North Shore Medical Center, corresponds to NSMC_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_MCL: string, Unique Medical Record Number for McLean Hospital, corresponds to MCL_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_MEE: string, Unique Medical Record Number for Mass Eye and Ear, corresponds to MEE_MRN in RPDR. Data is formatted using pretty_mrn().
ID_bib_DFC: string, Unique Medical Record Number for Dana Farber Cancer center, corresponds to DFC_MRN in RPDR. Data is formatted using pretty_mrn(). Legacy data.
ID_bib_WDH: string, Unique Medical Record Number for Wentworth-Douglass Hospital, corresponds to WDH_MRN in RPDR. Data is formatted using pretty_mrn(). Legacy data.
bib_subject_ID: string, Biobank unique patient identifier, corresponds to Subject_ID in RPDR. ID is not formatted.
bib_subject_ID: string, This will always default to Biobank, corresponds to Registry Name in RPDR.

Examples

## Not run: 
#Using defaults
d_bib <- load_bib(file = "test_Bib.txt")

#Use sequential processing
d_bib <- load_bib(file = "test_Bib.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_bib <- load_bib(file = "test_Bib.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads contact information into R.

Description

Loads patient contact, insurance, and PCP information into the R environment.

Usage

load_con(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = TRUE
)

Arguments

file

string, full file path to Con.txt.

merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

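id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.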

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to TRUE only for Con.txt, as it is not advised to parse these for all data sources because it takes considerable time.

Value

data table, with contact information data.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_con_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from con datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_con_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from con datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_con_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
ID_con_loc_list: string, if prevalence of IDs in Patient_ID_List > perc, then they are included in the output. Data is formatted using pretty_mrn().
name_last: string, Patient's last name, corresponds to Last_Name in RPDR.
name_first: string, Patient's first name, corresponds to First_Name in RPDR.
name_middle: string, Patient's middle name or initial, corresponds to Middle_Name in RPDR.
name_previous: string, Any alternate names on record for this patient, corresponds to Previous_Name in RPDR.
SSN: string, Social Security Number, corresponds to SSN in RPDR.
VIP: character, Special patient statuses as defined by the EMPI group, corresponds to VIP in RPDR.
address1: string, Patient's current address, corresponds to address1 in RPDR.
address2: string, Additional address information, corresponds to address2 in RPDR.
city: string, City of residence, corresponds to City in RPDR.
state: string, State of residence, corresponds to State in RPDR.
country_con: string, Country of residence from con datasource, corresponds to Country in RPDR.
zip_con: numeric, Mailing zip code of primary residence from con datasource, corresponds to Zip in RPDR. Formatted to 5 character zip codes using pretty_numbers().
direct_contact_consent: boolean, Indicates whether the patient has given permission to contact them directly through the RODY program, corresponds to Direct_Contact_Consent in RPDR. Legacy variable.
research_invitations: boolean, Indicates if a patient can be invited to participate in research, corresponds to Research_Invitations in RPDR.
phone_home: number, Patient's home phone number, corresponds to Home_Phone in RPDR. Formatted to 10 digit phone numbers using pretty_numbers().
phone_day: number, Phone number where the patient can be reached during the day, corresponds to Day_Phone in RPDR. Formatted to 10 digit phone numbers using pretty_numbers().
insurance1: string, Patient's primary health insurance carrier and subscriber ID information, corresponds to Insurance_1 in RPDR.
insurance2: string, Patient's secondary health insurance carrier and subscriber ID information, if any, corresponds to Insurance_2 in RPDR.
insurance3: string, Patient's tertiary health insurance carrier and subscriber ID information, if any, corresponds to Insurance_3 in RPDR.
primary_care_physician: string, Comma-delimited list of all primary care providers on record for this patient per institution, along with contact information (if available), corresponds to Primary_Care_Physician in RPDR.
primary_care_physician_resident: string, Comma-delimited list of any Resident primary care providers on record for this patient per institution, along with contact information (if available), corresponds to Resident_Primary_Care_Physician in RPDR.

Examples

## Not run: 
#Using defaults
d_con <- load_con(file = "test_Con.txt")

#Use sequential processing
d_con <- load_con(file = "test_Con.txt", nThread = 1)

#Use parallel processing and parse data in
#MRN_Type and MRN columns (default in load_con) and keep all IDs
d_con <- load_con(file = "test_Con.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
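
The zip code and phone number formatting mentioned in the Value section can be pictured with the base R sketch below. The package performs this step with pretty_numbers(); the helpers zip_sketch and phone_sketch, their exact rules (dropping a ZIP+4 extension, restoring leading zeros, returning NA for phone numbers that do not have 10 digits) and the toy inputs are assumptions and may not match the package's behaviour.

```r
# Hypothetical helpers, not part of parseRPDR
digits_only <- function(x) gsub("[^0-9]", "", x)

zip_sketch <- function(x) {
  z <- substr(digits_only(x), 1, 5)        # drop a ZIP+4 extension if present
  paste0(strrep("0", 5 - nchar(z)), z)     # restore dropped leading zeros
}

phone_sketch <- function(x) {
  p <- digits_only(x)
  ifelse(nchar(p) == 10, p, NA_character_) # keep only 10 digit numbers
}

zips   <- zip_sketch(c("02114-1234", "2114"))
phones <- phone_sketch(c("(617) 555-0100", "555-0100"))
```

Leading zeros matter for Massachusetts zip codes, which is why the sketch keeps the values as character strings rather than converting them to numbers.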

Loads demographic information into R for new demographic tables following changes in the beginning of 2022.

Description

Loads patient demographic and vital status information into the R environment. Since version 0.2.2 of the software this function supports the new demographics table data definitions.

Usage

load_dem(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Dem.txt.

merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

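id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.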

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with demographic information data.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_dem_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from dem datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_dem_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from dem datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_dem_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
gender_legal_sex: string, Patient's legal sex, corresponds to Gender_Legal_Sex in RPDR.
sex_at_birth: string, Patient’s sex at time of birth, corresponds to Sex_at_Birth in RPDR.
gender_identity: string, Patient's personal conception of their gender, corresponds to Gender_Identity in RPDR.
time_date_of_birth: POSIXct, Patient's date of birth, corresponds to Date_of_Birth in RPDR. Converted to POSIXct format.
age: string, Patient's current age (or age at death), corresponds to Age in RPDR.
language: string, Patient's preferred spoken language, corresponds to Language in RPDR.
language_group: string, Patient's preferred language: English or Non-English, corresponds to Language_Group in RPDR.
race_1: string, Patient's primary race, corresponds to Race1 in RPDR.
race_2: string, Patient's primary race if more than one race, corresponds to Race2 in RPDR.
race_group: string, Patient's Race Group as determined by Race1 and Race2, corresponds to Race_Group in RPDR.
ethnic_group: string, Patient's Ethnicity: Hispanic or Non Hispanic, corresponds to Ethnic_Group in RPDR.
marital: string, Patient's current marital status, corresponds to Marital_Status in RPDR.
religion: string, Patient-identified religious preference, corresponds to Religion in RPDR.
veteran: string, Patient's current military veteran status, corresponds to Is_a_veteran in RPDR.
country_dem: string, Patient's current country of residence from dem datasource, corresponds to Country in RPDR.
zip_dem: string, Mailing zip code of patient's primary residence from dem datasource, corresponds to Zip_code in RPDR. Formatted to 5 character zip codes.
vital_status: string, Identifies if the patient is living or deceased. This data is updated monthly from the Partners registration system and the Social Security Death Master Index, corresponds to Vital_Status in RPDR. Punctuation marks are removed.
time_date_of_death: POSIXct, Recorded date of death from source in 'Vital_Status'. Date of death information obtained solely from the Social Security Death Index will not be reported until 3 years after death due to privacy concerns. If the value is independently documented by a Partners entity within the 3 year window, then the date will be displayed. Corresponds to Date_of_Death in RPDR. Converted to POSIXct format.

Examples

## Not run: 
#Using defaults
d_dem <- load_dem(file = "test_Dem.txt")

#Use sequential processing
d_dem <- load_dem(file = "test_Dem.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_dem <- load_dem(file = "test_Dem.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
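
Two of the conversions listed in the Value section, dates to POSIXct and punctuation removal in vital_status, can be illustrated in base R. The raw date format (month/day/year), the time zone and the example values are assumptions made for this sketch and are not taken from RPDR or from the package's source.

```r
# Hypothetical raw values as they might appear in a Dem.txt export
dob_raw   <- c("01/31/1950", "12/05/1972")
vital_raw <- c("Living", "Deceased - reported")

# Convert date strings to POSIXct (format and time zone are assumptions)
time_date_of_birth <- as.POSIXct(dob_raw, format = "%m/%d/%Y", tz = "UTC")

# Remove punctuation marks from the vital status text
vital_status <- gsub("[[:punct:]]", "", vital_raw)
```

Storing dates as POSIXct lets columns such as time_date_of_birth be passed directly to time based helpers like find_exam through the d_from_time and d_to_time arguments.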

Loads demographic information into R for demographics tables before 2022.

Description

Loads patient demographic and vital status information into the R environment. Since version 0.2.2 of the software, this function supports the old demographics table data definitions and is identical to the load_dem function of previous versions of the software.

Usage

load_dem_old(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Dem.txt.

merge_id

string, column name to use to create ID_MERGE column used to merge different datasets. Defaults to EMPI, as it is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

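id_length

string, indicating whether to modify MRN length based on required values (id_length = standard), or to keep lengths as is (id_length = asis). If id_length = standard, then for MGH, BWH, MCL, EMPI and PMRN the lengths of the MRNs are corrected by adding zeros to, or removing numerals from, the beginning. In other cases the lengths are unchanged. Defaults to standard.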

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with demographic information data.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_dem_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from dem datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_dem_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from dem datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_dem_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
gender: string, Patient's legal sex, corresponds to Gender in RPDR.
time_date_of_birth: POSIXct, Patient's date of birth, corresponds to Date_of_Birth in RPDR. Converted to POSIXct format.
age: string, Patient's current age (or age at death), corresponds to Age in RPDR.
language: string, Patient's preferred spoken language, corresponds to Language in RPDR.
race: string, Patient's primary race, corresponds to Race in RPDR.
marital: string, Patient's current marital status, corresponds to Marital_Status in RPDR.
religion: string, Patient-identified religious preference, corresponds to Religion in RPDR.
veteran: string, Patient's current military veteran status, corresponds to Is_a_veteran in RPDR.
country_dem: string, Patient's current country of residence from dem datasource, corresponds to Country in RPDR.
zip_dem: string, Mailing zip code of patient's primary residence from dem datasource, corresponds to Zip_code in RPDR. Formatted to 5 character zip codes.
vital_status: string, Identifies if the patient is living or deceased. This data is updated monthly from the Partners registration system and the Social Security Death Master Index, corresponds to Vital_Status in RPDR. Punctuation marks are removed.
time_date_of_death: POSIXct, Recorded date of death from source in 'Vital_Status'. Date of death information obtained solely from the Social Security Death Index will not be reported until 3 years after death due to privacy concerns. If the value is independently documented by a Partners entity within the 3 year window, then the date will be displayed. Corresponds to Date_of_Death in RPDR. Converted to POSIXct format.

Examples

## Not run: 
#Using defaults
d_dem <- load_dem_old(file = "test_Dem.txt")

#Use sequential processing
d_dem <- load_dem_old(file = "test_Dem.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_dem <- load_dem_old(file = "test_Dem.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
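
Several columns above are described as converted to POSIXct. The following is a minimal base R sketch of such a conversion. The month/day/year input layout and the UTC time zone are assumptions made for illustration; the format actually handled inside load_dem_old is not stated here.

```r
# Hypothetical raw values; the real RPDR date layout may differ.
raw_dob <- c("01/31/1950", "12/05/1987", "")

# Strings that cannot be parsed with the given format (here the empty one)
# become NA. Fixing tz keeps the result independent of the local time zone.
dob <- as.POSIXct(raw_dob, format = "%m/%d/%Y", tz = "UTC")

class(dob)
format(dob[1], "%Y-%m-%d")
```

Keeping dates as POSIXct rather than strings allows direct comparison and subtraction, for example with difftime().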

Loads diagnoses into R.

Description

Loads diagnoses information into the R environment, both Dia and Dea files.

Usage

load_dia(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Dia.txt or Dea.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

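id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").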

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with diagnoses information.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_dia_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from dia datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_dia_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from dia datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_dia_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_dia: POSIXct, Date when the diagnosis was noted, corresponds to Date in RPDR. Converted to POSIXct format.
dia_name: string, Name of the diagnosis, diagnosis-related group, or phenotype. For more information on available Phenotypes visit https://phenotypes.partners.org/phenotype_list.html, corresponds to Diagnosis_Name in RPDR.
dia_code: string, Diagnosis, diagnosis-related group, or phenotype code, corresponds to Code in RPDR.
dia_code_type: string, Standardized classification system or custom grouping associated with the diagnosis code, corresponds to Code_type in RPDR.
dia_flag: string, Qualifier for the diagnosis, if any, corresponds to Diagnosis_flag in RPDR.
dia_enc_num: string, Unique identifier of the record/visit. This value includes the source system, hospital, and a unique identifier within the source system, corresponds to Encounter_number in RPDR.
dia_provider: string, Provider of record for the encounter where the diagnosis was entered, corresponds to Provider in RPDR.
dia_clinic: string, Specific department/location where the patient encounter took place, corresponds to Clinic in RPDR.
dia_hosp: string, Facility where the encounter occurred, corresponds to Hospital in RPDR.
dia_inpatient: string, Identifies whether the diagnosis was noted during an inpatient or outpatient encounter, corresponds to Inpatient_Outpatient in RPDR. Punctuation marks removed.

Examples

## Not run: 
#Using defaults
d_dia <- load_dia(file = "test_Dia.txt")

#Use sequential processing
d_dia <- load_dia(file = "test_Dia.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_dea <- load_dia(file = "test_Dea.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
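
The perc argument keeps only those parsed ID columns that are present in a sufficient share of patients. The following base R sketch illustrates that kind of threshold. The column names, the values, and the use of >= for the comparison are assumptions for illustration, not the package's internal implementation.

```r
# Toy table of parsed location IDs, one row per patient (made-up values).
ids <- data.frame(
  ID_dia_MGH = c("0000001", "0000002", "0000003", NA, "0000005"),
  ID_dia_BWH = c("00000011", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

perc <- 0.6

# Share of rows with a non-missing value in each column.
present <- colMeans(!is.na(ids))

# Keep the columns whose coverage reaches the threshold.
kept <- ids[, present >= perc, drop = FALSE]
names(kept)
```

With perc = 1, as in the last example above, only columns filled for every patient would pass this kind of filter.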

Loads encounter information into R.

Description

Loads encounter-level detail information into the R environment, both Enc and Exc files.

Usage

load_enc(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Enc.txt or Exc.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

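id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").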

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with encounter information.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_enc_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from enc datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_enc_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from enc datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_enc_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
enc_numb: string, Unique identifier of the record/visit. This value includes the source system, hospital, and a unique identifier within the source system, corresponds to Encounter_number in RPDR.
time_enc_admit: POSIXct, Date when the patient was admitted or entered the facility, corresponds to Admit_Date in RPDR. Converted to POSIXct format.
time_enc_disch: POSIXct, Date when the patient was discharged or left the facility, corresponds to Discharge_Date in RPDR. Converted to POSIXct format.
enc_status: string, Billing account-related notes about the encounter. This will not be populated for all encounters, corresponds to Encounter_Status in RPDR.
enc_hosp: string, Facility where the encounter occurred, corresponds to Hospital in RPDR.
enc_inpatient: string, Classifies the type of encounter as either Inpatient or Outpatient. ED visits are currently classified under the 'Outpatient' label, corresponds to Inpatient_or_Outpatient in RPDR.
enc_service: string, Hospital service line assigned to the encounter, corresponds to Service_Line in RPDR.
enc_attending: string, The attending provider associated with the encounter. For Epic professional billing, this is the billing provider, corresponds to Attending_MD in RPDR.
enc_length: numeric, Length of stay for the encounter, corresponds to LOS_days in RPDR.
enc_clinic: string, Specific department/location where the encounter occurred, corresponds to Clinic_Name in RPDR.
enc_admit_src: string, Location where the patient was admitted when entering the hospital/clinic, corresponds to Admit_Source in RPDR.
enc_pat_type: string, Provides information regarding the specific patient classifications and status of the patient visit. This field is only populated for McLean Hospital encounters, corresponds to Patient_Type in RPDR.
enc_ref_disp: string, Location where the patient has been directed for treatment or follow-up by a staff member. This field is only populated for McLean Hospital encounters, corresponds to Referrer_Discipline in RPDR.
enc_disch_disp: string, Patient's anticipated location or status following the encounter, corresponds to Discharge_Disposition in RPDR.
enc_pay: string, Payors responsible for the hospital account. Multiple payors (primary, secondary, etc.) may be listed, corresponds to Payor in RPDR.
enc_diag_admit: string, Initial working diagnosis documented by the admitting or attending physician, corresponds to Admitting_Diagnosis in RPDR.
enc_diag_princ: string, Condition established, after study, to be chiefly responsible for occasioning the admission of the patient to the hospital for care, corresponds to Principle_Diagnosis in RPDR.
enc_diag_1: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_1 in RPDR.
enc_diag_2: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_2 in RPDR.
enc_diag_3: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_3 in RPDR.
enc_diag_4: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_4 in RPDR.
enc_diag_5: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_5 in RPDR.
enc_diag_6: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_6 in RPDR.
enc_diag_7: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_7 in RPDR.
enc_diag_8: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_8 in RPDR.
enc_diag_9: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_9 in RPDR.
enc_diag_10: string, Additional diagnoses associated with this encounter or visit, corresponds to Diagnosis_10 in RPDR.
enc_diag_group: string, Diagnosis-Related Group for the encounter, in the following format: SYSTEM:CODE - Description, corresponds to DRG in RPDR.

Examples

## Not run: 
#Using defaults
d_enc <- load_enc(file = "test_Enc.txt")

#Use sequential processing
d_enc <- load_enc(file = "test_Enc.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_exc <- load_enc(file = "test_Exc.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
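
The enc_diag_group column is documented as following the layout SYSTEM:CODE - Description. As a hedged sketch, such a string could be split into its parts with base R regular expressions. The sample value and the pattern are illustrative assumptions; real DRG strings may need a more tolerant pattern.

```r
# Hypothetical values following the "SYSTEM:CODE - Description" layout.
drg <- c("SYS:123 - Example description", NA)

# Three capture groups: text before ":", text up to " - ", and the rest.
# sub() leaves NA as NA and returns non-matching strings unchanged.
pattern <- "^([^:]+):([^ ]+) - (.*)$"
parts <- data.frame(
  system      = sub(pattern, "\\1", drg),
  code        = sub(pattern, "\\2", drg),
  description = sub(pattern, "\\3", drg),
  stringsAsFactors = FALSE
)
parts
```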

Loads laboratory results into R.

Description

Loads laboratory results into the R environment, both Lab and Clb files.

Usage

load_lab(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Lab.txt or Clb.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

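id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").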

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with laboratory exam information.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_lab_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from lab datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_lab_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from lab datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_lab_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_lab_result: POSIXct, Date when the specimen was collected, corresponds to Seq_Date_Time in RPDR. Converted to POSIXct format.
lab_group: string, Higher-level grouping concept used to consolidate similar tests across hospitals, corresponds to Group_ID in RPDR.
lab_loinc: string, Standardized LOINC code for the laboratory test, corresponds to Loinc_Code in RPDR.
lab_testID: string, Internal identifier for the test used by the source system, corresponds to Test_ID in RPDR.
lab_descript: string, Name of the lab test, corresponds to Test_Description in RPDR.
lab_result: string, Result value for the test, corresponds to Result in RPDR.
lab_result_txt: string, Additional information included with the result. This can include instructions for interpretation or comments from the laboratory, corresponds to Result_Text in RPDR.
lab_result_abn: string, Flag for identifying if values are outside of normal ranges or represent a significant deviation from previous values, corresponds to Abnormal_Flag in RPDR.
lab_result_unit: string, Units associated with the result value, corresponds to Reference_Unit in RPDR.
lab_result_range: string, Normal or therapeutic range for this value, corresponds to Reference_Range in RPDR.
lab_result_toxic: string, Reference range of values defined as being toxic to the patient, corresponds to Toxic_Range in RPDR.
lab_spec: string, Type of specimen collected to perform the test, corresponds to Specimen_Type in RPDR.
lab_spec_txt: string, Free-text information about the specimen, its collection or its integrity, corresponds to Specimen_Text in RPDR.
lab_correction: string, Free-text information about any changes made to the results, corresponds to Correction_Flag in RPDR.
lab_status: string, Flag which indicates whether the procedure is pending or complete, corresponds to Test_Status in RPDR.
lab_ord_pys: string, Name of the ordering physician, corresponds to Ordering_Doc in RPDR.
lab_accession: string, Internal tracking number assigned to the specimen for identification in the lab, corresponds to Accession in RPDR.
lab_source: string, Database source, either CDR (Clinical Data Repository) or RPDR (internal RPDR database), corresponds to Source in RPDR.

Examples

## Not run: 
#Using defaults
d_lab <- load_lab(file = "test_Lab.txt")

#Use sequential processing
d_lab <- load_lab(file = "test_Lab.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_clb <- load_lab(file = "test_Clb.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
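
The na and identical arguments control the removal of uninformative columns. The sketch below shows one way such a cleanup can be written in base R. Reading "identical values" as "a column that repeats an earlier column" is an assumption, and the toy column names are made up; the package's own logic may differ.

```r
# Toy table: one informative column, one exact copy, one all-NA column.
d <- data.frame(
  lab_result      = c("5.1", "4.8", "6.0"),
  lab_result_copy = c("5.1", "4.8", "6.0"),
  lab_spec_txt    = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# na = TRUE: drop columns in which every value is NA.
all_na <- vapply(d, function(col) all(is.na(col)), logical(1))
d <- d[, !all_na, drop = FALSE]

# identical = TRUE (assumed reading): drop columns that repeat an earlier one.
d <- d[, !duplicated(as.list(d)), drop = FALSE]

names(d)
```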

Loads LMR note documents into R.

Description

Loads notes from the LMR legacy EHR system.

Usage

load_lno(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Lno.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

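id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").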

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with LMR notes information.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_lno_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from lno datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_lno_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from lno datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_lno_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_lno: POSIXct, Date when the report was filed, corresponds to LMRNote_Date in RPDR. Converted to POSIXct format.
lno_rec_id: string, Internal identifier for this report within the LMR system, corresponds to Record_Id in RPDR.
lno_status: string, Completion status of the note, corresponds to Status in RPDR.
lno_author: string, Name of user who created the note, corresponds to Author in RPDR.
lno_author_mrn: string, Author's user identifier within the LMR system, corresponds to Author_MRN in RPDR.
lno_COD: string, Hospital-specific user code of the note author. The first character is a hospital-specific prefix, corresponds to COD in RPDR.
lno_hosp: string, Facility where the encounter occurred, corresponds to Institution in RPDR.
lno_subject: string, Type of note. This value is derived from the "Subject" line of the narrative text, corresponds to Subject in RPDR.
lno_rep_txt: string, Full narrative text of the note, corresponds to Comments in RPDR.

Examples

## Not run: 
#Using defaults
d_lno <- load_lno(file = "test_Lno.txt")

#Use sequential processing
d_lno <- load_lno(file = "test_Lno.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_lno <- load_lno(file = "test_Lno.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
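
Every loader returns an ID_MERGE column intended for merging datasets later. A minimal sketch of such a merge with base R follows. The two small tables stand in for loaded data and their values are made up; with real data.table output the same call to merge() applies.

```r
# Toy stand-ins for two loaded tables (made-up values).
d_dia <- data.frame(ID_MERGE = c(1, 1, 2),
                    dia_code = c("code_a", "code_b", "code_a"))
d_lno <- data.frame(ID_MERGE = c(1, 3),
                    lno_subject = c("Note A", "Note B"))

# Inner join on the shared key: only patients present in both tables remain.
merged <- merge(d_dia, d_lno, by = "ID_MERGE")
nrow(merged)
```

Passing all.x = TRUE to merge() would instead keep every diagnosis row and fill missing note columns with NA.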

Loads match control data into R.

Description

Loads match control tables into the R environment.

Usage

load_mcm(
  file,
  sep = ":",
  id_length = "standard",
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1
)

Arguments

file

string, full file path to Mcm.txt.

sep

string, divider between hospital ID and MRN. Defaults to :.

id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

Value

data table, with matching data.

ID_case_PMRN: string, Epic PMRN value for a patient in the index cohort, corresponds to Case_Patient_EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_case_EMPI: string, EMPI value for a patient in the index cohort, corresponds to Case_Patient_EMPI in RPDR. Data is formatted using pretty_mrn().
ID_control_PMRN: string, Epic PMRN value for a patient matched to a case in the index cohort, corresponds to Control_Patient_EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_control_EMPI: string, EMPI value for a control patient matched to a case in the index cohort, corresponds to Control_Patient_EMPI in RPDR. Data is formatted using pretty_mrn().
match_strength: string, Number of similar data points between the index patient and the control patient. This number corresponds to the number of controls (Age, Gender, etc.) chosen during the match control query creation process, corresponds to Match_Strength in RPDR.

Examples

## Not run: 
#Using defaults
d_mcm <- load_mcm(file = "test_Mcm.txt")

#Use sequential processing
d_mcm <- load_mcm(file = "test_Mcm.txt", nThread = 1)

#Use parallel processing
d_mcm <- load_mcm(file = "test_Mcm.txt", nThread = 20)

## End(Not run)
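
Once a match control table is loaded, typical follow-up steps are counting controls per case and filtering by match_strength. The sketch below uses the documented column names on made-up values; the threshold of 3 matching criteria is an arbitrary assumption.

```r
# Toy match table using the documented column names (values are made up).
d_mcm <- data.frame(
  ID_case_EMPI    = c("100", "100", "200"),
  ID_control_EMPI = c("901", "902", "903"),
  match_strength  = c("3", "2", "3"),
  stringsAsFactors = FALSE
)

# match_strength is documented as a string, so convert before comparing.
d_mcm$match_strength <- as.integer(d_mcm$match_strength)

# Number of matched controls per case.
n_controls <- table(d_mcm$ID_case_EMPI)

# Keep only the pairs that agree on all 3 (assumed) matching criteria.
best <- d_mcm[d_mcm$match_strength == 3L, ]
```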

Loads medication order detail into R.

Description

Loads medication order detail information into the R environment, both Med and Mee files.

Usage

load_med(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Med.txt or Mee.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

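id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").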

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with medication order information.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_med_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from med datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_med_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from med datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_med_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
med_enc_numb: string, Unique identifier of the record/visit, displayed in the following format: Source System - Institution Number, corresponds to Encounter_number in RPDR.
time_med: POSIXct, Date of the medication record, corresponds to Medication_Date in RPDR. Converted to POSIXct format.
time_med_detail: string, To clarify when patients may have stopped taking a medication, this column provides the statuses of 'Listed' or 'Removed'. This is provided on pre-Epic (LMR) medication dates (1997-2017). The 'Listed' value denotes that a medication was on the patient's medication list on the date indicated. The 'Removed' value denotes that a medication was removed from a patient's medication list on the date indicated. Corresponds to Medication_Date_Detail in RPDR.
med: string, Name of the medication. This may be appended with the source system in the case of OnCall and LMR medications, corresponds to Medication in RPDR.
med_code: string, Medication code associated with the "Code_type" value, corresponds to Code in RPDR.
med_code_type: string, Standardized classification system or custom source value used to identify the medication, corresponds to Code_Type in RPDR.
med_quant: string, Number of units of the medication ordered, corresponds to Quantity in RPDR.
med_prov: string, Ordering provider for the medication, corresponds to Provider in RPDR.
med_clinic: string, Specific department/location where the medication was ordered or administered, corresponds to Clinic in RPDR.
med_hosp: string, Facility where the medication was ordered or administered, corresponds to Hospital in RPDR.
med_inpatient: string, Identifies whether the medication was ordered with an Inpatient or Outpatient indication, corresponds to Inpatient_Outpatient in RPDR.
med_add_info: string, Additional administration information about the medication, corresponds to Additional_Info in RPDR.

Examples

## Not run: 
#Using defaults
d_med <- load_med(file = "test_Med.txt")

#Use sequential processing
d_med <- load_med(file = "test_Med.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_mee <- load_med(file = "test_Mee.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
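
The default nThread = parallel::detectCores() - 1 leaves one core free. On a single-core machine that expression evaluates to 0, and detectCores() can return NA on some platforms; how the package treats those cases is not stated here. A defensive way to compute a thread count before calling a loader might look like this:

```r
# detectCores() may return NA when the core count cannot be determined.
cores <- parallel::detectCores()

# Fall back to one thread, and never go below one.
n_thread <- if (is.na(cores)) 1L else max(1L, cores - 1L)
n_thread
```

The resulting value can then be passed explicitly, for example load_med(file = "test_Med.txt", nThread = n_thread).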

Loads microbiology results into R.

Description

Loads microbiology results into the R environment.

Usage

load_mic(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE,
  format_orig = FALSE
)

Arguments

file

string, full file path to Mic.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

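id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").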

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

format_orig

boolean, should report be returned in its original formatting or should white spaces used for formatting be removed. Defaults to FALSE.

Value

data table, with microbiology information.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_mic_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from mic datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_mic_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from mic datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_mic_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_mic: POSIXct, Date when the specimen was received by the laboratory, corresponds to Microbiology_Date_Time in RPDR. Converted to POSIXct format.
mic_org_code: string, Internal identifier for the organism used by the source system, corresponds to Organism_Code in RPDR.
mic_org_name: string, Name of the organism identified or tested, corresponds to Organism_Name in RPDR.
mic_org_text: string, Full narrative text of the test and results, including sensitivities, corresponds to Organism_Text in RPDR.
mic_org_comment: string, Free-text information about the organism or result, corresponds to Organism_Comment in RPDR.
mic_test_code: string, Internal identifier for the test used by the source system, corresponds to Test_Code in RPDR.
mic_test_name: string, Name of the assay to be performed, or the results of a culture, corresponds to Test_Name in RPDR.
mic_test_status: string, Status of the results, i.e. preliminary or final, corresponds to Test_Status in RPDR.
mic_test_comment: string, Free-text information about the test and results, corresponds to Test_Comments in RPDR.
mic_spec: string, Type of specimen collected to perform the test, corresponds to Specimen_Type in RPDR.
mic_spec_txt: string, Free-text information about the specimen, its collection or its integrity, corresponds to Specimen_Comments in RPDR.
mic_accession: string, Internal tracking number assigned to the specimen for identification in the microbiology lab, corresponds to Microbiology_Number in RPDR.

Examples

## Not run: 
#Using defaults
d_mic <- load_mic(file = "test_Mic.txt")

#Use sequential processing
d_mic <- load_mic(file = "test_Mic.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_mic <- load_mic(file = "test_Mic.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
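
With format_orig = FALSE, white spaces used for formatting are removed from the report text. As an illustration only, the following base R sketch collapses layout whitespace in a made-up report string; the exact rules the package applies are an assumption here.

```r
# Made-up report text with padding, blank lines, and tabs.
report <- "Line one   with    padding\n\n   Line two\t\tindented  "

# Collapse every run of whitespace (spaces, tabs, newlines) to one space,
# then trim the ends.
compact <- trimws(gsub("\\s+", " ", report))
compact
```

Setting format_orig = TRUE is the better choice when line breaks carry meaning, for example in tabular susceptibility results.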

Loads MRN data into R.

Description

Loads patient identifiers for Partners institutions, including hospital-specific MRNs, into the R environment.

Usage

load_mrn(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Mrn.txt.

merge_id

string, name of the column used to create the ID_MERGE column, which is used to merge different datasets. Defaults to "EMPI" (see Usage).

sep

string, divider between hospital ID and MRN. Defaults to :.

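id_length

string, whether to modify MRN length based on required values (id_length = "standard") or to keep lengths as is (id_length = "asis").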

perc

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept. Not used for loading mrn data.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with MRN data.

ID_MERGE: numeric, IDs defined by merge_id, used for merging later.
ID_mrn_INCOMING: string, Patient identifier, usually the EMPI, corresponds to IncomingId in RPDR. Data is formatted using pretty_mrn().
ID_mrn_INCOMING_SITE: string, Source of identifier, e.g. EMP for Enterprise Master Patient Index, MGH for Mass General Hospital, corresponds to IncomingSite in RPDR.
ID_mrn_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information, corresponds to Enterprise_Master_Patient_Index in RPDR. Data is formatted using pretty_mrn().
ID_mrn_MGH: string, Unique Medical Record Number for Mass General Hospital, corresponds to MGH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_BWH: string, Unique Medical Record Number for Brigham and Women's Hospital, corresponds to BWH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_FH: string, Unique Medical Record Number for Faulkner Hospital, corresponds to FH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_SRH: string, Unique Medical Record Number for Spaulding Rehabilitation Hospital, corresponds to SRH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_NWH: string, Unique Medical Record Number for Newton-Wellesley Hospital, corresponds to NWH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_NSMC: string, Unique Medical Record Number for North Shore Medical Center, corresponds to NSMC_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_MCL: string, Unique Medical Record Number for McLean Hospital, corresponds to MCL_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_MEE: string, Unique Medical Record Number for Mass Eye and Ear, corresponds to MEE_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_DFC: string, Unique Medical Record Number for Dana Farber Cancer center, corresponds to DFC_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_WDH: string, Unique Medical Record Number for Wentworth-Douglass Hospital, corresponds to WDH_MRN in RPDR. Data is formatted using pretty_mrn().
ID_mrn_STATUS: string, Status of the record, corresponds to Status in RPDR.

Examples

## Not run: 
#Using defaults
d_mrn <- load_mrn(file = "test_Mrn.txt")

#Use sequential processing
d_mrn <- load_mrn(file = "test_Mrn.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_mrn <- load_mrn(file = "test_Mrn.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)
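
The sep argument is the divider between hospital ID and MRN, and id_length governs whether MRN lengths are adjusted. The sketch below shows the general idea in base R: zero-padding an MRN, joining it to a hospital prefix, and splitting it again. The width of 8 characters and the sample values are made up; the lengths the package actually requires per hospital are not listed here.

```r
sep      <- ":"
hospital <- c("MGH", "BWH")
mrn_raw  <- c("123456", "7654321")

# Left-pad with zeros to a fixed width (8 is an arbitrary, assumed width).
width <- 8
mrn   <- paste0(strrep("0", pmax(0, width - nchar(mrn_raw))), mrn_raw)

# Join hospital ID and MRN with the divider.
ids <- paste(hospital, mrn, sep = sep)

# Split again; fixed = TRUE treats sep as a literal string, not a regex.
parts    <- strsplit(ids, sep, fixed = TRUE)
back_mrn <- vapply(parts, `[`, character(1), 2)
```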

Loads note documents into R.

Description

Loads note documents into the R environment. The supported note types and their abbreviations are:

Cardiology: "car"
Discharge: "dis"
Endoscopy: "end"
History & Physical: "hnp"
Operative: "opn"
Pathology: "pat"
Progress: "prg"
Pulmonary: "pul"
Radiology: "rad"
Visit: "vis"
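
The abbreviations listed above can be kept in a named vector to validate a type before loading. This is a small convenience sketch; note_types and note_label() are hypothetical names introduced here and are not part of the package.

```r
# Lookup from three-letter abbreviation to note type, as listed above.
note_types <- c(
  car = "Cardiology", dis = "Discharge", end = "Endoscopy",
  hnp = "History & Physical", opn = "Operative", pat = "Pathology",
  prg = "Progress", pul = "Pulmonary", rad = "Radiology", vis = "Visit"
)

# Hypothetical helper: check an abbreviation and return its full name.
# match.arg() stops with an error for values not in the list.
note_label <- function(type) {
  type <- match.arg(type, names(note_types))
  unname(note_types[type])
}

note_label("rad")
```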

Usage

load_notes(
  file,
  type,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE,
  load_report = TRUE,
  format_orig = FALSE
)

Arguments

file

string, full file path to the given type of note, e.g. Hnp.txt.

type

string, the type of note to be loaded. May be one of: "car", "dis", "end", "hnp", "opn", "pat", "prg", "pul", "rad" or "vis".

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

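id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc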

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

load_report

boolean, should the report text be returned in the data table. Defaults to TRUE. However, be aware that some notes may take up more memory than available on the machine.

format_orig

boolean, should report be returned in its original formatting or should white spaces used for formatting be removed. Defaults to FALSE.

Value

data table, with notes information. abc stands for the three letter abbreviation of the given type of note.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_abc_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from abc datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_abc_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from abc datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_abc_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
abc_rep_num: string, Source-specific identifier used to reference the report, corresponds to Report_Number in RPDR.
time_abc: POSIXct, Date when the report was filed, corresponds to Report_Date_Time in RPDR. Converted to POSIXct format.
abc_rep_desc: string, Type of report or procedure documented in the report, corresponds to Report_Description in RPDR.
abc_rep_status: string, Completion status of the note/report, corresponds to Report_Status in RPDR.
abc_rep_type: string, See specification in RPDR data dictionary, corresponds to Report_Type in RPDR.
abc_rep_txt: string, Full narrative text contained in the note/report, corresponds to Report_Text in RPDR. Only provided if load_report is TRUE.

Examples

## Not run: 
#Using defaults
d_hnp <- load_notes(file = "test_Hnp.txt", type = "hnp")

#Use sequential processing
d_hnp <- load_notes(file = "test_Hnp.txt", type = "hnp", nThread = 1, format_orig = TRUE)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_hnp <- load_notes(file = "test_Hnp.txt", type = "hnp", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads health history information into R.

Description

Loads vital signs, social history, immunizations, and various other health history details into the R environment.

Usage

load_phy(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Phy.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

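id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc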

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with health history information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_phy_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from phy datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_phy_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from phy datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_phy_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_phy: POSIXct, Date when the observation was noted, corresponds to Date in RPDR. Converted to POSIXct format.
phy_name: string, Type of clinical value/observation recorded, corresponds to Concept_Name in RPDR.
phy_code: string, Source-specific identifier for the specific type of clinical observation, corresponds to Code in RPDR.
phy_code_type: string, Source system for the value, corresponds to Code_type in RPDR.
phy_result: string, Value associated with the clinical observation, corresponds to Results in RPDR. Note: BMI results are calculated internally in the RPDR.
phy_unit: string, Units associated with the clinical observation, corresponds to Units in RPDR.
phy_provider: string, Provider of record for the encounter where the observation was recorded, corresponds to Providers in RPDR.
phy_clinic: string, Specific department/location where the patient observation was recorded, corresponds to Clinic in RPDR.
phy_hosp: string, Facility where the observation was recorded, corresponds to Hospital in RPDR.
phy_inpatient: string, Classifies the type of encounter where the observation was entered, corresponds to Inpatient_Outpatient in RPDR.
phy_enc_num: string, Unique identifier of the record/visit. This value includes the source system and a unique identifier within the source system, corresponds to Encounter_number in RPDR.

Examples

## Not run: 
#Using defaults
d_phy <- load_phy(file = "test_Phy.txt")

#Use sequential processing
d_phy <- load_phy(file = "test_Phy.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_phy <- load_phy(file = "test_Phy.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads procedures into R.

Description

Loads clinical procedure information into the R environment, from both Prc and Pec files.

Usage

load_prc(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Prc.txt or Pec.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

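id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc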

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with procedural information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_prc_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from prc datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_prc_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from prc datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_prc_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_prc: POSIXct, Date when the procedure was performed, corresponds to Date in RPDR. Converted to POSIXct format.
prc_name: string, Name of the procedure or operation performed, corresponds to Procedure_Name in RPDR.
prc_code: string, Procedure code associated with the "Code_type" value, corresponds to Code in RPDR.
prc_code_type: string, Standardized classification system or custom source value associated with the procedure code, corresponds to Code_type in RPDR.
prc_flag: string, Qualifier for the procedure, corresponds to Procedure_Flag in RPDR.
prc_quantity: string, Number of the procedures that were ordered for this record, corresponds to Quantity in RPDR.
prc_provider: string, Provider identifies the health care clinician performing the procedure, corresponds to Provider in RPDR.
prc_clinic: string, Specific department/location where the procedure was ordered or performed, corresponds to Clinic in RPDR.
prc_hosp: string, Facility where the procedure was ordered or performed, corresponds to Hospital in RPDR.
prc_inpatient: string, Classifies the type of encounter where the procedure was performed or ordered, corresponds to Inpatient_Outpatient in RPDR.
prc_enc_num: string, Unique identifier of the record/visit, displayed in the following format: Source System - Institution Number, corresponds to Encounter_number in RPDR.

Examples

## Not run: 
#Using defaults
d_prc <- load_prc(file = "test_Prc.txt")

#Use sequential processing
d_prc <- load_prc(file = "test_Prc.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_pec <- load_prc(file = "test_Pec.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads providers information into R.

Description

Loads providers information into the R environment.

Usage

load_prv(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = TRUE
)

Arguments

file

string, full file path to Prv.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

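id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc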

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to TRUE, as shown in Usage (the same default is used for Con.txt). Parsing these columns is not advised for all data sources, as it takes considerable time.

Value

data table, with provider information data.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_con_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from con datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_con_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from con datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_con_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_prv_last_seen: POSIXct, Date when the patient was last seen by the provider, corresponds to Last_Seen_Date in RPDR.
prv_name: string, Full name of the provider, corresponds to Provider_Name in RPDR.
prv_rank: string, Provides a quantitative value of provider's level of interaction with the patient. This is calculated using the number of CPT codes for face-to-face visits that the provider has billed for in relation to the patient, corresponds to Provider_Rank in RPDR.
prv_ID: string, Identification code for the provider, including the source institution, corresponds to Provider_ID in RPDR.
prv_ID_CMP: string, Corporate Provider Master ID. This is the unique identifier for a provider across the MGB network, corresponds to CPM_Id in RPDR.
prv_spec: string, Comma-delimited list of the provider's specialties, corresponds to Specialties in RPDR.
prv_pcp: string, Available for BWH and MGH PCPs only. Flag indicating whether the provider is listed as the patient's Primary Care Physician, corresponds to Is_PCP in RPDR.
prv_dep: string, Provider's department, corresponds to Enterprise_service in RPDR.
prv_address1: string, Address of the provider's primary practice, corresponds to Address_1 in RPDR.
prv_address2: string, Additional address information, corresponds to Address_2 in RPDR.
prv_city: string, City of the provider's primary practice, corresponds to City in RPDR.
prv_state: string, State of the provider's primary practice, corresponds to State in RPDR.
prv_zip: string, Mailing zip code of provider's primary practice, corresponds to Zip in RPDR.
prv_phone: string, Telephone number of the provider's primary practice, corresponds to Phone_Ext in RPDR.
prv_fax: string, Fax number of the provider's primary practice, corresponds to Fax in RPDR.
prv_email: string, Primary e-mail address for the provider, corresponds to Email in RPDR.

Examples

## Not run: 
#Using defaults
d_prv <- load_prv(file = "test_Prv.txt")

#Use sequential processing
d_prv <- load_prv(file = "test_Prv.txt", nThread = 1)

#Use parallel processing and parse data in
#MRN_Type and MRN columns (default in load_con) and keep all IDs
d_prv <- load_prv(file = "test_Prv.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads patient data information into R.

Description

Loads patient data information into the R environment.

Usage

load_ptd(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Ptd.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

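id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc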

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with patient data information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_ptd_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from ptd datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_ptd_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from ptd datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_ptd_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_ptd_start: POSIXct, Date item was initiated in the record, corresponds to Start_Date in RPDR. Converted to POSIXct format.
time_ptd_end: POSIXct, Date item was finalized in the record, corresponds to End_Date in RPDR. Converted to POSIXct format.
ptd_desc: string, Name of the item being reported, corresponds to Description in RPDR.
ptd_result: string, Result of the item being reported, corresponds to Result in RPDR.
ptd_type: string, Describes the type of data being reported, corresponds to Patient_Data_Type in RPDR.
ptd_enc_num: string, Unique identifier of the record/visit. This value includes the source system and a unique identifier within the source system, corresponds to Encounter_number in RPDR.

Examples

## Not run: 
#Using defaults
d_ptd <- load_ptd(file = "test_Ptd.txt")

#Use sequential processing
d_ptd <- load_ptd(file = "test_Ptd.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_ptd <- load_ptd(file = "test_Ptd.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads radiology procedures data into R.

Description

Loads radiology procedures information into the R environment.

Usage

load_rdt(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Rdt.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

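id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc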

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with radiological exam information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_rdt_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from rdt datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_rdt_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from rdt datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_rdt_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_rdt_exam: POSIXct, Date of the radiology exam, corresponds to Date in RPDR. Converted to POSIXct format.
rdt_mode: string, Modality of the exam, corresponds to Mode in RPDR.
rdt_group: string, Higher-level grouping concept used to consolidate similar procedures across hospitals, corresponds to Group in RPDR.
rdt_test_code: string, Internal identifier for the procedure used by the source system, corresponds to Test_Code in RPDR.
rdt_test_desc: string, Full name of the exam/study performed, corresponds to Test_Description in RPDR.
rdt_accession: string, Identifier assigned to the report or procedure for Radiology tracking purposes, corresponds to Accession_Number in RPDR.
rdt_provider: string, Ordering or authorizing provider for the study, corresponds to Provider in RPDR.
rdt_clinic: string, Specific department/location where the procedure was ordered or performed, corresponds to Clinic in RPDR.
rdt_hosp: string, Facility where the order was entered, corresponds to Hospital in RPDR.
rdt_inpatient: string, Classifies the type of encounter where the procedure was performed, corresponds to Inpatient_Outpatient in RPDR.
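
The ID_MERGE column is described as the key used for merging datasets later. The following is a minimal sketch of such a merge using base R merge(). The two data frames are simulated stand-ins for loader outputs (the loaders return data tables); all values are made up, and only the column names ID_MERGE, phy_name, phy_result and rdt_mode are taken from this manual.

```r
# Simulated stand-in for the output of load_phy(); values are invented
d_phy_sim <- data.frame(
  ID_MERGE   = c(1, 1, 2),
  phy_name   = c("Weight", "Height", "Weight"),
  phy_result = c("80", "180", "65"),
  stringsAsFactors = FALSE
)

# Simulated stand-in for the output of load_rdt(); values are invented
d_rdt_sim <- data.frame(
  ID_MERGE = c(1, 3),
  rdt_mode = c("CT", "MR"),
  stringsAsFactors = FALSE
)

# Inner join on the shared key: only patients present in both tables remain
d_merged <- merge(d_phy_sim, d_rdt_sim, by = "ID_MERGE")
print(d_merged)
```

Here only the patient with ID_MERGE equal to 1 appears in both tables, so the result has two rows, one per observation of that patient. Passing all.x = TRUE to merge() would instead keep every row of the first table.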

Examples

## Not run: 
#Using defaults
d_rdt <- load_rdt(file = "test_Rdt.txt")

#Use sequential processing
d_rdt <- load_rdt(file = "test_Rdt.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_rdt <- load_rdt(file = "test_Rdt.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads reason for visit data into R.

Description

Loads reason for visit information into the R environment.

Usage

load_rfv(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Rfv.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

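id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc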

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with reason for visit information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_rfv_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from rfv datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_rfv_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from rfv datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_rfv_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_rfv_start: POSIXct, Start date of the encounter, corresponds to Start_Date in RPDR. Converted to POSIXct format.
time_rfv_end: POSIXct, End date of the encounter, corresponds to End_Date in RPDR. Converted to POSIXct format.
rfv_provider: string, Primary provider for the encounter, corresponds to Provider in RPDR.
rfv_hosp: string, Facility where the encounter occurred, corresponds to Hospital in RPDR.
rfv_clinic: string, Specific department/location where the patient encounter took place, corresponds to Clinic in RPDR.
rfv_chief_complaint: string, Description of the chief complaint/reason for visit, corresponds to Chief_Complaint in RPDR.
rfv_concept_id: string, Epic identifier for the chief complaint/reason for visit, corresponds to Concept_id in RPDR.
rfv_comment: string, Free-text comments regarding the chief complaint/reason for visit, corresponds to Comments in RPDR.
rfv_enc_numb: string, Unique identifier of the record/visit. This value includes the source system, hospital, and a unique identifier within the source system, corresponds to Encounter_number in RPDR.

Examples

## Not run: 
#Using defaults
d_rfv <- load_rfv(file = "test_Rfv.txt")

#Use sequential processing
d_rfv <- load_rfv(file = "test_Rfv.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_rfv <- load_rfv(file = "test_Rfv.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Loads transfusion results into R.

Description

Loads transfusion results into the R environment.

Usage

load_trn(
  file,
  merge_id = "EMPI",
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  na = TRUE,
  identical = TRUE,
  nThread = parallel::detectCores() - 1,
  mrn_type = FALSE
)

Arguments

file

string, full file path to Trn.txt.

merge_id

string, column name used to create the ID_MERGE column, which is used to merge different datasets. The default shown in Usage is "EMPI"; EPIC_PMRN is the preferred MRN in the RPDR system.

sep

string, divider between hospital ID and MRN. Defaults to :.

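id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc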

numeric, a number between 0-1 indicating which parsed ID columns to keep. Data present in perc x 100% of patients are kept.

na

boolean, whether to remove columns with only NA values. Defaults to TRUE.

identical

boolean, whether to remove columns with identical values. Defaults to TRUE.

nThread

integer, number of threads to use to load data.

mrn_type

boolean, should data in MRN_Type and MRN be parsed. Defaults to FALSE, as it is not advised to parse these for all data sources as it takes considerable time.

Value

data table, with transfusion information.

ID_MERGE: numeric, defined IDs by merge_id, used for merging later.
ID_trn_EMPI: string, Unique Partners-wide identifier assigned to the patient used to consolidate patient information from trn datasource, corresponds to EMPI in RPDR. Data is formatted using pretty_mrn().
ID_trn_PMRN: string, Epic medical record number. This value is unique across Epic instances within the Partners network from trn datasource, corresponds to EPIC_PMRN in RPDR. Data is formatted using pretty_mrn().
ID_trn_loc: string, if mrn_type == TRUE, then the data in MRN_Type and MRN are parsed into IDs corresponding to locations (loc). Data is formatted using pretty_mrn().
time_trn: POSIXct, Date when the transfusion was administered or test was performed, corresponds to Transaction_Date_Time in RPDR. Converted to POSIXct format.
trn_descript: string, The type of procedure or product administered, corresponds to Test_Description in RPDR.
trn_result: string, Results of the test or transaction/lot number of transfusion, corresponds to Results in RPDR.
trn_result_abn: string, Denotes an abnormal finding or value, corresponds to Abnormal_Flag in RPDR.
trn_comment: string, Free-text comments about the status of the test/transfusion, corresponds to Comments in RPDR.
trn_status: string, Completion status of the requested test/transfusion, corresponds to Status_Flag in RPDR.
trn_accession: string, Identifier assigned to the test/transfusion for tracking purposes by the blood bank, corresponds to Accession in RPDR.

Examples

## Not run: 
#Using defaults
d_trn <- load_trn(file = "test_Trn.txt")

#Use sequential processing
d_trn <- load_trn(file = "test_Trn.txt", nThread = 1)

#Use parallel processing and parse data in MRN_Type and MRN columns and keep all IDs
d_trn <- load_trn(file = "test_Trn.txt", nThread = 20, mrn_type = TRUE, perc = 1)

## End(Not run)

Parse IDs from a string of delimited list of values.

Description

Creates columns corresponding to the MRNs found in a string containing a delimited list of values. If the string part and the numeric part of the MRN are present in the same column, then supply that column to str. If the string part and the numeric part are in different columns, then supply the string part to str and the numeric part to num.

Usage

parse_ids(
  str,
  num = NULL,
  sep = ":",
  id_length = "standard",
  perc = 0.6,
  nThread = parallel::detectCores() - 1
)

Arguments

str

vector, delimited list of MRN string values.

num

vector, delimited list of MRN numeric values.

sep

string, divider between hospital ID and MRN. Defaults to :.

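id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept; with "standard" ID lengths are adjusted to the required lengths as described for pretty_mrn().

perc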

numeric, a number between 0-1 indicating which parsed ID columns to keep. Columns present in perc x 100% of patients are kept.

nThread

integer, number of threads to use by dopar for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

data table, with columns corresponding to MRNs in the string of delimited list of values.

Converts MRN integer to string compatible with RPDR.

Description

Adds or removes zeros from integers to comply with MRN code standards for given institution and adds institution prefix.

Usage

pretty_mrn(v, prefix = "MGH", sep = ":", id_length = "standard", nThread = 1)

Arguments

v

vector, integer or string vector with MRNs.

prefix

string or vector, ID of the hospital from which the MRNs originate. Defaults to MGH. If a vector is provided, then it must be the same length as v. This allows different prefixes to be used for different IDs within the same vector of values.

sep

string, divider between hospital ID and MRN. Defaults to :.

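id_length

string, either "standard" (the default shown in Usage) or "asis". With "asis" the original ID lengths are kept even if they do not adhere to the requirements; with "standard" IDs are padded or truncated as described under Value.

nThread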

integer, number of threads to use by dopar for parallelization. If it is set to 1, then no parallel backends are created and the function is executed sequentially.

Value

vector, with characters formatted to specified lengths. If the ID is shorter than the required length, then leading zeros are added to the ID. If the ID is longer than the required length, then numerals are cut off from the beginning of the ID until it has the required length.
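
The padding and truncation rule described under Value can be illustrated with a small base R sketch. This is not the package's implementation: format_id_sketch() and its width argument are hypothetical, and the width of 7 is an arbitrary illustrative value, since the required lengths per institution are not listed here.

```r
# Hypothetical helper illustrating the documented rule; not part of parseRPDR
format_id_sketch <- function(v, prefix = "MGH", sep = ":", width = 7L) {
  id <- as.character(v)
  n  <- nchar(id)

  # Too long: cut characters from the beginning, keeping the last `width`
  too_long <- !is.na(id) & n > width
  id[too_long] <- substr(id[too_long], n[too_long] - width + 1L, n[too_long])

  # Too short: add leading zeros up to `width`
  too_short <- !is.na(id) & n < width
  id[too_short] <- paste0(strrep("0", width - n[too_short]), id[too_short])

  # Add the institution prefix and the separator
  paste0(prefix, sep, id)
}

formatted <- format_id_sketch(c("12345", "123456789"), prefix = "MGH")
print(formatted)  # "MGH:0012345" "MGH:3456789"
```

The inputs are passed as strings so that large numbers are not converted to scientific notation by as.character(). Missing values are not handled specially in this sketch.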

Examples

## Not run: 
mrns <- sample(1e4:1e7, size = 10) #Simulate MRNs

#MGH format
pretty_mrn(v = mrns, prefix = "MGH")

#BWH format
pretty_mrn(v = mrns, prefix = "BWH")

#Multiple sources using space as a separator
pretty_mrn(v = mrns[1:3], prefix = c("MGH", "BWH", "EMPI"), sep = " ")

#Keeping the length of the IDs despite not adhering to the requirements
pretty_mrn(v = mrns, prefix = "EMPI", id_length = "asis")

## End(Not run)

Converts numerical codes to universal format specified by length.

Description

Creates numerical strings with given lengths by removing additional characters from the back and adding leading zeros if necessary.

Usage

pretty_numbers(v, length_final = 5, remove_from_back = 4)

Arguments

v

vector, integer or string vector with numerical values.

length_final

numeric, the length of the final string. Defaults to 5 for zip code conversions.

remove_from_back

numeric, the number of digits to remove from the back of the string. If NULL, then characters exceeding length_final are removed from the back. Defaults to 4 for zip code conversions, which removes the add-on codes.

Value

vector, with characters formatted accordingly.
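
As an illustration of the zip code use case, the following base R sketch trims add-on digits from the back and restores leading zeros. It is not the package's implementation: format_code_sketch() is a hypothetical helper, and it assumes that trailing digits are removed only from values longer than length_final, which this manual does not state explicitly.

```r
# Hypothetical helper illustrating the described behaviour; not part of parseRPDR
format_code_sketch <- function(v, length_final = 5L, remove_from_back = 4L) {
  s <- as.character(v)
  n <- nchar(s)
  long <- !is.na(s) & n > length_final

  if (is.null(remove_from_back)) {
    # Keep only the first `length_final` characters
    s[long] <- substr(s[long], 1L, length_final)
  } else {
    # Assumption: a fixed number of trailing digits is dropped from over-long values
    s[long] <- substr(s[long], 1L, n[long] - remove_from_back)
  }

  # Add leading zeros to values that are still too short
  n <- nchar(s)
  short <- !is.na(s) & n < length_final
  s[short] <- paste0(strrep("0", length_final - n[short]), s[short])
  s
}

zips <- format_code_sketch(c("021141234", "21141234", "2114", "02114"))
print(zips)  # all four become "02114"
```

The second input shows why both steps are useful: a nine-digit zip code that lost its leading zero is first shortened to "2114" and then padded back to five characters.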

Removes spaces, special characters and capitals from string vector.

Description

Removes spaces, special characters and capitals from string vector and converts unknowns to NA.

Usage

pretty_text(
  v,
  remove_after = FALSE,
  remove_punc = FALSE,
  remove_white = FALSE,
  add_na = TRUE
)

Arguments

v

vector, integer or string vector with numerical values.

remove_after

boolean, whether to remove text after -. Defaults to FALSE.

remove_punc

boolean, whether to remove punctuation marks. Defaults to FALSE.

remove_white

boolean, whether to remove white spaces. Defaults to FALSE.

add_na

boolean, whether to change text indicating NA to NA values in R. Defaults to TRUE.

Value

vector, with characters formatted accordingly.
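
A rough base R sketch of this kind of text cleaning is given below. It is not the package's implementation: clean_text_sketch() is hypothetical, the na_strings argument and its values are assumptions (the manual does not list which strings are treated as unknown), and the trimming of surrounding whitespace is an addition made for the example.

```r
# Hypothetical helper illustrating the described options; not part of parseRPDR
clean_text_sketch <- function(v, remove_after = FALSE, remove_punc = FALSE,
                              remove_white = FALSE, add_na = TRUE,
                              na_strings = c("", "unknown")) {  # assumed values
  out <- tolower(as.character(v))                       # remove capitals
  if (remove_after) out <- sub("-.*$", "", out)         # drop text after the first "-"
  if (remove_punc)  out <- gsub("[[:punct:]]", "", out) # drop punctuation marks
  if (remove_white) out <- gsub("[[:space:]]", "", out) # drop all white spaces
  out <- trimws(out)                                    # assumption: trim the ends
  if (add_na) out[out %in% na_strings] <- NA_character_ # map unknowns to NA
  out
}

cleaned <- clean_text_sketch(c("White - Not Hispanic", "Unknown", "O'Brien, MD"),
                             remove_after = TRUE)
print(cleaned)  # "white" NA "o'brien, md"
```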

Delete columns with all NA or all identical data.

Description

Delete columns where all data elements are NA or the same.

Usage

remove_column(dt, na = TRUE, identical = TRUE)

Arguments

dt

data.table, to manipulate.

na

boolean, to delete columns where all data elements are NA.

identical

boolean, to delete columns where all data elements are the same.

Value

data table, with data.
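
The column filtering described above can be sketched in base R as follows. This is an illustration on a plain data frame rather than the package's code: drop_uninformative_sketch() is a hypothetical name, and treating a column as identical only when it has a single distinct value and is not entirely NA is an assumption.

```r
# Hypothetical helper illustrating the described filtering; not part of parseRPDR
drop_uninformative_sketch <- function(df, na = TRUE, identical = TRUE) {
  # Columns in which every element is NA
  all_na <- vapply(df, function(x) all(is.na(x)), logical(1))
  # Columns with exactly one distinct value (all-NA columns are handled above)
  all_same <- !all_na &
    vapply(df, function(x) length(unique(x)) == 1L, logical(1))

  remove <- (na & all_na) | (identical & all_same)
  df[, !remove, drop = FALSE]
}

d <- data.frame(
  id    = 1:3,
  site  = "MGH",            # identical in every row
  empty = NA,               # only NA values
  value = c(2.5, NA, 2.5),  # mixed values, kept
  stringsAsFactors = FALSE
)

res <- drop_uninformative_sketch(d)
print(names(res))  # "id" "value"
```

Setting na = FALSE in this sketch keeps the all-NA column and removes only the constant one.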