Type: | Package |
Title: | Automated Feature Selection from 'caret' |
Version: | 0.9.4.4 |
Date: | 2018-05-08 |
Depends: | R (≥ 3.1.0), caret, gsubfn, hmeasure, utils, parallel |
Suggests: | ada, arm, Boruta, bst, C50, car, caTools, class, Cubist, e1071, earth (≥ 2.2-3), elasticnet, ellipse, evtree, extraTrees, fastICA, foba, gam, gbm (≥ 2.1), glmnet (≥ 1.8), hda, HDclassif, Hmisc, ipred, kernlab, kknn, klaR, kohonen, KRLS, lars, leaps, LogicReg, MASS, mboost, mda, mgcv, mlbench, neuralnet, nnet, nodeHarvest, obliqueRF, pamr, partDSA, party (≥ 0.9-99992), penalized, penalizedLDA, pls, pROC, proxy, qrnn, quantregForest, randomForest, RANN, relaxo, rFerns, rocc, rpart, rrcov, RRF, rrlda, RSNNS, RWeka (≥ 0.4-1), sda, sparseLDA (≥ 0.1-1), spls, stepPlr, superpc |
Maintainer: | Jakub Szlek <j.szlek@uj.edu.pl> |
License: | GPL-2 | GPL-3 |
Description: | Automated feature selection using a variety of models provided by the 'caret' package. This work was funded by Poland-Singapore bilateral cooperation project no 2/3/POL-SIN/2012. |
Packaged: | 2018-05-08 06:50:59 UTC; kuba |
NeedsCompilation: | no |
Repository: | CRAN |
Date/Publication: | 2018-05-08 08:15:17 UTC |
Author: | Jakub Szlek [aut, cre], Aleksander Mendyk [ctb] |
Automated feature selection caret (fscaret)
Description
This package provides fast, automated feature selection based on the modeling methods of the caret package.
The main advantage of this extension is that it requires minimal user involvement.
The user also benefits from the variety of methods used, combined with scaling of variable importance according to the RMSE or MSE obtained from the models.
The idea is based on the assumption that the variety of models will balance the roughness of calculations (default model settings are applied).
On Windows the time-limiting function is off; multicore functionality is enabled via the parLapply() function of the 'parallel' package.
Acknowledgments:
This work was funded by Poland-Singapore bilateral cooperation project no 2/3/POL-SIN/2012
Details
Package: | fscaret |
Type: | Package |
Version: | 0.9.4.2 |
Date: | 2017-12-07 |
License: | GPL-2 | GPL-3 |
Author(s)
Jakub Szlek <j.szlek@uj.edu.pl>
Contributions from Aleksander Mendyk, as well as the Stack Overflow and r-help@r-project.org mailing list communities.
Maintainer: Jakub Szlek <j.szlek@uj.edu.pl>.
References
Kuhn M. (2008) Building Predictive Models in R Using the caret Package Journal of Statistical Software 28(5) http://www.jstatsoft.org/.
Szlek J, Paclawski A, Lau R, Jachowicz R, Mendyk A. Heuristic modeling of macromolecule release from PLGA microspheres. International Journal of Nanomedicine. 2013:8(1); 4601 - 4611. http://www.dovepress.com/international-journal-of-nanomedicine-journal.
See Also
train, trainControl, rfeControl by Max Kuhn <Max.Kuhn at pfizer.com> and predict base utilities
MSE
Description
Function calculates the mean squared error of predicted vs. observed values.
Usage
MSE(vect1, vect2, rows_no)
Arguments
vect1 |
Numeric vector of predicted values |
vect2 |
Numeric vector of observed values |
rows_no |
Number of observations |
Author(s)
Jakub Szlek and Aleksander Mendyk
RMSE
Description
Function calculates root mean squared error.
Usage
RMSE(vect1, vect2, rows_no)
Arguments
vect1 |
Numeric vector of predicted values |
vect2 |
Numeric vector of observed values |
rows_no |
Number of observations |
Author(s)
Aleksander Mendyk
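The two error measures documented above can be illustrated with a minimal base R sketch. The function names mse_sketch and rmse_sketch are hypothetical and are not the package source; the sketch only assumes the standard definitions (mean of squared differences, and its square root) with the same argument order as MSE() and RMSE().

```r
# Hypothetical re-implementations for illustration only; not the fscaret source.
# vect1: predicted values, vect2: observed values, rows_no: number of observations
mse_sketch <- function(vect1, vect2, rows_no) {
  sum((vect1 - vect2)^2) / rows_no
}

rmse_sketch <- function(vect1, vect2, rows_no) {
  sqrt(mse_sketch(vect1, vect2, rows_no))
}

predicted <- c(2, 4, 6)
observed  <- c(1, 4, 8)

# Squared differences are 1, 0 and 4, so the MSE is 5/3
mse_sketch(predicted, observed, length(observed))
rmse_sketch(predicted, observed, length(observed))
```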
classVarImp
Description
The function takes advantage of the caret package to fit numerous classification models.
Usage
classVarImp(model, xTrain, yTrain, xTest,
fitControl, myTimeLimit, no.cores,
lk_col, supress.output)
Arguments
model |
Chosen models as called from function fscaret(), argument Used.funcClassPred. |
xTrain |
Training data set, data frame of input vector |
yTrain |
Training data set, vector of observed outputs, must be in binary form 0/1. |
xTest |
Testing data set, data frame of input vector |
fitControl |
Fitting controls passed to caret function |
myTimeLimit |
Time limit in seconds for single model fitting |
no.cores |
Number of used cores for calculations |
lk_col |
Number of columns for whole data set (inputs + output) |
supress.output |
If TRUE, the output of models is suppressed. |
Author(s)
Jakub Szlek and Aleksander Mendyk
References
Kuhn M. (2008) Building Predictive Models in R Using the caret Package Journal of Statistical Software 28(5) http://www.jstatsoft.org/.
dataPreprocess
Description
The functionality is realized in two main steps:
1. Check for near-zero-variance predictors and flag a predictor as near zero if:
   - the percentage of unique values is less than 20,
   - the ratio of the most frequent to the second most frequent value is greater than 20.
2. Check for susceptibility to multicollinearity:
   - calculate the correlation matrix,
   - find variables with correlation 0.9 or more and delete them.
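The two steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the dataPreprocess() source: the helper names is_near_zero_var and drop_correlated are hypothetical, the sketch assumes both near-zero conditions must hold at once (as in caret's nearZeroVar), and the correlation filter simply drops the later column of each highly correlated pair, which may differ from the rule the package applies.

```r
# Hypothetical helper: flag a column as near-zero variance when the share of
# unique values is below unique_cut (%) AND the ratio of the most frequent to
# the second most frequent value exceeds freq_cut (assumption: both must hold).
is_near_zero_var <- function(x, freq_cut = 20, unique_cut = 20) {
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  pct_unique <- 100 * length(counts) / length(x)
  freq_ratio <- if (length(counts) > 1) counts[1] / counts[2] else Inf
  pct_unique < unique_cut && freq_ratio > freq_cut
}

# Hypothetical helper: indices of columns whose absolute correlation with an
# earlier column reaches the cutoff (simplified rule, chosen for illustration).
drop_correlated <- function(m, cutoff = 0.9) {
  cm <- abs(cor(m))
  cm[lower.tri(cm, diag = TRUE)] <- 0
  which(apply(cm, 2, function(column) any(column >= cutoff)))
}

set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
m <- cbind(a = a, b = b, c = 2 * a, d = c(rep(1, 49), 2))

# Step 1: remove near-zero-variance columns (column d here)
nzv <- apply(m, 2, is_near_zero_var)
m1 <- m[, !nzv, drop = FALSE]

# Step 2: remove highly correlated columns (column c duplicates a)
dropped <- drop_correlated(m1)
m2 <- m1[, setdiff(seq_len(ncol(m1)), dropped), drop = FALSE]
colnames(m2)
```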
Usage
dataPreprocess(trainMatryca_nr, testMatryca_nr, labelsFrame, lk_col, lk_row, with.labels)
Arguments
trainMatryca_nr |
Input training data matrix |
testMatryca_nr |
Input testing data matrix |
labelsFrame |
Transposed data frame of column names |
lk_col |
Number of columns |
lk_row |
Number of rows |
with.labels |
If with.labels=TRUE, an additional data frame is returned that maps the preprocessed inputs to the column numbers of the original data set |
Author(s)
Jakub Szlek and Aleksander Mendyk
References
Kuhn M. (2008) Building Predictive Models in R Using the caret Package Journal of Statistical Software 28(5) http://www.jstatsoft.org/.
Examples
library(fscaret)
# Create data sets and labels data frame
trainMatrix <- matrix(rnorm(150*120,mean=10,sd=1), 150, 120)
# Adding some near-zero variance attributes
temp1 <- matrix(runif(150,0.0001,0.0005), 150, 12)
# Adding some highly correlated attributes
sampleColIndex <- sample(ncol(trainMatrix), size=10)
temp2 <- matrix(trainMatrix[,sampleColIndex]*2, 150, 10)
# Output variable
output <- matrix(rnorm(150,mean=10,sd=1), 150, 1)
trainMatrix <- cbind(trainMatrix,temp1,temp2, output)
colnames(trainMatrix) <- paste("X",c(1:ncol(trainMatrix)),sep="")
# Subset test data set
testMatrix <- trainMatrix[sample(round(0.1*nrow(trainMatrix))),]
labelsDF <- data.frame("Labels"=paste("X",c(1:(ncol(trainMatrix)-1)),sep=""))
lk_col <- ncol(trainMatrix)
lk_row <- nrow(trainMatrix)
with.labels = TRUE
testRes <- dataPreprocess(trainMatrix, testMatrix,
labelsDF, lk_col, lk_row, with.labels)
summary(testRes)
# Selected attributes after data set preprocessing
testRes$labelsDF
# Training and testing data sets after preprocessing
testRes$trainMatryca
testRes$testMatryca
Example testing data set
Description
The data set after preprocessing, which resulted in 29 inputs. The original data set, obtained from a literature survey, had 298 inputs. Input: chemical descriptors and characteristics of 8 PLGA microparticles formulation. Output: mean particle size of PLGA microparticles. Number of attributes: 29, single output.
Usage
data(dataset.test)
Format
data.frame
Details
Literature survey yielded 68 formulations of PLGA microspheres with protein as the active pharmaceutical ingredient. In vitro release profiles as well as formulation characteristics and composition were derived from articles. Chemical descriptors were obtained using Marvin ChemAxon software (cxcalc plugin). The final database consisted of 298 inputs and a single output, mean particle size.
Source
Kang F, Singh J. Effect of additives on the release of a model protein from PLGA microspheres. AAPS PharmSciTech 2001(2)4, 1-7
Zhou XL et al. Pharmacokinetic and pharmacodynamic profiles of recombinant human erythropoietin-loaded poly(lactic-co-glycolic acid) microspheres in rats. ActaPharmSinica 2012(33), 137-144
Dongmei F et al. Mesoporous Silicon-PLGA Composite Microspheres for the Double Controlled Release of Biomolecules for Orthopedic Tissue Engineering. Adv Funct Mater 2012(22), 282-293.
Kim T.H. et al. Pegylated recombinant human epidermal growth factor (rhEGF) for sustained release from biodegradable PLGA microspheres. Biomater 2002,23, 2311-2317.
Blanco D et al. Protein encapsulation and release from poly(lactide-co-glycolide) microspheres: effect of the protein and polymer properties and of the co-encapsulation of surfactants. Eur J Pharm Biopharm. 1998, 45, 285-294.
Morita T et al. Applicability of various amphiphilic polymers to the modification of protein release kinetics from biodegradable reservoir-type microspheres. Eur J Pharm Biopharm. 2001, 51, 45-53.
Mok H et al. Water free microencapsulation of proteins within PLGA microparticles by spray drying using PEG assisted protein solubilization technique in organic solvent. Eur J Pharm Biopharm. 2008, 70, 137-144.
Buske J et al. Influence of PEG in PEG-PLGA microspheres on particle properties and protein release. Eur J Pharm Biopharm. 2012, 81, 57-63.
Corrigan OI et al. Quantifying drug release from PLGA nanoparticulates. Eur J Pharm Sci. 2009, 37, 477-485.
Puras G. et al. Encapsulation of A-beta-(1-15) in PLGA microparticles enhances serum antibody response in mice immunized by subcutaneous and intranasal routes. Eur J Pharm Sci. 2011 44, 200-206
Tran VT et al. Protein loaded PLGA PEG PLGA microspheres A tool for cell therapy. Eur J Pharm Sci. 2012, 45, 128-137.
Kim HK et al. Microencapsulation of dissociable human growth hormone aggregates within poly(D,L-lactic-co-glycolic acid) microparticles for sustained release. Int J Pharm. 2001, 229, 107-116
Han Y et al. Insulin nanoparticle preparation and encapsulation into poly(lactic-co-glycolic acid) microspheres by using an anhydrous system. Int J Pharm. 2009, 378, 159-166
Liu Q et al. In vitro and in vivo study of thymosin alpha1 biodegradable in situ forming poly(lactide-co-glycolide) implants. Int J Pharm. 2010, 397, 122-129.
He J et al. Stabilization and encapsulation of recombinant human erythropoietin into PLGA microspheres using human serum albumin as a stabilizer. Int J Pharm. 2011, 416, 69-76.
Gasper MM et al. Formulation of L-asparaginase-loaded poly(lactide-co-glycolide) nanoparticles: influence of polymer properties on enzyme loading, activity and in vitro release. J Control Release. 1998, 52, 53-62.
Kawashima Y et al. Pulmonary delivery of insulin with nebulized DL-lactide/glycolide copolymer (PLGA) nanospheres to prolong hypoglycemic effect. J Control Release. 1999, 62, 279-287.
Geng Y et al. Formulating erythropoietin-loaded sustained-release PLGA microspheres without protein aggregation. J Control Release. 2008, 130, 259-265.
Ungaro F et al. Insulin-loaded PLGA/cyclodextrin large porous particles with improved aerosolization properties: in vivo deposition and hypoglycaemic activity after delivery to rat lungs. J Control Release. 2009, 135(1), 25-34.
Iwata M et al. In vitro and in vivo release properties of brilliant blue and tumour necrosis factor-alpha (TNF-alpha) from poly(D,L-lactic-co-glycolic acid) multiphase microspheres. J Microencapsul. 1999, 16(6), 777-792.
Jiang HL et al. Improvement of protein loading and modulation of protein release from poly(lactide-co-glycolide) microspheres by complexation of proteins with polyanions. J Microencapsul. 2004, 21(6), 615-624
Pirooznia N et al. Encapsulation of alpha-1 antitrypsin in PLGA nanoparticles: in vitro characterization as an effective aerosol formulation in pulmonary diseases. J Nanobiotechnology. 2012, 10(1), 20-35.
Castellanos IJ et al. Effect of cyclodextrins on alpha-chymotrypsin stability and loading in PLGA microspheres upon S/O/W encapsulation. J Pharm Sci. 2006, 95(4), 849-858.
Brodbeck KJ et al. Sustained release of human growth hormone from PLGA solution depots. Pharm Res. 2009, 16(12), 1825-1829.
Example training data set
Description
The data set after preprocessing, which resulted in 29 inputs. The original data set, obtained from a literature survey, had 298 inputs. Input: chemical descriptors and characteristics of 8 PLGA microparticles formulation. Output: mean particle size of PLGA microparticles. Number of attributes: 29, single output.
Usage
data(dataset.train)
Format
data.frame
Details
Literature survey yielded 68 formulations of PLGA microspheres with protein as the active pharmaceutical ingredient. In vitro release profiles as well as formulation characteristics and composition were derived from articles. Chemical descriptors were obtained using Marvin ChemAxon software (cxcalc plugin). The final database consisted of 298 inputs and a single output, mean particle size.
Source
Kang F, Singh J. Effect of additives on the release of a model protein from PLGA microspheres. AAPS PharmSciTech 2001(2)4, 1-7
Zhou XL et al. Pharmacokinetic and pharmacodynamic profiles of recombinant human erythropoietin-loaded poly(lactic-co-glycolic acid) microspheres in rats. ActaPharmSinica 2012(33), 137-144
Dongmei F et al. Mesoporous Silicon-PLGA Composite Microspheres for the Double Controlled Release of Biomolecules for Orthopedic Tissue Engineering. Adv Funct Mater 2012(22), 282-293.
Kim T.H. et al. Pegylated recombinant human epidermal growth factor (rhEGF) for sustained release from biodegradable PLGA microspheres. Biomater 2002,23, 2311-2317.
Blanco D et al. Protein encapsulation and release from poly(lactide-co-glycolide) microspheres: effect of the protein and polymer properties and of the co-encapsulation of surfactants. Eur J Pharm Biopharm. 1998, 45, 285-294.
Morita T et al. Applicability of various amphiphilic polymers to the modification of protein release kinetics from biodegradable reservoir-type microspheres. Eur J Pharm Biopharm. 2001, 51, 45-53.
Mok H et al. Water free microencapsulation of proteins within PLGA microparticles by spray drying using PEG assisted protein solubilization technique in organic solvent. Eur J Pharm Biopharm. 2008, 70, 137-144.
Buske J et al. Influence of PEG in PEG-PLGA microspheres on particle properties and protein release. Eur J Pharm Biopharm. 2012, 81, 57-63.
Corrigan OI et al. Quantifying drug release from PLGA nanoparticulates. Eur J Pharm Sci. 2009, 37, 477-485.
Puras G. et al. Encapsulation of A-beta-(1-15) in PLGA microparticles enhances serum antibody response in mice immunized by subcutaneous and intranasal routes. Eur J Pharm Sci. 2011 44, 200-206
Tran VT et al. Protein loaded PLGA PEG PLGA microspheres A tool for cell therapy. Eur J Pharm Sci. 2012, 45, 128-137.
Kim HK et al. Microencapsulation of dissociable human growth hormone aggregates within poly(D,L-lactic-co-glycolic acid) microparticles for sustained release. Int J Pharm. 2001, 229, 107-116
Han Y et al. Insulin nanoparticle preparation and encapsulation into poly(lactic-co-glycolic acid) microspheres by using an anhydrous system. Int J Pharm. 2009, 378, 159-166
Liu Q et al. In vitro and in vivo study of thymosin alpha1 biodegradable in situ forming poly(lactide-co-glycolide) implants. Int J Pharm. 2010, 397, 122-129.
He J et al. Stabilization and encapsulation of recombinant human erythropoietin into PLGA microspheres using human serum albumin as a stabilizer. Int J Pharm. 2011, 416, 69-76.
Gasper MM et al. Formulation of L-asparaginase-loaded poly(lactide-co-glycolide) nanoparticles: influence of polymer properties on enzyme loading, activity and in vitro release. J Control Release. 1998, 52, 53-62.
Kawashima Y et al. Pulmonary delivery of insulin with nebulized DL-lactide/glycolide copolymer (PLGA) nanospheres to prolong hypoglycemic effect. J Control Release. 1999, 62, 279-287.
Geng Y et al. Formulating erythropoietin-loaded sustained-release PLGA microspheres without protein aggregation. J Control Release. 2008, 130, 259-265.
Ungaro F et al. Insulin-loaded PLGA/cyclodextrin large porous particles with improved aerosolization properties: in vivo deposition and hypoglycaemic activity after delivery to rat lungs. J Control Release. 2009, 135(1), 25-34.
Iwata M et al. In vitro and in vivo release properties of brilliant blue and tumour necrosis factor-alpha (TNF-alpha) from poly(D,L-lactic-co-glycolic acid) multiphase microspheres. J Microencapsul. 1999, 16(6), 777-792.
Jiang HL et al. Improvement of protein loading and modulation of protein release from poly(lactide-co-glycolide) microspheres by complexation of proteins with polyanions. J Microencapsul. 2004, 21(6), 615-624
Pirooznia N et al. Encapsulation of alpha-1 antitrypsin in PLGA nanoparticles: in vitro characterization as an effective aerosol formulation in pulmonary diseases. J Nanobiotechnology. 2012, 10(1), 20-35.
Castellanos IJ et al. Effect of cyclodextrins on alpha-chymotrypsin stability and loading in PLGA microspheres upon S/O/W encapsulation. J Pharm Sci. 2006, 95(4), 849-858.
Brodbeck KJ et al. Sustained release of human growth hormone from PLGA solution depots. Pharm Res. 2009, 16(12), 1825-1829.
feature selection caret
Description
Main function for fast feature selection. It uses other functions, such as regVarImp or impCalc, to obtain results in a list of data frames.
Usage
fscaret(trainDF, testDF, installReqPckg = FALSE, preprocessData = FALSE,
with.labels = TRUE, classPred = FALSE, regPred = TRUE, skel_outfile = NULL,
impCalcMet = "RMSE&MSE", myTimeLimit = 24 * 60 * 60, Used.funcRegPred = NULL,
Used.funcClassPred = NULL, no.cores = NULL, method = "boot", returnResamp = "all",
missData=NULL, supress.output=FALSE, saveModel=FALSE, lvlScale=FALSE, ...)
Arguments
trainDF |
Data frame of training data set, MISO (multiple input single output) type |
testDF |
Data frame of testing data set, MISO (multiple input single output) type |
installReqPckg |
If TRUE, all required packages are installed prior to calculations; it is advisable to be logged in as the root (admin) user |
preprocessData |
If TRUE data preprocessing is performed prior to modeling |
with.labels |
If TRUE, the header of the input files is read |
classPred |
If TRUE, classification models are applied. Please be advised that importance is scaled according to F-measure regardless of the impCalcMet setting. |
regPred |
If TRUE regression models are applied |
skel_outfile |
Skeleton output file, e.g. skel_outfile=c("_myoutput_") |
impCalcMet |
Variable importance is scaled according to RMSE and MSE; to use both, enter impCalcMet="RMSE&MSE" |
myTimeLimit |
Time limit in seconds for single model development |
Used.funcRegPred |
Vector of regression models to be used, for all available models please enter Used.funcRegPred="all" |
Used.funcClassPred |
Vector of classification models to be used, for all available models please enter Used.funcClassPred="all" |
no.cores |
Number of cores to be used for modeling; if NULL, all available cores are used. Should be numeric or NULL |
method |
Method passed to fitControl of caret package |
returnResamp |
Returned resampling method passed to fitControl of caret package |
missData |
Handling of missing data values. Possible values: "delRow" - delete observations with missing values, "delCol" - delete attributes with missing values, "meanCol" - replace missing values with column mean. |
supress.output |
If TRUE, the output of the modeling phase by caret functions is suppressed. Only information about which model is currently being calculated and the resulting variable importance is printed. |
saveModel |
Logical value [TRUE/FALSE] indicating whether the trained model should be embedded in the final model. |
lvlScale |
Logical value [TRUE/FALSE] indicating whether additional scaling should be applied. For more information please refer to impCalc(). |
... |
Additional arguments, preferably passed to fitControl of caret package |
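The three missData strategies described above can be pictured with a small base R sketch. It is an illustration of what each option means on a toy data frame, not the code fscaret runs internally; the data frame df and the result names are made up for the example.

```r
# Toy MISO data frame with missing values (illustrative only)
df <- data.frame(x1 = c(1, NA, 3, 4),
                 x2 = c(10, 20, 30, 40),
                 y  = c(0.5, 1.5, NA, 3.5))

# "delRow": delete observations (rows) that contain missing values
del_row <- df[complete.cases(df), , drop = FALSE]

# "delCol": delete attributes (columns) that contain missing values
del_col <- df[, colSums(is.na(df)) == 0, drop = FALSE]

# "meanCol": replace each missing value with the mean of its column
mean_col <- as.data.frame(lapply(df, function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}))
```

Note that "delCol" can remove the output column when it contains missing values, as it does for y in this toy example, so "delRow" or "meanCol" may be the safer choice for such data.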
Value
$ModelPred |
List of outputs from caret model fitting |
$VarImp |
Data frames of variable importance and corresponding trained models |
$PPlabels |
Data frame of resulting preprocessed data set with original input numbers and names |
$PPTrainDF |
Training data set after preprocessing |
$PPTestDF |
Testing data set after preprocessing |
$VarImp$model |
Trained models |
Note
Be advised that fscaret requires hard disk operations for saving fitted models and data frames. Files are written to the R session temp folder; for more details see tempdir(), getwd() and setwd().
Author(s)
Jakub Szlek and Aleksander Mendyk
References
Kuhn M. (2008) Building Predictive Models in R Using the caret Package Journal of Statistical Software 28(5) http://www.jstatsoft.org/.
Examples
if((Sys.info()['sysname'])!="SunOS"){
library(fscaret)
# Load data sets
data(dataset.train)
data(dataset.test)
requiredPackages <- c("R.utils", "gsubfn", "ipred", "caret", "parallel", "MASS")
if(.Platform$OS.type=="windows"){
myFirstRES <- fscaret(dataset.train, dataset.test, installReqPckg=FALSE,
preprocessData=FALSE, with.labels=TRUE, classPred=FALSE,
regPred=TRUE, skel_outfile=NULL,
impCalcMet="RMSE&MSE", myTimeLimit=4,
Used.funcRegPred=c("lm"), Used.funcClassPred=NULL,
no.cores=1, method="boot", returnResamp="all",
supress.output=TRUE,saveModel=FALSE)
} else {
myCores <- 2
myFirstRES <- fscaret(dataset.train, dataset.test, installReqPckg=FALSE,
preprocessData=FALSE, with.labels=TRUE, classPred=FALSE,
regPred=TRUE, skel_outfile=NULL,
impCalcMet="RMSE&MSE", myTimeLimit=4,
Used.funcRegPred=c("lm","ppr"), Used.funcClassPred=NULL,
no.cores=myCores, method="boot", returnResamp="all",
supress.output=TRUE,saveModel=FALSE)
}
# Results
myFirstRES
}
Classification methods used.
Description
Vector of all classification methods used in solving problems by caret
Usage
data(funcClassPred)
Format
vector
All regression methods used
Description
Vector of all regression methods used in solving problems by caret
Usage
data(funcRegPred)
Format
vector
impCalc
Description
The impCalc function is designed to scale variable importance according to MSE and RMSE calculations. It also stores the raw MSE, RMSE, F-measure and, if saveModel=TRUE, the developed models. impCalc is a low-level function; it shouldn't be used alone unless the user has trained models from the caret package stored in RData files.
Usage
impCalc(skel_outfile, xTest, yTest, lk_col,
labelsFrame,with.labels,regPred,classPred,saveModel,lvlScale)
Arguments
skel_outfile |
Skeleton name of output file |
xTest |
Input vector of testing data set |
yTest |
Output vector of testing data set |
lk_col |
Number of columns of whole data set |
labelsFrame |
Labels to sort variable importance |
with.labels |
Pass with.labels argument. It is advised to ALWAYS use labels, as in some cases VarImp returns importance in descending order. If you insist on setting with.labels to FALSE, make sure the database contains pure data and that you read it (read.csv) into a data.frame with the option header=FALSE. |
regPred |
Indicates whether regression predictions are computed. Logical value [TRUE/FALSE]. If regPred is set to TRUE, then classPred should be set to FALSE. |
classPred |
Indicates whether classification predictions are computed. Logical value [TRUE/FALSE]. If classPred is set to TRUE, then regPred should be set to FALSE. Please be advised that importance is scaled according to F-measure. |
saveModel |
Logical value [TRUE/FALSE] indicating whether the trained model should be embedded in the final model. |
lvlScale |
Indicates whether to use additional scaling. The option is especially useful when a large number of features get NAs or are not included in the feature ranking. It levels the scores of the features, taking into account the overall number of features. Default value is FALSE. Logical value [TRUE/FALSE]. |
Details
The impCalc function lists RData files in the working directory, assuming they contain only models derived by caret. In a loop, the function loads each model and tries to get the variable importance.
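For classification, importance is scaled according to F-measure. As a reminder of what that quantity is, here is a minimal base R sketch using the standard definition (harmonic mean of precision and recall) for binary 0/1 vectors. The helper name f_measure_sketch is hypothetical, and how impCalc itself obtains the F-measure is not shown here.

```r
# Hypothetical helper: F-measure for binary 0/1 predictions, with 1 treated
# as the positive class (standard definition; not the fscaret source).
f_measure_sketch <- function(predicted, observed) {
  tp <- sum(predicted == 1 & observed == 1)  # true positives
  fp <- sum(predicted == 1 & observed == 0)  # false positives
  fn <- sum(predicted == 0 & observed == 1)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

pred <- c(1, 1, 0, 0, 1)
obs  <- c(1, 0, 0, 1, 1)

# Two true positives, one false positive, one false negative:
# precision = recall = 2/3, so the F-measure is 2/3 as well
f_measure_sketch(pred, obs)
```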
Author(s)
Jakub Szlek and Aleksander Mendyk
Examples
## Not run:
#
# Hashed to comply with new CRAN check
#
library(fscaret)
# Load dataset
data(dataset.train)
data(dataset.test)
# Make objects
trainDF <- dataset.train
testDF <- dataset.test
model <- c("lm","Cubist")
fitControl <- trainControl(method = "boot", returnResamp = "all")
myTimeLimit <- 5
no.cores <- 2
supress.output <- TRUE
skel_outfile <- paste("_default_",sep="")
mySystem <- .Platform$OS.type
with.labels <- TRUE
redPred <- TRUE
classPred <- FALSE
saveModel <- FALSE
lvlScale <- FALSE
if(mySystem=="windows"){
no.cores <- 1
}
# Scan dimensions of trainDF [lk_row x lk_col]
lk_col = ncol(trainDF)
lk_row = nrow(trainDF)
# Read labels of trainDF
labelsFrame <- as.data.frame(colnames(trainDF))
labelsFrame <-cbind(c(1:ncol(trainDF)),labelsFrame)
# Create a train data set matrix
trainMatryca_nr <- matrix(data=NA,nrow=lk_row,ncol=lk_col)
row=0
col=0
for(col in 1:(lk_col)) {
for(row in 1:(lk_row)) {
trainMatryca_nr[row,col] <- (as.numeric(trainDF[row,col]))
}
}
# Pointing standard data set train
xTrain <- data.frame(trainMatryca_nr[,-lk_col])
yTrain <- as.vector(trainMatryca_nr[,lk_col])
# Scan dimensions of testDF [lk_row_test x lk_col_test]
lk_col_test = ncol(testDF)
lk_row_test = nrow(testDF)
testMatryca_nr <- matrix(data=NA,nrow=lk_row_test,ncol=lk_col_test)
row=0
col=0
for(col in 1:(lk_col_test)) {
for(row in 1:(lk_row_test)) {
testMatryca_nr[row,col] <- (as.numeric(testDF[row,col]))
}
}
# Pointing standard data set test
xTest <- data.frame(testMatryca_nr[,-lk_col_test])
yTest <- as.vector(testMatryca_nr[,lk_col_test])
# Calling low-level function to create models to calculate on
myVarImp <- regVarImp(model, xTrain, yTrain, xTest,
fitControl, myTimeLimit, no.cores, lk_col,
supress.output, mySystem)
myImpCalc <- impCalc(skel_outfile, xTest, yTest,
lk_col,labelsFrame,with.labels,redPred,classPred,saveModel,lvlScale)
## End(Not run)
imputeMean
Description
Secondary function that replaces NA values in a column with the column mean.
Usage
impute.mean(x)
Arguments
x |
A numeric vector whose NA values are replaced with its mean |
Author(s)
Jakub Szlek and Aleksander Mendyk
Examples
library(fscaret)
# Make sample matrix
testData <- matrix(data=rep(1:5),ncol=10,nrow=15)
# Replace random values with NA's
n <- 15
replace <- TRUE
set.seed(1)
rand.sample <- sample(length(testData), n, replace=replace)
testData[rand.sample] <- NA
# Print out input matrix
testData
# Record cols with missing values
missing.colsTestMatrix <- which(colSums(is.na(testData))>0)
for(i in seq_along(missing.colsTestMatrix)){
colToReplace <- missing.colsTestMatrix[i]
testData[,colToReplace] <- impute.mean(testData[,colToReplace])
}
# Print out matrix with replaced NA's by column mean
testData
installPckg
Description
Function installs the packages that are listed in data(requiredPackages). It is called within the fscaret function when the argument installReqPckg = TRUE.
Usage
installPckg(requiredPackages)
Arguments
requiredPackages |
Vector of packages to be installed |
Details
Be advised that setting installReqPckg = TRUE installs packages in your home directory (.R). To install packages for all users, please log in as root (admin).
Author(s)
Jakub Szlek and Aleksander Mendyk
regVarImp
Description
The function takes advantage of the caret package to fit numerous regression models.
Usage
regVarImp(model, xTrain, yTrain, xTest,
fitControl, myTimeLimit, no.cores,
lk_col, supress.output)
Arguments
model |
Chosen models as called from function fscaret(), argument Used.funcRegPred. |
xTrain |
Training data set, data frame of input vector |
yTrain |
Training data set, vector of observed outputs |
xTest |
Testing data set, data frame of input vector |
fitControl |
Fitting controls passed to caret function |
myTimeLimit |
Time limit in seconds for single model fitting |
no.cores |
Number of used cores for calculations |
lk_col |
Number of columns for whole data set (inputs + output) |
supress.output |
If TRUE, the output of models is suppressed. |
Author(s)
Jakub Szlek and Aleksander Mendyk
References
Kuhn M. (2008) Building Predictive Models in R Using the caret Package Journal of Statistical Software 28(5) http://www.jstatsoft.org/.
requiredPackages
Description
Character vector of names of required packages to fully take advantage of fscaret
Usage
data(requiredPackages)
Format
vector
Examples
data(requiredPackages)
timeout
Description
This function limits the elapsed time spent on single model development. It uses low-level functions of the parallel package and sets up a fork process with a time limit. If the result is not returned within the set time, it kills the fork. The function shouldn't be called from the R console. The function is not used under Windows, as only Unix-like systems have fork functionality.
Usage
timeout(..., seconds)
Arguments
... |
Expression to be time limited |
seconds |
Number of seconds |
Author(s)
Original code by Jeroen Ooms <jeroen.ooms at stat.ucla.edu> of OpenCPU package. Modifications by Jakub Szlek and Aleksander Mendyk.
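The fork-with-time-limit idea can be sketched with the parallel functions mcparallel() and mccollect() plus tools::pskill(). This is a simplified illustration for Unix-like systems only; the name timeout_sketch and its single-expression signature are assumptions and do not reproduce the package's timeout() function.

```r
library(parallel)

# Hypothetical sketch (Unix-like systems only): evaluate expr in a forked
# child and give up if no result arrives within the given number of seconds.
timeout_sketch <- function(expr, seconds) {
  job <- mcparallel(expr)                            # expr is evaluated in the child
  res <- mccollect(job, wait = FALSE, timeout = seconds)
  tools::pskill(job$pid, tools::SIGKILL)             # make sure the fork is gone
  mccollect(job, wait = FALSE)                       # reap the child process
  if (is.null(res)) {
    stop("No result within ", seconds, " seconds; fork was killed.", call. = FALSE)
  }
  res[[1]]
}

if (.Platform$OS.type == "unix") {
  fast <- timeout_sketch(1 + 1, seconds = 5)
  slow <- try(timeout_sketch(Sys.sleep(10), seconds = 1), silent = TRUE)
}
```

In this sketch the fast call returns the value computed in the child, while the slow call raises an error after about one second instead of blocking for ten.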