--- title: "dr4pl usage examples" author: "Hyowon An and Justin T. Landis" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{dr4pl usage examples} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, fig.width = 7) ``` ##Installing packages Our R package **dr4pl** is now available on CRAN! You can download it by the command **install.packages("dr4pl")**. Note that the argument **repos** can be omitted. Let us now load R packages that we need to use in our examples. ```{r} library(drc) library(dr4pl) library(ggplot2) ``` **dr4pl** is an abbreviation for **D**ose-**R**esponse data analysis using **4** **P**arameter **L**ogistic (4PL) model. 4PL models are typically used models in pharmacological and toxicological experiments. ##Errors drawn by the R package 'drc' The most well-known R package for dose-response modelling is **drc**. However, that R package has drawn errors in many data sets generated by Dittmer Lab. Let's first consider the first error case **drc_error_1**. ```{r} head(drc_error_1) ggplot(drc_error_1, aes(x = Dose, y = Response)) + # Data name, variable names geom_point() + # Put data points on the scatter plot scale_x_log10() + # Change the x-axis scale to log 10 scale ggtitle("Error Case 1") # Set the title ``` As you can see, this data set contains an extreme outlier at the dosage level 1e-05. These outliers are often generated by measurement errors or experimental failure. If you were to fit a 4PL model with **drc**, you would recieve the following. ```{r} tryCatch({ drm(Response~Dose, data = drc_error_1, fct = LL.4()) }, warning = function(war) { # warning handler picks up where error was generated print(paste(sep = " ", war)) }, error = function(err) { # error handler picks up where error was generated print(paste(sep = " ", err)) }) ``` Note that the error message just indicates convergence failure without explanation about the reasons. Instead of giving up, you can try to use **dr4pl**. ```{r} dr4pl.error.1 <- dr4pl(Response~Dose, data = drc_error_1) plot(dr4pl.error.1, text.title = "Error plot #1") ``` The fitted curve does not look bad. Unlike **drc**, convergence failure does not draw an error but presents FALSE value in an indicator variable. ```{r} dr4pl.error.1$convergence # FALSE indicates convergence failure ``` Once convergence failure happens, **dr4pl** returns a child object which is obtained by robust estimation. ```{r} dr4pl.robust.1 <- dr4pl.error.1$dr4pl.robust class(dr4pl.error.1) class(dr4pl.robust.1) dr4pl.robust.1$convergence ``` The parent object *dr4pl.error.1* and the child object *dr4pl.robust.1* have the same class *dr4pl*. Note that the child object *dr4pl.robust.1* indicates that convergence failure did not happen when robust estimation is applied. In the following diagnosis plot, outliers are denoted by red triangles. ```{r} plot(dr4pl.robust.1, indices.outlier = dr4pl.robust.1$idx.outlier) ``` Let's next consider the second error case **drc_error_2**. ```{r} ggplot(drc_error_2, aes(x = Dose, y = Response)) + geom_point() + scale_x_log10() + ggtitle("Error Case 2") ``` The most important problem with Error Case 2 is that whether its trend is decreasing or increasing is ambiguous. **dr4pl**'s default parameter *trend* automizes this and will find the best fit based on optimization. Let us see how this example plays out with the data set **drc_error_2**. ```{r} dr4pl.error.2 <- dr4pl(Response~Dose, data = drc_error_2, method.init = "logistic") plot(dr4pl.error.2, text.title = "Trend is default") ``` The parameter *trend* helps force the fit to either *increasing* or *decreasing* in the event that you are confident that your response follows that trend. We may choose to force a decreasing curve, with other apporpriate parameters. ```{r} dr4pl.error.2 <- dr4pl(Response~Dose, data = drc_error_2, trend = "decreasing", method.init = "logistic", method.robust = "absolute") plot(dr4pl.error.2, text.title = "Trend forced to decrease") ``` Note that the fit now is decreasing. Let's consider the data set **drc_error_4**. ```{r} ggplot(drc_error_4, aes(x = Dose, y = Response)) + geom_point() + scale_x_log10() + ggtitle("drc_error_4") ``` This data set has two outliers in the largest dosage level. This data set also exemplifies the **support problem** as well. A **support problem** indicates that a data set does not have enough data points on the right side of the IC50 parameter. This can result in difficulty in sometimes obtaining estimates of the lower asymptote and slope and sometimes even convergence failure. Let's try to apply **drc** to this data set. ```{r} tryCatch({ drm(Response~Dose, data = drc_error_4, fct = LL.4()) }, warning = function(war) { # warning handler picks up where error was generated print(paste(sep = " ", war)) }, error = function(err) { # error handler picks up where error was generated print(paste(sep = " ", err)) }) ``` As expected, **drc** is not able to fit a 4PL model to *drc_error_4*. Now let's use **dr4pl**. ```{r} dr4pl.error.4 <- dr4pl(Response~Dose, data = drc_error_4, method.init = "logistic") dr4pl.error.4$convergence plot(dr4pl.error.4, text.title = "Error plot #4") ``` The package drc draws errors with each one of these cases. However dr4pl is able generate a curve despite outliers and a support problem in each error case. In each case we were able to modify the title and axis names with *text.title*, *text.x*, or *text.y*. We are also able to bring attention to outlier points by passing a vector of the indices to the *indices.outlier* argument. ##Robust Estimation **dr4pl** provides several methods of robust estimation. Consider you want to plot the dr4pl data set **sample_data_5**. Let's try and apply the default parameters. ```{r} dr4pl.Mead.5 <- dr4pl(Response~Dose, data = sample_data_5) plot(dr4pl.Mead.5, text.title = "Fit with Mead's method") ``` After producing a plot with the default parameters, you may be confident in producing a better fit with other parameters. This is where the arguments *method.init* and *method.robust* come into play, which you may have noticed that these arguments where used in the error case examples. The default paramter to *method.init* is *Mead*. This uses Mead's method when approximating the theta parameters. The alternative to using Mead's method is *logistic*, which uses a logistic method to obtain initial parameter estimates. Let's see how our plot will change when we use the logistic method. ```{r} dr4pl.logistic.5 <- dr4pl(Response~Dose, data = sample_data_5, method.init = "logistic") plot(dr4pl.logistic.5, text.title = "Fit with the logistic method") ``` Mead's method seems to generate tighter fit with this data set. This is due to a characteristic of Mead's method that it focuses more on the slope and lower asymptote parameters than the logistic method. Now let's talk about *method.robust*. *method.robust* allows the user to choose the loss function used during optimization. The default loss function used is the squared loss function. The user has three other loss functions to select from: *absolute*, *Huber*, and *Tukey*. Lets see how this plot will change when using *absolute* versus *Tukey*. ```{r} dr4pl.absolute.5 <- dr4pl(Response~Dose, data = sample_data_5, method.robust = "absolute") plot(dr4pl.absolute.5, text.title = "Fit with the absolute loss") dr4pl.Tukey.5 <- dr4pl(Response~Dose, data = sample_data_5, method.robust = "Tukey") plot(dr4pl.Tukey.5, text.title = "Fit with Tukey's biweight loss") ``` When selecting *Tukey* for the argument **method.robust**, the generated fit is less weighted by the largest dosage level which is an outlier response of zero. When selecting *absolute* for the argument **method.robust**, the generated fit is more weignted by the largest dosage level. It may take several attempts to find the best parameters for each data set. ##Other functionalities of dr4pl Let's look at other uses for this code. Consider the following data set **sample_data_3**. ```{r} dr4pl.3 <- dr4pl(Response~Dose, data = sample_data_3) plot(dr4pl.3, text.title = "Sample data #3") ``` Once you produce a curve you feel is representative of your data, you may get the parameters of the curve by using the summary function on the dr4pl variable. ```{r} summary.dr4pl.3 <- summary(dr4pl.3) summary.dr4pl.3$coefficients ``` It is possible that you are interested in more than just the IC50 variable. Use the IC() function to produce the respective dose values. ```{r} values <- IC(dr4pl.3, c(10, 30, 50, 70, 90)) values ``` Finally, you can easily edit your plot further with basic ggplot2 additional controls. Let's see how **dr4pl** handles the data set **chekcweed0** in **drc**. As can be seen in the figure, a user can set the x-axis and y-axis labels and also the x-axis tick labels. ```{r chickweed0} dr4pl.chickweed <- dr4pl(count~time, data = chickweed0, trend = "increasing") plot(dr4pl.chickweed, text.x = "Time", text.y = "Count", text.title = "Dataset chickweed0", breaks.x = c(25, 100, 175, 250)) ```