---
title: "The forest.plot and table.plot functions"
author: "Greg Cicconetti"
date: '2022-08-15'
output: html_document
editor_options: 
  chunk_output_type: console
---
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{The forest.plot and table.plot functions}
%\VignetteDepends{tidyverse, figuRes2}

# The figuRes2::forest.plot and figuRes2::table.plot functions

The forest.plot, dot.plot and table.plot functions share some similarities. These are simple figures to describe, but labeling, idiosyncrasies in data structure, and aesthetic considerations complicate their construction. To get these figures to look appealing, one must have a good understanding of the incoming data structure and iterate towards a final product. These can be very time consuming in terms of design and execution.

This first example will demonstrate that pre-processing and post-processing are essential steps. First, start a session:

```{r, results='hide'}
remove(list=ls())
require(figuRes2)
require(survival)
require(ggplot2)
require(scales)
require(stringr)
require(plyr)
require(grid)
require(gridExtra)
require(reshape2)
require(gtable)
default.settings()
```

## Example 1: Labels for each line segments

Suppose we are handed the following data set with the task of producing a 1x2 panel of 2 graphics - a forest plot on the left and a table plot on the left. Let's inspect the data:

```{r}
data(forest.data)
working.df <- forest.data
head(working.df)
dim(working.df)
```

We have a data.frame with 89 rows and 35 different subgroup analyses.  Clearly, 89 line segments on a single page would be a bit too much.  Suppose we plan to have 15-20 rows displayed on figures and therefore will ultimately need to partition this dataset into 5 or 6 smaller data.frames. For the present example then, we work with the first 16 rows. Determining how to divide up the remaining rows is an exercise left to the reader.  

Our smaller working data.frame becomes:

```{r}
working.df.1 <- working.df[1:16,]
working.df.1 
```

Suppose this data.frame is sorted as we'd like to see it in the forest plot. (If not, accomplish this with additional pre-processing!) Namely, we'd like the top rows reporting line segments associated with _Qual. diag.: Prior MI_ and the bottoms rows reporting line segments for _CV risk factor: Current or previous smoker_. The lower and upper endpoints of the line segments are associated with columns low and high containing the endpoints of 95% confidence intervals for the hazard ratio.  The following items need to be added to the data.frame in order to make use of the forest.plot, table.plot and dot.plot functions.  Columns need to be created for the following aspects of the graph:

* rank at which line segments are plotted
* color to be associated with the line segments and points
* ranks for the y-axis labels
* labels for the y-axis

First, we assign ranks for the line segments.

```{r}
working.df.1$rank <- rev(1:16)
```

Next we assign a column for color.  In this example, the color of all line segments will be the same, so we are creating a dummy column holding a factor with a single value.  (In the next example, we'll see multiple colors.)

```{r}
working.df.1$category <- factor(0)
```

In this example, each line segment will have a label associated with it. As such, the following step is superfluous; we could just as well reuse the rank column.

```{r}
working.df.1$label.rank <- rev(1:16)
```

The actual labels to be used can be deduced from the data.frame. These will need to be a combination of values from subgroup and level columns.   

```{r}
working.df.1$labels <- paste(working.df.1$subgroup, working.df.1$level)
working.df.1
```

We will return to fine tuning these labels in post-processing because of the need for mathematical symbol for _less than or equal to_; in the absence of this issue, an alternative attack would be to coerce the labels column into a factor and rename the levels at this stage.

### Building the forest plot graphic

```{r, fig.cap= "First Pass at a Forest Plot"}
p1 <- forest.plot(parent.df = working.df.1, 
            y.rank.col = "rank",  # line segment's y-axis rank
            Point.Est = "hr",     # line segment's dot
            lower.lim = "low",     # line segment's lower endpoint
            upper.lim = "high",    # line segment's upper endpoint
            y.label.rank.col = "label.rank",  # label's y-axis rank
            y.label.col = "labels", # label's text value
            x.label = "Estimate", 
            y.label = NULL,
            log.trans = TRUE, 
            x.limits = c(0.21, 5), 
            x.ticks = 2^(-2:2), 
            category.color = "category", # This colors the points and line segments
            background.palette = c("red", "blue"), 
            category.palette = c("red", "blue"), 
            shape.palette = c(16, 16), 
            flip.palette = FALSE) 
print(p1)
```

### Post-processing the forest plot graphic
The following is a necessarily manual task.
```{r, fig.cap= "Second Pass: Label fix"}
p2 <- p1 + scale_y_continuous(
  breaks = p1$data$LABEL.RANKS,
  labels = c(
    "Prior Myocardial Infaction: No",
    "Yes",
    "Prior coronary revasc.: No",
    "Yes",
    "Multivessel CHD: No",
    "Yes",
    "CHD Event Relative to Randomization: Recent",
    "Remote",
    expression(paste("Age ", phantom() >= 60,": No")),
    "Yes",
    "Diabetes req. pharm.: No",
    "Yes",
    "HDL-C < 40 mg/dL: No",
    "Yes",
    "Current or previous smoker: Yes",
    "No"))
print(p2)
```


### Building the table plot graphic
We turn to the corresponding table plot.
```{r, fig.cap= "First Pass at a Table plot"}
t1 <- table.plot(
    parent.df = working.df.1,
    y.rank.col= "rank",
    category.color= "category",
    text.col1 = "hr",
    text.col2 = "low",
    text.col3 = "high",
    text.col4 = NULL,
    text.size = 3,
    xtick.labs = c("Estimate", "LCI", "UCI"),
    x.label= "Text",
    y.label= "Item",
    y.label.rank.col = "label.rank",  #  this identifies the y-axis values for labels
    y.label.col = "subgroup", 
    category.palette = c("red", "blue"))
print(t1)
```

Since we're planning to juxtapose the table plot and the forest plot, we can suppress the labels here.  In practice, it is worth verifying that labels in the forest and table plots agree before suppressing the labels. Note that arguments associated with y.label and y.label.rank.col are simply set to NULL. In addition teh x.label is set to white space.  (A good exercise is to see what results when you step through the remainder of this exercise with NULL used in place of white space in the x.label argument.) Finally, the category.palette's first argument is changed from red to grey40.
```{r, fig.cap= "Second pass at table plot"}
t2 <- table.plot(
    parent.df = working.df.1,
    y.rank.col= "rank",
    category.color= "category",
    text.col1 = "hr",
    text.col2 = "low",
    text.col3 = "high",
    text.col4 = NULL,
    text.size=3,
    xtick.labs = c("Estimate", "LCI", "UCI"),
    x.label= "",
    y.label=NULL,
    y.label.rank.col = "label.rank",  #  this identifies the y-axis values for labels
    y.label.col = NULL, 
    category.palette = c("grey40", "blue"))
print(t2)
```


### Assembling the page
Here's a first pass at assembling the forest plot figure, allocating 50% of available width to the forest plot graphic and table plot raphic.
```{r, fig.cap= "An assembled forest plot figure"}
build.page(interior.h = c(1),
           interior.w = c(1/2, 1/2),
           ncol=2, nrow=1, interior=list(p2+ggtitle(""), t2+ggtitle("")) )
annotate.page(override = "", title=list("Title Line 1", "","","",""))
```

Perhaps allocating more space for the figure and less space for the table would look better:

```{r, fig.cap= "Experimenting with the width: 60%/40%"}
build.page(interior.h = c(1),
           interior.w = c(.6, .4),
           ncol=2, nrow=1, interior=list(p2+ggtitle(""), t2+ggtitle("")) )
annotate.page(override = "", title=list("Title Line 1", "","","",""))
```

Pushing to an extreme.
```{r, fig.cap= "An assembled forest plot figure"}
build.page(interior.h = c(1),
           interior.w = c(.8, .2),
           ncol=2, nrow=1, interior=list(p2+ggtitle(""), t2+ggtitle("")) )
annotate.page(override = "", title=list("Title Line 1", "","","",""))
```

Recall comments in previous sections about altering plot.margins to decrease the padding between p2 and t2.

### Manipulating the vertical placement of line segments
Suppose we want to separate the subgroups a bit better.  The user has control over this when defining the rank columns.
```{r, fig.cap= "Manipulating the vertical placement of line segments"}
working.df.1$rank2 <- working.df.1$rank + duplicated(working.df.1$subgroup)*.5
p3 <- forest.plot(parent.df = working.df.1, 
            y.rank.col = "rank2",  # line segment's y-axis rank
            Point.Est = "hr",     # line segment's dot
            lower.lim = "low",     # line segment's lower endpoint
            upper.lim = "high",    # line segment's upper endpoint
            y.label.rank.col = "rank2",  # label's y-axis rank
            y.label.col = "labels", # label's text value
            x.label = "Estimate", 
            y.label = NULL,
            log.trans = TRUE, 
            x.limits = c(0.21, 5), 
            x.ticks = 2^(-2:2), 
            category.color = "category", # This colors the points and line segments
            background.palette = c("red", "blue"), 
            category.palette = c("red", "blue"), 
            shape.palette = c(16, 16), 
            flip.palette = FALSE) 

# This step is same as before, with swap in the breaks argument
p4 <- p3 + scale_y_continuous(
  breaks = p3$data$RANK,
  labels = c(
    "Prior MI: No",
    "Yes",
    "Prior Coronary Revasc.: No",
    "Yes",
    "Multivessel CHD: No",
    "Yes",
    "CHD Event Relative to Randomization: Recent",
    "Remote",
    expression(paste("Age ", phantom() >= 60,": No")),
    "Yes",
    "Diabetes req. pharm.: No",
    "Yes",
    "HDL-C < 40 mg/dL: No",
    "Yes",
    "Current or previous smoker: No",
    "Yes"))
# This is same as t2, save swap of rank for rank2
t3 <- table.plot(
    parent.df = working.df.1,
    y.rank.col = "rank2",
    category.color = "category",
    text.col1 = "hr",
    text.col2 = "low",
    text.col3 = "high",
    text.col4 = NULL,
    text.size=3,
    xtick.labs = c("Estimate", "LCI", "UCI"),
    x.label= "",
    y.label =NULL,
    y.label.rank.col = "rank2",  #  this identifies the y-axis values for labels
    y.label.col = NULL, 
    category.palette = c("grey40", "blue"))

build.page(interior.h = c(1),
           interior.w = c(.8, .2),
           ncol = 2, nrow = 1, interior=list(p4 + ggtitle(""), t3+ggtitle("")) )
annotate.page(override = "", title=list("Title Line 1", "","","",""))
```