Title: | Implementation of the Q-Q Boxplot |
Version: | 0.3.0 |
Description: | A system to implement the Q-Q boxplot. It is implemented as an extension to 'ggplot2'. The Q-Q boxplot is an amalgam of the boxplot and the Q-Q plot and allows the user to rapidly examine summary statistics and tail behavior for multiple distributions in the same pane. As an extension of the 'ggplot2' implementation of the boxplot, possible modifications to the boxplot extend to the Q-Q boxplot. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.1 |
Imports: | ggplot2, grid |
Depends: | R (≥ 3.3) |
Suggests: | knitr, rmarkdown, dplyr, gridExtra, testthat (≥ 3.0.0), vdiffr (≥ 0.3.3), scales |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2022-11-20 03:17:28 UTC; jsr6q |
Author: | Jordan Rodu [aut, cre] |
Maintainer: | Jordan Rodu <jordan.rodu@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-11-20 03:30:02 UTC |
Simulated normal dataset with mean=5 and variance=1
Description
A dataset that contains simulated data to reproduce a figure in our manuscript
Usage
comparison_dataset
Format
A vector
Source
simulations
Log expression data for select genes
Description
A dataset that contains log expression data for randomly selected genes for two patients, one with autism and one control.
Usage
expression_data
Format
A data frame with 1200 rows and 3 variables:
- gene
gene identifier (not meaningful)
- specimen
autism or control
- log_count
the logged gene expression count
...
Source
https://www.ebi.ac.uk/gxa/experiments/E-GEOD-30573/Results
A modification of the boxplot with information about the tails
Description
A modification of the boxplot with information about the tails
Usage
geom_qqboxplot(
mapping = NULL,
data = NULL,
stat = "qqboxplot",
position = "dodge2",
...,
outlier.colour = NULL,
outlier.color = NULL,
outlier.fill = NULL,
outlier.shape = 19,
outlier.size = 1.5,
outlier.stroke = 0.5,
outlier.alpha = NULL,
notch = FALSE,
notchwidth = 0.5,
varwidth = FALSE,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
)
Arguments
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
specifies the stat function to use |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
Other arguments passed on to |
outlier.colour , outlier.color , outlier.fill , outlier.shape , outlier.size , outlier.stroke , outlier.alpha |
Default aesthetics for outliers. Set to In the unlikely event you specify both US and UK spellings of colour, the US spelling will take precedence. Sometimes it can be useful to hide the outliers, for example when overlaying
the raw data points on top of the boxplot. Hiding the outliers can be achieved
by setting |
notch |
If |
notchwidth |
For a notched box plot, width of the notch relative to
the body (defaults to |
varwidth |
If |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
Value
Returns an object of class GeomQqboxplot
, (inherits from Geom
, ggproto
),
that renders the data for the Q-Q boxplot.
Description
The Q-Q boxplot inherits its summary statistics from the boxplot. See
geom_boxplot()
for details. The Q-Q boxplot differs from the boxplot
by using more informative whiskers than the regular boxplot.
The vertical position of the whiskers can be interpreted as it is in the boxplot, and the maximal vertical value is chosen as it is done in the regular boxplot. The horizontal positioning of the whiskers indicates the deviation of the data set of interest from some reference data set (specified as either a theoretical distribution or an actual data set). Taking the central vertical axis of the boxplot as being zero, deviations to the right indicate that those values are larger than the corresponding data points in the reference data set, where two data points correspond if their quantiles match. Deviations to the left indicate that the values are smaller than their corresponding data points. Consider a situation where your data set has fatter tails than the normal distribution. When the reference distribution is the normal distribution, then the whiskers below the box will be left of the central axis (the left tail values are smaller than they ought to be) and the whiskers above the box will be right of the central axis (the right tail values are larger than the ought to be).
In order to compare the data set of interest to the reference data set, they must be on the same scale. The Q-Q boxplot uses Tukey's g-h distribution to determine the appropriate scaling factor.
Much of the code here is a modification of the geom_boxplot() code.
Examples
p <- ggplot2::ggplot(simulated_data, ggplot2::aes(factor(group,
levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16",
"t distribution, df=8", "t distribution, df=4")), y=y))
p + geom_qqboxplot()
p + geom_qqboxplot(reference_dist = "norm")
p + geom_qqboxplot(compdata = comparison_dataset)
# geom_qqboxplot inherits all arguments from geom_boxplot, e.g.:
p + geom_qqboxplot(notch = TRUE)
p + geom_qqboxplot(varwidth=TRUE)
p + geom_qqboxplot(ggplot2::aes(color = group)) + ggplot2::guides(color=FALSE)
World Bank indicator data for Labor Force participation rates
Description
A dataset that contains participation rates (%) for ages 15-24, separated by gender, and measured in the years 2008, 2012, and 2017
Usage
indicators
Format
A data frame with 612 rows and 7 variables:
- Country Name
name of country
- Country Code
unique country identifier (string)
- Series Name
Specifies male/female
- Series Code
unique identifier for series
- year
year for data
- indicator
participation rate in percents
- log_indicator
the log of the participation rate
...
Source
https://www.worldbank.org/en/home
Neuron population firing data
Description
A dataset that contains populations of neurons from CA1 and LM and their firing rates for three situations: base firing rate, dot motion, and drifting gradient. Each row represents a neuron
Usage
population_brain_data
Format
A data frame with 13731 rows and 3 variables:
- ecephys_structure_acronym
acronym for population location
- fr_type
situation under which firing rate was recorded
- rate
the firing rate
...
Source
https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html
qqboxplot
package
Description
Create qq-boxplots
Simulated t-distributions to show use of q-q boxplots
Description
A dataset that contains simulated data to reproduce the simulated data figures used in our manuscript
Usage
simulated_data
Format
A data frame with 4500 rows and 2 variables:
- y
a value simulated from a distribution
- group
a string specifying the distribution from which the y value is drawn
...
Source
simulations
Neuron spiking data for neural tuning orientation
Description
A dataset that contains the number of spikes for neurons across several possible orientations of a grating
Usage
spike_data
Format
A data frame with 12800 rows and 5 variables:
- orientation
1 to 8, specifies the orientation of the grating
- nspikes
number of spikes for a single trial of 1.28 seconds for a particular orientation
- region
region of the brain where the neuron is located
...
Source
Compute values for the Q-Q Boxplot
Description
Compute values for the Q-Q Boxplot
Usage
stat_qqboxplot(
mapping = NULL,
data = NULL,
geom = "qqboxplot",
position = "dodge2",
...,
coef = 1.5,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE,
reference_dist = "norm",
confidence_level = 0.95,
numboots = 500,
qtype = 7,
compdata = NULL
)
Arguments
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
geom |
specifies the geom function to use |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
Other arguments passed on to |
coef |
Length of the whiskers as multiple of IQR. Defaults to 1.5. |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
reference_dist |
Specifies theoretical reference distribution. |
confidence_level |
Sets confidence level for deviation whisker confidence bands |
numboots |
specifies the number of bootstrap draws for bootstrapped CIs needed only if compdata is not NULL |
qtype |
an integer between 1 and 9 indicating which one of the quantile algorithms to use. |
compdata |
specifies a data set to use as the reference distribution. If compdata is not NULL, the argument reference_dist will be ignored. |
Value
Returns an object of class StatQqboxplot
, (inherits from Geom
, ggproto
),
that helps to render the data for geom_qqboxplot()
.
Computed variables
stat_qqboxplot()
provides the following variables, some of which depend on the orientation:
- width
width of boxplot
- ymin or xmin
lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR
- lower or xlower
lower hinge, 25% quantile
- notchlower
lower edge of notch = median - 1.58 * IQR / sqrt(n)
- middle or xmiddle
median, 50% quantile
- notchupper
upper edge of notch = median + 1.58 * IQR / sqrt(n)
- upper or xupper
upper hinge, 75% quantile
- ymax or xmax
upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR