\documentclass{article} %\VignetteIndexEntry{The compare package fundamentals} %\VignettePackage{compare} \title{More details about the compare package} \author{Paul Murrell} \SweaveOpts{keep.source=TRUE} \usepackage{Rd} \begin{document} \maketitle <>= library(compare) @ This document provides more information about how the \pkg{compare} package works. It should only be read after the accompanying vignette ``Introduction to the compare package''. We will look in more detail at how the \code{compare()} function works, plus we will explore some of the lower-level functions that the \code{compare()} function calls. \subsection*{The \code{"comparison"} class} The value returned by \code{compare()} is an object of class \code{"comparison"}. Importantly, this is not simply a logical value. In order to test whether the comparison was successful overall, for example as the condition in an \code{if} statement, you must use the \code{isTRUE()} function. <<>>= isTRUE(compare(1:10, 1:10)) @ The \code{transforms()} function can be used to extract the vector of transformations from a \code{"comparison"} object. <<>>= obj1 <- c("a", "a", "b", "c") obj1 <<>>= obj2 <- factor(obj1) obj2 <<>>= transforms(compare(obj1, obj2[1:3], allowAll=TRUE)) @ \subsection*{Order and persistence of transformations} The \code{compare()} function performs transformations in a particular order. Any rounding of numeric values, trimming or upper-casing of strings, sorting or dropping of factor levels, reordering of dimensions for arrays, matrices, and tables, and reordering of data frame columns or list components occurs as part of the test for equality. This means that these transformations are only available if \code{equal=TRUE}, but also means that these transformations will be applied again after every other transformation that is attempted. The remaining transformations occur in the following order: coerce, shorten, ignore sort order, ignore case of names, ignore names altogether, ignore attributes altogether. The justification for this order is that sorting only makes sense once the objects are of the same class and size, attributes are less fundamental than the core data structure so they should come later, and names are just a special case of general attributes, so they should come earlier. In general, a transformation is persistent. For example, if coercion has been attempted, but the objects being compared are not the same after coercion, then sorting the objects will be attempted and \emph{this sorting will occur on the coerced objects}. The exception to this rule is that any transformations applied during a test for equality are \emph{not} persistent. This is because tests for equality are repeated after every other transformation. The following manufactured example demonstrates these rules about the order and persistence of transformations. In this case, the comparison object is a different class from the model object, the comparison is in a different order from the model, \emph{and} the comparison needs to be rounded to be equal to the model. <<>>= compare(as.numeric(1:10), as.character(10:1 + .1), round=TRUE, coerce=TRUE, ignoreOrder=TRUE) @ First of all, the objects are tested for equality (\code{equal=TRUE} by default). In this case, despite the fact that \code{round=TRUE}, no rounding occurs because the comparison is not numeric. The objects are not the same, but \code{coerce=TRUE} so the comparison object is coerced to a numeric object \emph{and the two objects are checked again for equality}. Furthermore, because \code{round=TRUE}, the coerced comparison values are rounded as well. The objects are still not the same, but because \code{ingoreOrder=TRUE} the comparison object is now sorted. This sorting occurs on the \emph{coerced} but \emph{not rounded} comparison object because equality transformations are \emph{not} persistent, but all other transformations \emph{are} persistent. The coerced and sorted comparison object is then tested for equality with the model object, which again involves rounding the comparison object, and this time the objects are the same. The overall result is success and the transformations that lead to success were coercion, followed by sorting, followed by rounding. If the order imposed by \code{compare()} is not appropriate, it is possible to resort to performing the transformations individually (see the next section). \section*{Extending the \pkg{compare} package} The \code{compare()} function is built on a set of functions that perform the individual transformations. This means that it should be relatively straightforward to produce an alternative to \code{compare()} by simply reordering the calls to transform functions \emph{and} it should be relatively easy to extend the comparisons by writing new transformation functions. \subsection*{Standalone comparisons} The \code{compare()} function is built upon a set of standalone comparison functions. Table \ref{table:comparefuns} lists the standalone functions currently available. \begin{table} \caption{\label{table:comparefuns}Standalone comparison functions upon which the compare() function is built.} \begin{center} \begin{tabular}{l p{.6\textwidth}} Name & Description \\\hline \code{compareIdentical()} & Test whether two objects are identical. \\[2mm] \code{compareEqual()} & Compare whether two objects are equal. \\ & Rounds numeric values, trims leading and trailing whitespace and ignores case in strings, drops unused levels and ignores level order in factors, ignores dimension order in arrays (and matrices and tables), orders columns or components by name (ignoring case) for lists and data frames. \\[2mm] \code{compareCoerce()} & If necessary, coerce comparison object to class of model object, then compare for equality. \\ & Orders columns or components by name (ignoring case) for lists and data frames. \\[2mm] \code{compareShorten()} & If necessary, shorten the longer of the model and comparison objects so that the two objects are the same ``size''. \\ & For arrays, drops entire dimensions, for matrices, forces comparison to be two-dimensional, and for tables, collapses (sums) across extra dimensions. For data frames will drop columns and rows. Orders columns or components by name (ignoring case) for lists and data frames. \\[2mm] \code{compareIgnoreOrder()} & If necessary, sort both model and comparison objects, then compare for equality. \\ & Orders columns or components by name (ignoring case) for lists and data frames. \\[2mm] \code{compareIgnoreNameCase()} & If necessary, upper cases name attributes of both model and comparison, then compare for equality. \\ & For data frames, upper cases rownames as well as column names. Orders columns or components by name (ignoring case) for lists and data frames. \\[2mm] \code{compareIgnoreNames()} & If necessary, drops name attributes from both model and comparison then compare for equality. \\ & Orders columns or components by name (ignoring case) for lists and data frames. \\[2mm] \code{compareIgnoreAttrs()} & If necessary, drop all attributes from both model and comparison, then compare for equality. \\ \hline \end{tabular} \end{center} \end{table} The \code{compareIdentical()} function is just a wrapper for \code{identical()} that returns a \code{"comparison"} object as the result (so that it can play nicely with the other comparison functions. The \code{compareEqual()} function relaxes the comparison (depending on the arguments it is given) and allows for minor differences. All of the other comparison functions call \code{compareIdentical()}, and if that fails, and \code{equal=TRUE}, they call \code{compareEqual()}. These functions allow individual transformations to be attempted in isolation. A basic design feature of these functions is that they attempt to check whether they need to perform their transformation before they do it so that only necessary or potentially beneficial trasformations are attempted. For example, if two objects are of the same class, \code{compareCoerce()} will \emph{not} attempt to coerce the comparison object. A more general design feature is that the transformations have been broken into separate functions on the basis of orthogonality. Every effort is made to make the transformations independent in the sense that one transformation does not make any subsequent transformation impossible or nonsensical. As an example, a transformation that sorts objects is completely compatible with later dropping the names attributes from the objects. All of these comparison functions are generic so that appropriate methods can be written for different classes. There are methods for the basic vector types and for arrays, matrices, tables, data frames, and lists. By having them as standalone functions, new methods can easily be developed and incorporated. In general, comparisons involving recursive objects (data frames and lists) will transform columns or components instead of or as well as transforming the overall object. For example, \code{compareCoerce().data.frame} coerces the overall comparison object to a data frame \emph{and} all columns of the comparison to the same classes as the corresponding columns of the model data frame. These recursive comparison methods should provide the option of reordering the columns or components by name before performing transformations on the columns or components. All comparisons other than \code{compareIdentical()} and \code{compareEqual()} should return a ``partial'' result as well as a full record of transformations and transformed objects. This partial result will contain transformations and transformed objects from the primary transformation for that function, but \emph{not} any subsequent transformations due to \code{compareEqual()}. This allows for calling a sequence of standalone comparison functions without repeatedly recording unsuccessful transformations performed by \code{compareEqual()}. The utility function \code{same()} takes care of this issue automatically. \subsection*{More about the \code{"comparison"} class} In addition to the overall result and the transformations attempted during a comparison, the \code{"comparison"} class also contains a copy of the transformed model and comparison objects. These objects provide a record of what the original objects look like after a \emph{successful} transformation (i.e., when the transformed objects are equal). In addition to this information, a set of ``partial'' results are also stored. These are the transformations, plus the transformed model and comparison objects, \emph{without} any transformations that were carried out as part of the test for equality. These objects are useful following an \emph{unsuccessful} comparison, as the basis for further transformations. \subsection*{Combining comparisons} The \code{compare()} function works by calling the standalone comparison functions in sequence until a successful result is achieved (or all possible transformations have been attempted). The calls are arranged in the following general format: \begin{verbatim} comp <- comparisonA(model, object, ...) if (!comp$result && doComparisonB) { comp <- comparisonB(comp$tMpartial, comp$tCpartial, comp$partialTransform, ...) } \end{verbatim} If the first comparison is successful, the full result is returned, so all transformations, including anything by \code{compareEqual()} are reported. However, if the first comparison fails, only the partial result is passed on; \code{compareEqual()} will be called again and will have another chance to try all of its transformations. This basic pattern can easily be implemented ``by hand'' to produce a specific subset and a specific ordering of standalone comparisons, or as the basis of a variation on the \code{compare()} function. It is also a simple pattern to follow if new transformations need to be inserted into or added onto the current set available in \code{compare()}. \section*{A worked example} This section describes the addition of comparison methods for \code{"list"} objects to give an idea of the design considerations that have gone into the existing code and as a template for the possible addition of methods for further classes. This should be read in conjunction with the relevant source code. There is no need for an \code{identical.list()} method. The default, which calls \code{identical()} works already for lists and does the appropriate thing; tests whether the model and comparison objects are exactly the same. The \code{compareEqual()} method differs from \code{compareIdentical()} because it allows the model and comparison objects to have minor differences. In the case of a list, this means that we will allow two lists to be the equal if they consist of the same number of components \emph{and} the names of the components are the same, even though they may be in a different order and they may even differ in terms of case. Of course, the actual components themselves must also be equal, so this function calls \code{compareEqual()} for each pair of model-component and comparison-component. For additional flexibility, the argument \code{recurseFun} actually specifies which function is called on the pairs of components. This allows \code{compare()} to specify itself as the recursion function. The relaxation in terms of order and case of component names is not ``on'' by default; arguments are provided to enable these relaxations. NOTE that the uppercasing of names and reordering of names is NOT recorded in the partial results (i.e., these transformations are NOT persistent). The result is a \code{"multipleComparison"}, which includes not only the overall result, but also a breakdown per component of which components were equal. The first step in \code{compareCoerce.list()} is to make sure that the comparison object is a list, coercing if necessary. After that, things run very similarly to \code{compareEqual()}; we reorder components by name if possible and necessary, then we attempt to compare all components of the model and comparison, coercing components as necessary. The one major difference is that any transformations---coercion and reordering of components---\emph{are} persistent this time, so become part of the partial result. The \code{compareShorten()} method starts to get a bit easier. Again, we reorder components if necessary, but then all we need to do is drop any extra components and call the \code{same()} function to do the identical/equal check on the resulting model and comparison (which are now of the same length). \code{compareIgnoreOrder()} is even easier; we just reorder the components as before, then call \code{same()}. Likewise \code{compareIgnoreNameCase}. For \code{compareIgnoreNames()} there's just the additional action of dropping names attributes \emph{after} ordering by name (in case \emph{some} names are in common, but other names conflict). The default \code{compareIgnoreAttrs()} will do for lists. Finally, for the \code{compare()} function, we need to add the \code{ignoreComponentOrder} argument so that this can be included in a general comparison, with this argument defaulting to the current setting of \code{allowAll}. \end{document}