\documentclass[a4paper]{article}
\usepackage[round]{natbib}
\usepackage{boxedminipage} % for compare.tex
\usepackage{hyperref} % for \url
\bibliographystyle{abbrvnat}
\usepackage{Sweave}

%\VignetteIndexEntry{Introduction to the compare package}
%\VignettePackage{compare}

\newcommand{\pkg}[1]{{\bfseries #1}}
\newcommand{\code}[1]{{\ttfamily #1}}
\newcommand{\R}{{\sffamily R}}
\newcommand{\dfn}[1]{\emph{#1}}

\begin{document}

\title{Comparing Non-Identical Objects\\
Introducing the `compare' package}
\author{by Paul Murrell}
\maketitle

The \pkg{compare} package provides functions for comparing two \R{} objects for equality, while allowing for a range of ``minor'' differences. Objects may be reordered, rounded, or resized, they may have names or attributes removed, or they may even be coerced to a new class if necessary in order to achieve equality. The results of comparisons report not just whether the objects are the same, but also include a record of any modifications that were performed.

This package was developed for the purpose of partially automating the marking of coursework involving \R{} code submissions, so functions are also provided to convert the results of comparisons into numeric grades and to provide feedback for students.

\section*{Motivation}

STATS 220 is a second year university course run by the Department of Statistics at the University of Auckland.\footnote{\url{http://www.stat.auckland.ac.nz/courses/stage2/\#STATS220}} The course covers a range of ``Data Technologies'', including HTML, XML, databases, SQL, and, as a general purpose data processing tool, \R{}.

In addition to larger assignments, students in the course must complete short exercises in weekly computer labs. For the \R{} section of the course, students must write short pieces of \R{} code to produce specific \R{} objects. Figure \ref{figure:lab} shows two examples of basic, introductory exercises.
\begin{figure*}
\begin{boxedminipage}{\linewidth}
\begin{minipage}[t]{.45\linewidth}
\begin{enumerate}
\item Write R code to create the three {\bfseries vectors} and the {\bfseries factor} shown below, with names {\tt id}, {\tt age}, {\tt edu}, and {\tt class}.

You should end up with objects that look like this:
\begin{verbatim}
> id
[1] 1 2 3 4 5 6
> age
[1] 30 32 28 39 20 25
> edu
[1] 0 0 0 0 0 0
> class
[1] poor   poor   poor   middle
[5] middle middle
Levels: middle poor
\end{verbatim}
\end{enumerate}
\end{minipage}\hfill%
\begin{minipage}[t]{.45\linewidth}
\begin{enumerate}
\setcounter{enumi}{1}
\item Combine the objects from Question 1 together to make a {\bfseries data frame} called {\tt IndianMothers}.

You should end up with an object that looks like this:
\begin{verbatim}
> IndianMothers
  id age edu  class
1  1  30   0   poor
2  2  32   0   poor
3  3  28   0   poor
4  4  39   0 middle
5  5  20   0 middle
6  6  25   0 middle
\end{verbatim}
\end{enumerate}
\end{minipage}\hspace*{\fill}
\end{boxedminipage}
\caption{\label{figure:lab}Two simple examples of the exercises that STATS 220 students are asked to perform.}
\end{figure*}

The students submit their answers to the exercises as a file containing \R{} code, which means that it is possible to recreate their answers by calling \code{source()} on the submitted files.

At this point, the \R{} objects generated by the students' code can be compared with a set of model \R{} objects in order to establish whether the students' answers are correct. How this comparison occurs is the focus of this article.

\subsection*{Black and white comparisons}

The simplest and most strict test for equality between two objects in the base \R{} system \citep{R} is provided by the function \code{identical()}. This returns \code{TRUE} if the two objects are \emph{exactly} the same, otherwise it returns \code{FALSE}.

The problem with this function is that it is very strict indeed and will fail for objects that are, for all practical purposes, the same.
The classic example is the comparison of two real (floating-point) values, as demonstrated in the following code, where differences can arise simply due to the limitations of how numbers are represented in computer memory \citep[see R FAQ 7.31,][]{rfaq}.

\begin{Schunk}
\begin{Sinput}
> identical(0.3 - 0.2, 0.1)
\end{Sinput}
\begin{Soutput}
[1] FALSE
\end{Soutput}
\end{Schunk}

Using the function to test for equality would clearly be unreasonably harsh when marking any student answer that involves calculating a numeric result.

The \code{identical()} function, by itself, is not sufficient for comparing student answers with model answers.

\subsection*{Shades of grey}

The recommended solution to the problem mentioned above of comparing two floating-point values is to use the \code{all.equal()} function. This function allows for ``insignificant'' differences between numeric values, as shown below.

\begin{Schunk}
\begin{Sinput}
> all.equal(0.3 - 0.2, 0.1)
\end{Sinput}
\begin{Soutput}
[1] TRUE
\end{Soutput}
\end{Schunk}

This makes \code{all.equal()} a much more appropriate function for comparing student answers with model answers.

What is less well-known about the \code{all.equal()} function is that it also works for comparing other sorts of \R{} objects, besides numeric vectors, \emph{and} that it does more than just report equality between two objects.

If the objects being compared have differences, then \code{all.equal()} does not simply return \code{FALSE}. Instead, it returns a character vector containing messages that describe the differences between the objects. The following code gives a simple example, where \code{all.equal()} reports that the two character vectors have different lengths, and that, of the two pairs of strings that can be compared, one pair of strings does not match.
\begin{Schunk}
\begin{Sinput}
> all.equal(c("a", "b", "c"), c("a", "B"))
\end{Sinput}
\end{Schunk}
{\footnotesize
\begin{Schunk}
\begin{Soutput}
[1] "Lengths (3, 2) differ (string compare on first 2)"
[2] "1 string mismatch"
\end{Soutput}
\end{Schunk}
}
% {\small

This feature is actually very useful for marking student work. Information about whether a student's answer is correct is useful for determining a raw mark, but it is also useful to have information about what the student did wrong. This information can be used as the basis for assigning partial marks for an answer that is close to the correct answer, and for providing feedback to the student about where marks were lost.

The \code{all.equal()} function has some useful features that make it a helpful tool for comparing student answers with model answers. However, there is an approach that can perform better than this.

The \code{all.equal()} function looks for equality between two objects and, if that fails, provides information about the sort of differences that exist. An alternative approach, when two objects are not equal, is to try to \dfn{transform} the objects to make them equal, and report on which transformations were necessary in order to achieve equality.

As an example of the difference between these approaches, consider the two objects below: a character vector and a factor.

\begin{Schunk}
\begin{Sinput}
> obj1 <- c("a", "a", "b", "c")
> obj1
\end{Sinput}
\begin{Soutput}
[1] "a" "a" "b" "c"
\end{Soutput}
\end{Schunk}
\begin{Schunk}
\begin{Sinput}
> obj2 <- factor(obj1)
> obj2
\end{Sinput}
\begin{Soutput}
[1] a a b c
Levels: a b c
\end{Soutput}
\end{Schunk}

The \code{all.equal()} function reports that these objects are different because they differ in their fundamental mode, because one has attributes and the other does not, and because each object is of a different class.
\begin{Schunk}
\begin{Sinput}
> all.equal(obj1, obj2)
\end{Sinput}
\end{Schunk}
{\footnotesize
\begin{Schunk}
\begin{Soutput}
[1] "Modes: character, numeric"
[2] "Attributes: < target is NULL, current is list >"
[3] "target is character, current is factor"
\end{Soutput}
\end{Schunk}
}
% {\small

The alternative approach would be to allow various transformations of the objects to see if they can be transformed to be the same. The following code shows this approach, which reports that the objects are equal, if the second one is coerced from a factor to a character vector. This is more information than was provided by \code{all.equal()} and, in the particular case of comparing student answers to model answers, it tells us a lot about how close the student got to the right answer.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> compare(obj1, obj2, allowAll=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <factor> to <character>
\end{Soutput}
\end{Schunk}

Another limitation of \code{all.equal()} is that it does not report on some other possible differences between objects. For example, it is possible for a student to have the correct values for an \R{} object, but have the values in the wrong order. Another common mistake is to get the case wrong in a set of string values (e.g., in a character vector or in the \code{names} attribute of an object).

In summary, while \code{all.equal()} provides some desirable features for comparing student answers to model answers, we can do better by allowing for a wider range of differences between objects and by taking a different approach that attempts to transform the student answer to be the same as the model answer, if at all possible, while reporting which transformations were necessary.

The remainder of this article describes the \pkg{compare} package, which provides functions for producing these sorts of comparisons.

\section*{The \code{compare()} function}

The main function in the \pkg{compare} package is the \code{compare()} function.
This function checks whether two objects are the same and, if they are not, carries out various transformations on the objects and checks them again to see if they are the same after they have been transformed.

By default, \code{compare()} only succeeds if the two objects are identical (using the \code{identical()} function) \emph{or} the two objects are numeric and they are equal (according to \code{all.equal()}). If the objects are not the same, no transformations of the objects are considered. In other words, by default, \code{compare()} is simply a convenience wrapper for \code{identical()} and \code{all.equal()}. As a simple example, the following comparison takes account of the fact that the values being compared are numeric and uses \code{all.equal()} rather than \code{identical()}.

\begin{Schunk}
\begin{Sinput}
> compare(0.3 - 0.2, 0.1)
\end{Sinput}
\begin{Soutput}
TRUE
\end{Soutput}
\end{Schunk}

\subsection*{Transformations}

The more interesting uses of \code{compare()} involve specifying one or more of the arguments that allow transformations of the objects that are being compared. For example, the \code{coerce} argument specifies that the second argument may be coerced to the class of the first argument. This allows for more flexible comparisons such as between a factor and a character vector.

\begin{Schunk}
\begin{Sinput}
> compare(obj1, obj2, coerce=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <factor> to <character>
\end{Soutput}
\end{Schunk}

It is important to note that there is a definite order to the objects; the \dfn{model} object is given first and the \dfn{comparison} object is given second. Transformations attempt to make the comparison object like the model object, though in a number of cases (e.g., when ignoring the case of strings) the model object may also be transformed. In the example above, the comparison object has been coerced to be the same class as the model object.
The following code demonstrates the effect of reversing the order of the objects in the comparison. Now the character vector is being coerced to a factor.

\begin{Schunk}
\begin{Sinput}
> compare(obj2, obj1, coerce=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <character> to <factor>
\end{Soutput}
\end{Schunk}

Of course, transforming an object is not guaranteed to produce identical objects if the original objects are genuinely different.

\begin{Schunk}
\begin{Sinput}
> compare(obj1, obj2[1:3], coerce=TRUE)
\end{Sinput}
\begin{Soutput}
FALSE
  coerced from <factor> to <character>
\end{Soutput}
\end{Schunk}

Notice, however, that even though the comparison failed, the result still reports the transformation that was attempted. This result indicates that the comparison object was converted from a factor (to a character vector), but it \emph{still} did not end up being the same as the model object.

A number of other transformations are available in addition to coercion. For example, differences in length, like in the last case, can also be ignored.

\begin{Schunk}
\begin{Sinput}
> compare(obj1, obj2[1:3],
+         shorten=TRUE, coerce=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <factor> to <character>
  shortened model
\end{Soutput}
\end{Schunk}

It is also possible to allow values to be sorted, or rounded, or to convert all character values to upper case (i.e., ignore the case of strings).

Table \ref{table:transforms} provides a complete list of the transformations that are currently allowed (in version 0.2 of \pkg{compare}) and the arguments that are used to enable them.

A further argument to the \code{compare()} function, \code{allowAll}, controls the default setting for most of these transformations, so specifying \code{allowAll=TRUE} is a quick way of enabling all possible transformations. Specific transformations can still be \emph{excluded} by explicitly setting the appropriate argument to \code{FALSE}.
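
As a minimal sketch of excluding a single transformation, the following call enables everything except coercion for the character vector and factor used earlier. The objects are redefined here only so that the example is self-contained. The output is deliberately omitted, because the outcome depends on which of the remaining transformations the package attempts; the point is only the calling pattern.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> # The same character vector and factor as in the earlier examples
> obj1 <- c("a", "a", "b", "c")
> obj2 <- factor(obj1)
> # Enable all transformations, then switch coercion back off
> compare(obj1, obj2, allowAll=TRUE, coerce=FALSE)
\end{Sinput}
\end{Schunk}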
\begin{table*}
\begin{center}
\caption{\label{table:transforms}Arguments to the \code{compare()} function that control which transformations are attempted when comparing a model object to a comparison object.}
\begin{tabular}{l p{.5\textwidth}}
Argument & Meaning \\ \hline
\code{equal} & Compare objects for ``equality'' as well as ``identity'' (e.g., use \code{all.equal()} if model object is numeric). \\[2mm]
\code{coerce} & Allow coercion of comparison object to class of model object. \\[2mm]
\code{shorten} & Allow either the model or the comparison to be shrunk so that the objects have the same ``size''.\\[2mm]
\code{ignoreOrder} & Ignore the original order of the comparison and model objects; allow both comparison object and model object to be sorted.\\[2mm]
\code{ignoreNameCase} & Ignore the case of the \code{names} attribute for both comparison and model objects; the \code{names} attributes for both objects are converted to upper case. \\[2mm]
\code{ignoreNames} & Ignore any differences in the \code{names} attributes of the comparison and model objects; any \code{names} attributes are dropped. \\[2mm]
\code{ignoreAttrs} & Ignore all attributes of both the comparison and model objects; all attributes are dropped.\\[2mm]
\code{round${}^*$} & Allow numeric values to be rounded; either \code{FALSE} (the default), or an integer value giving the number of decimal places for rounding, or a function of one argument, e.g., \code{floor}. \\[2mm]
\code{ignoreCase${}^*$} & Ignore the case of character vectors; both comparison and model are converted to upper case.
\\[2mm] \code{trim${}^*$} & Ignore leading and trailing spaces in character vectors; leading and trailing spaces are trimmed from both comparison and model.\\[2mm] \code{ignoreLevelOrder${}^*$} & Ignore original order of levels of factor objects; the levels of the comparison object are sorted to the order of the levels of the model object.\\[2mm] \code{dropLevels${}^*$} & Ignore any unused levels in factors; unused levels are dropped from both comparison and model objects. \\[2mm] \code{ignoreDimOrder} & Ignore the order of dimensions in array, matrix, or table objects; the dimensions are reordered by name. \\[2mm] \code{ignoreColOrder} & Ignore the order of columns in data frame objects; the columns in the comparison object are reordered to match the model object.\\[2mm] \code{ignoreComponentOrder} & Ignore the order of components in a list object; the components are reordered by name. \\ \hline \multicolumn{2}{l}{${}^*$These transformations only occur if \code{equal=TRUE}}\\ \end{tabular} \end{center} \end{table*} The \code{equal} argument is a bit of a special case because it is \code{TRUE} by default, whereas almost all others are \code{FALSE}. The \code{equal} argument is also especially influential because objects are compared after every transformation and this argument controls what sort of comparison takes place. Objects are always compared using \code{identical()} first, which will only succeed if the objects have exactly the same representation in memory. If the test using \code{identical()} fails and \code{equal=TRUE}, then a more lenient comparison is also performed. By default, this just means that numeric values are compared using \code{all.equal()}, but various other arguments can extend this to allow things like differences in case for character values (see the asterisked arguments in Table \ref{table:transforms}). The \code{round} argument is also special because it always defaults to \code{FALSE}, even if \code{allowAll=TRUE}. 
This means that the \code{round} argument must be specified explicitly in order to enable rounding. The default is set up this way because the value of the \code{round} argument is either \code{FALSE} or an integer value specifying the number of decimal places to round to. For this argument, the value \code{TRUE} corresponds to rounding to zero decimal places. Finally, there is an additional argument \code{colsOnly} for comparing data frames. This argument controls whether transformations are only applied to columns (and not to rows). For example, by default, a data frame will only allow columns to be dropped, but not rows, if \code{shorten=TRUE}. Note, however, that \code{ignoreOrder} means ignore the order of \emph{rows} for data frames and \code{ignoreColOrder} must be used to ignore the order of columns in comparisons involving data frames. \subsection*{The \code{compareName()} function} The \code{compareName()} function offers a slight variation on the \code{compare()} function. For this function, only the \emph{name} of the comparison object is specified, rather than an explicit object. The advantage of this is that it allows for variations in case in the names of objects. For example, a student might create a variable called \code{indianMothers} rather than the desired \code{IndianMothers}. This case-insensitivity is enabled via the \code{ignore.case} argument. Another advantage of this function is that it is possible to specify, via the \code{compEnv} argument, a particular environment to search within for the comparison object (rather than just the current workspace). This becomes useful when checking the answers from several students because each student's answers may be generated within a separate environment in order to avoid any interactions between code from different students. 
The following code shows a simple demonstration of this function, where a comparison object is created within a temporary environment and the name of the comparison object is uppercase when it should be lowercase.

\begin{Schunk}
\begin{Sinput}
> tempEnv <- new.env()
> with(tempEnv, X <- 1:10)
> compareName(1:10, "x", compEnv=tempEnv)
\end{Sinput}
\begin{Soutput}
TRUE
  renamed object
\end{Soutput}
\end{Schunk}

Notice that, as with the transformations in \code{compare()}, the \code{compareName()} function records whether it needed to ignore the case of the name of the comparison object.

\subsection*{A pathological example}

This section shows a manufactured example that demonstrates some of the flexibility of the \code{compare()} function.

We will compare two data frames that have a number of simple differences. The model object is a data frame with three columns: a numeric vector, a character vector, and a factor.

\begin{Schunk}
\begin{Sinput}
> model <-
+     data.frame(x=1:26,
+                y=letters,
+                z=factor(letters),
+                row.names=letters,
+                stringsAsFactors=FALSE)
\end{Sinput}
\end{Schunk}

The comparison object contains essentially the same information, except that there is an extra column, the column names are uppercase rather than lowercase, the columns are in a different order, the \code{y} variable is a factor rather than a character vector, and the \code{z} variable is a character variable rather than a factor. The \code{y} variable and the row names are also uppercase rather than lowercase.
\begin{Schunk}
\begin{Sinput}
> comparison <-
+     data.frame(W=26:1,
+                Z=letters,
+                Y=factor(LETTERS),
+                X=1:26,
+                row.names=LETTERS,
+                stringsAsFactors=FALSE)
\end{Sinput}
\end{Schunk}

The \code{compare()} function can detect that these two objects are essentially the same as long as we reorder the columns (ignoring the case of the column names), coerce the \code{y} and \code{z} variables, drop the extra variable, ignore the case of the \code{y} variable, and ignore the case of the row names.

\begin{Schunk}
\begin{Sinput}
> compare(model, comparison, allowAll=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  renamed
  reordered columns
  [Y] coerced from <factor> to <character>
  [Z] coerced from <character> to <factor>
  shortened comparison
  [Y] ignored case
  renamed rows
\end{Soutput}
\end{Schunk}

Notice that we have used \code{allowAll=TRUE} to allow \code{compare()} to attempt all possible transformations at its disposal.

\section*{Comparing files of \R{} code}

Returning now to the original motivation for the \pkg{compare} package, the \code{compare()} function provides an excellent basis for determining not only whether a student's answers are correct, but also how much incorrect answers differ from the model answer.

As described earlier, submissions by students in the STATS 220 course consist of files of \R{} code. Marking these submissions consists of using \code{source()} to run the code, then comparing the resulting objects with model answer objects. With approximately 100 students in the STATS 220 course, with weekly labs, and with multiple questions per lab, each of which may contain more than one \R{} object, there is a reasonable marking burden. Consequently, there is a strong incentive to automate as much of the marking process as possible.

\subsection*{The \code{compareFile()} function}

The \code{compareFile()} function can be used to run \R{} code from a specific file and compare the results with a set of model answers.
This function requires three pieces of information: the name of a file containing the ``comparison code'', which is run within a local environment, using \code{source()}, to generate the comparison values; a vector of ``model names'', which are the names of the objects that will be looked for in the local environment after the comparison code has been run; and the model answers, either as the name of a binary file to \code{load()}, or as the name of a file of \R{} code to \code{source()}, or as a list object containing the ready-made model answer objects. Any argument to \code{compare()} may also be included in the call.

Once the comparison code has been run, \code{compareName()} is called for each of the model names and the result is a list of \code{"comparison"} objects.

As a simple demonstration, consider the basic questions shown in Figure \ref{figure:lab}. The model names in this case are the following:

\begin{Schunk}
\begin{Sinput}
> modelNames <- c("id", "age",
+                 "edu", "class",
+                 "IndianMothers")
\end{Sinput}
\end{Schunk}

One student's submission for this exercise is in a file called \code{student1.R}, within a directory called \code{Examples}. The model answer is in a file called \code{model.R} in the same directory. We can evaluate this student's submission and compare it to the model answer with the following code:

\begin{Schunk}
\begin{Sinput}
> compareFile(file.path("Examples",
+                       "student1.R"),
+             modelNames,
+             file.path("Examples",
+                       "model.R"))
\end{Sinput}
\begin{Soutput}
$id
TRUE

$age
TRUE

$edu
TRUE

$class
FALSE

$IndianMothers
FALSE
  object not found
\end{Soutput}
\end{Schunk}

This provides a strict check and shows that the student got the first three problems correct, but the last two wrong. In fact, the student's code completely failed to generate an object with the name \code{IndianMothers}.
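
The contents of \code{model.R} are not reproduced in this article. Purely as an illustration, a model answer file that creates the objects shown in Figure \ref{figure:lab} might look something like the following sketch. The particular expressions are an assumption; any code that creates objects with these names and values would serve equally well.

\begin{Schunk}
\begin{Sinput}
# Hypothetical contents of Examples/model.R (not the actual file)
id <- 1:6
age <- c(30, 32, 28, 39, 20, 25)
edu <- rep(0, 6)
# Factor levels default to sorted order: "middle", "poor"
class <- factor(rep(c("poor", "middle"), each=3))
# Column names are taken from the names of the objects
IndianMothers <- data.frame(id, age, edu, class)
\end{Sinput}
\end{Schunk}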
We can provide extra arguments to allow transformations of the student's answers, as in the following code:

\begin{Schunk}
\begin{Sinput}
> compareFile(file.path("Examples",
+                       "student1.R"),
+             modelNames,
+             file.path("Examples",
+                       "model.R"),
+             allowAll=TRUE)
\end{Sinput}
\begin{Soutput}
$id
TRUE

$age
TRUE

$edu
TRUE

$class
TRUE
  reordered levels

$IndianMothers
FALSE
  object not found
\end{Soutput}
\end{Schunk}

This shows that, although the student's answer for the \code{class} object was not perfect, it was pretty close; it just had the levels of the factor in the wrong order.

\subsection*{The \code{compareFiles()} function}

The \code{compareFiles()} function builds on \code{compareFile()} by allowing a vector of comparison file names. This allows a whole set of student submissions to be tested at once. The result of this function is a list of lists of \code{"comparison"} objects and a special print method provides a simplified view of this result.

Continuing the example from above, the \code{Examples} directory contains submissions from a further four students. We can compare all of these submissions with the model answers and produce a summary of the results with a single call to \code{compareFiles()}. The appropriate code and output are shown in Figure \ref{figure:comparefiles}.
\begin{figure*}
\begin{Schunk}
\begin{Sinput}
> files <- list.files("Examples",
+                     pattern="^student[0-9]+[.]R$",
+                     full.names=TRUE)
> results <- compareFiles(files,
+                         modelNames,
+                         file.path("Examples", "model.R"),
+                         allowAll=TRUE,
+                         resultNames=gsub("Examples.|[.]R", "", files))
> results
\end{Sinput}
\end{Schunk}
{\small
\begin{Schunk}
\begin{Soutput}
         id   age  edu  class                                     IndianMothers
student1 TRUE TRUE TRUE TRUE reordered levels                     FALSE object not found
student2 TRUE TRUE TRUE TRUE                                      TRUE
student3 TRUE TRUE TRUE TRUE coerced from <character> to <factor> FALSE object not found
student4 TRUE TRUE TRUE TRUE coerced from <character> to <factor> TRUE renamed object
student5 TRUE TRUE TRUE FALSE object not found                    FALSE object not found
\end{Soutput}
\end{Schunk}
}
\caption{\label{figure:comparefiles}%
Using the \code{compareFiles()} function to run \R{} code from several files and compare the results to model objects. The result of this sort of comparison can easily get quite wide, so it is often useful to print the result with \code{options(width)} set to some large value and using a small font, as has been done here.}
\end{figure*}

The results show that all of the students got the first three problems correct. They had more trouble getting the fourth problem right, with one getting the factor levels in the wrong order and two others producing a character vector rather than a factor. Only one student, \code{student2}, got the final problem exactly right and only one other, \code{student4}, got essentially the right answer, though this student spelt the name of the object wrong.

\section*{Assigning marks and\\giving feedback}

The result returned by \code{compareFiles()} is a list of lists of comparison results, where each result is itself a list of information including whether two objects are the same and a record of how the objects were transformed during the comparison.
This represents a wealth of information with which to assess the performance of students on a set of \R{} exercises, but it can be a little unwieldy to deal with.

The \pkg{compare} package provides further functions that make it easier to deal with this information for the purpose of determining a final mark and for the purpose of providing comments for each student submission.

In order to determine a final mark, we use the \code{questionMarks()} function to specify which object names are involved in a particular question, to provide a maximum mark for the question, and to specify a set of rules that determine how many marks should be deducted for various deviations from the correct answers.

The \code{rule()} function is used to define a marking rule. It takes an object name, a number of marks to deduct if the comparison for that object is \code{FALSE}, plus any number of transformation rules. The latter are generated using the \code{transformRule()} function, which associates a regular expression with a number of marks to deduct. If the regular expression is matched in the record of transformations for a comparison, then the appropriate number of marks is deducted.

A simple example, based on the second question in Figure \ref{figure:lab}, is shown below. This specifies that the question only involves an object named \code{IndianMothers}, that there is a maximum mark of 1 for this question, and that 1 mark is deducted if the comparison is \code{FALSE}.

\begin{Schunk}
\begin{Sinput}
> q2 <-
+   questionMarks("IndianMothers",
+     maxMark=1,
+     rule("IndianMothers", 1))
\end{Sinput}
\end{Schunk}

The first question from Figure \ref{figure:lab} provides a more complex example. In this case, there are four different objects involved and the maximum mark is 2.
The rules below specify that any \code{FALSE} comparison drops a mark \emph{and} that, for the comparison involving the object named \code{"class"}, a mark should also be deducted if coercion was necessary to get a \code{TRUE} result.

\begin{Schunk}
\begin{Sinput}
> q1 <-
+   questionMarks(
+     c("id", "age", "edu", "class"),
+     maxMark=2,
+     rule("id", 1),
+     rule("age", 1),
+     rule("edu", 1),
+     rule("class", 1,
+       transformRule("coerced", 1)))
\end{Sinput}
\end{Schunk}

Having set up this marking scheme, marks are generated using the \code{markQuestions()} function, as shown by the following code.

\begin{Schunk}
\begin{Sinput}
> markQuestions(results, q1, q2)
\end{Sinput}
\begin{Soutput}
         id-age-edu-class IndianMothers
student1                2             0
student2                2             1
student3                1             0
student4                1             1
student5                1             0
\end{Soutput}
\end{Schunk}

For the first question, the third and fourth students lose a mark because of the coercion, and the fifth student loses a mark because the required object was not generated.

A similar suite of functions is provided to associate comments, rather than mark deductions, with particular transformations. The following code provides a simple demonstration.

\begin{Schunk}
\begin{Sinput}
> q1comments <-
+   questionComments(
+     c("id", "age", "edu", "class"),
+     comments(
+       "class",
+       transformComment(
+         "coerced",
+         "'class' is a factor!")))
> commentQuestions(results, q1comments)
\end{Sinput}
\begin{Soutput}
         id-age-edu-class
student1 ""
student2 ""
student3 "'class' is a factor!"
student4 "'class' is a factor!"
student5 ""
\end{Soutput}
\end{Schunk}

In this case, we have just generated feedback for the students who generated a character vector instead of the desired factor in Question 1 of the exercise.

\section*{Summary, discussion, and\\future directions}

The \pkg{compare} package is based around the \code{compare()} function, which compares two objects for equality and, if they are not equal, attempts to transform the objects to make them equal.
It reports whether the comparison succeeded overall and provides a record of the transformations that were attempted during the comparison.

Further functions are provided on top of the \code{compare()} function to facilitate marking exercises where students in a class submit \R{} code in a file to create a set of \R{} objects.

This article has given some basic demonstrations of the use of the \pkg{compare} package for comparing objects and marking student submissions. The package could also be useful for the students themselves, both to check whether they have the correct answer and to provide feedback about how their answer differs from the model answer. More generally, the \code{compare()} function may have application wherever the \code{identical()} and \code{all.equal()} functions are currently in use. For example, it may be useful when debugging code and for performing regression tests as part of a quality control process.

Obvious extensions of the \pkg{compare} package include adding new transformations and providing comparison methods for other classes of objects. More details about how the package works and how these extensions might be developed are discussed in the vignette, ``Fundamentals of the Compare Package'', which is installed as part of the \pkg{compare} package.

\section*{Acknowledgements}

Many thanks to the editors and anonymous reviewers for their useful comments and suggestions, on both this article and the \pkg{compare} package itself.

\bibliography{compare}

\end{document}