% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{The hyper2 package} %\VignetteDepends{hyper2} %\VignetteKeywords{hyper2,dirichlet,Bradley-Terry, reified Bradley=Terry} %\VignettePackage{hyper2} \documentclass[nojss]{jss} \usepackage{dsfont} \usepackage{bbm} \usepackage{amsfonts} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amssymb} \usepackage{yfonts} \usepackage{wrapfig} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% just as usual \author{Robin K. S. Hankin\\University of Stirling} \title{Partial rank data with the \pkg{hyper2} package: likelihood functions for generalized Bradley-Terry models} %% for pretty printing and a nice hypersummary also set: \Plainauthor{Robin K. S. Hankin} \Plaintitle{A Generalization of the Dirichlet Distribution} \Shorttitle{A Generalization of the Dirichlet Distribution} %% an abstract and keywords \Abstract{ Here I present the \pkg{hyper2} package for generalized Bradley-Terry models and analyze two situations: single scull rowing, and the competitive cooking game show MasterChef Australia. A number of natural statistical hypotheses may be tested straightforwardly using the software. To cite the package in publications, use \citet{hankin2017_rnw}, on which this work is based. } \Keywords{Dirichlet distribution, hyperdirichlet distribution, combinatorics, \proglang{R}, multinomial distribution, constrained optimization, Bradley-Terry} \Plainkeywords{Dirichlet distribution, hyperdirichlet distribution, combinatorics, R, multinomial distribution, constrained optimization, Bradley-Terry} %% publication information %% NOTE: This needs to filled out ONLY IF THE PAPER WAS ACCEPTED. %% If it was not (yet) accepted, leave them commented. %% \Volume{13} %% \Issue{9} %% \Month{September} %% \Year{2004} %% \Submitdate{2004-09-29} %% \Acceptdate{2004-09-29} %% The address of (at least) one author should be given %% in the following format: \Address{ Robin K. S. Hankin\\ University of Stirling\\ E-mail: \email{hankin.robin@gmail.com}\\ } %% It is also possible to add a telephone and fax number %% before the e-mail in the following format: %% Telephone: +43/1/31336-5053 %% Fax: +43/1/31336-734 %% for those who use Sweave please include the following line (with % symbols): %% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand{\bolda}{\mathbf a} \newcommand{\boldp}{\mathbf p} \newcommand{\boldV}{\mathbf V} \newcommand{\boldalpha}{\mbox{\boldmath$\alpha$}} \newcommand{\boldzero}{\mathbf 0} \newcommand{\mcf}{\mathcal F} \newcommand{\mcm}{\mathcal M} \newcommand{\mcs}{\mathcal S} \newcommand{\mcw}{\mathcal W} \newcommand{\pG}{p_{\mbox{\frakfamily g}}} \SweaveOpts{} \begin{document} \SweaveOpts{concordance=TRUE} \section{Introduction} <>= library("hyper2") set.seed(0) @ <>= calc_from_scratch <- FALSE @ <>= ignore <- require(magrittr,quietly=TRUE) @ \section{Introduction: the Bradley-Terry model} \setlength{\intextsep}{0pt} \begin{wrapfigure}{r}{0.2\textwidth} \begin{center} \includegraphics[width=1in]{\Sexpr{system.file("help/figures/hyper2.png",package="hyper2")}} \end{center} \end{wrapfigure} The Bradley-Terry model for datasets involving paired comparisons has wide uptake in the {R} community. However, existing functionality\footnote{In theory, the \pkg{hyperdirichlet} package~\citep{hankin2010} provides similar functionality but it is slow and inefficient. It is limited to a small number of players and cannot cope with the examples considered here. Also, the name-based system of \pkg{hyper2} is considerably more powerful and natural than the number-based system of \pkg{hyperdirichlet}.} is restricted to paired comparisons. The canonical problem is to consider~$n$ players who compete against one another; the basic inference problem is to estimate numbers~$\mathbf{p}=\left({p_1,\ldots,p_n}\right)$, $p_i\geqslant 0$, $\sum p_i=1$ which correspond to player ``strengths''. Information about the~$p_i$ may be obtained from the results of paired comparisons between the players. Applications are legion. The technique is widely used in a competitive sport context~\citep{turner2012}, in which matches are held between two opposing individuals or teams. It can also be applied to consumer choice experiments in which subjects are asked to choose a favourite from among two choices~\citep{hatzinger2012}, in which case the~$p_i$ are known as ``worth parameters''. If player~$i$ competes against player~$j$, and wins with probability~$P_{ij}$ then the likelihood function for~$p_1,\ldots p_n$ corresponding to a win for~$i$ is~$\frac{p_i}{p_i+p_j}$. As~\cite{turner2012} point out, this may be expressed as \[ \operatorname{logit}\left(P_{ij}\right)=\log p_i-\log p_j \] \noindent and this fact may be used to estimate~$\mathbf{p}$ using generalized linear models. However, consider the case where three competitors, $i$, $j$, and~$k$ compete. The probability that~$i$ wins is then~$\frac{p_i}{p_i+p_j+p_k}$~\citep{luce1959}; but there is no simple way to translate this likelihood function into a GLM. However, working directly with the likelihood function for~$\mathbf{p}$ has several advantages which are illustrated below. The resulting likelihood functions may readily be generalized to accommodate more general order statistics, as in a race. In addition, likelihood functions may be specified for partial order statistics; also, observations in which a \emph{loser} is identified may be given a likelihood function using natural {R} idiom. \subsection{Further generalizations} Observing the winner~$w$ from a preselected set of competitors~$\mathcal{C}$ has a likelihood function of~$p_w/\sum_{i\in\mathcal{C}} p_i$. But consider a more general situation in which two disjoint teams $\mathcal{A}$ and~$\mathcal{B}$ compete; this would have likelihood~$\sum_{i\in\mathcal{A}} p_i/\sum_{i\in\mathcal{A}\cup\mathcal{B}} p_i$. Such datasets motivate consideration of likelihood functions~$\mathcal{L}\left(\cdot\right)$ with \begin{equation}\label{hyper2likelihood} \mathcal{L}\left(\mathbf{p}\right)= \prod_{s\in \mathcal{O}}\left({\sum_{i\in s}}p_i\right)^{n_s} \end{equation} \noindent where~$\mathcal{O}$ is a set of observations and~$s$ a subset of~$\left[n\right]=\left\{1,2,\ldots,n\right\}$; numbers~$n_s$ are integers which may be positive or negative. The approach adopted by the~\pkg{hyperdirichlet} package is to store each of the~$2^n$ possible subsets of~$\left[n\right]$ together with an exponent: \begin{equation}\label{hyperdirichletlikelihood} \prod_{s\in 2^{\left[n\right]}}\left({\sum_{i\in s}}p_i\right)^{n_s}. \end{equation} \noindent but this was noted as being needlessly memory intensive and slow; it is limited, in practice, to~$n\leqslant 9$. Use of the more efficient equation~\ref{hyper2likelihood} necessitates some mechansim of keeping track of which subsets of~$\left[n\right]$ have nonzero powers. \section{The {hyper2} package} One such mechanism is furnished by the \code{C++} Standard Template Library's ``map'' class \citep{musser2009} to store and retrieve elements. A \emph{map} is an associative container that stores values indexed by a key, which is used to sort and uniquely identify the values. In the package, the key is a (STL) \code{set} of strictly positive integers~$\leqslant n$. The relevant \code{typedef} statements are: \begin{verbatim} typedef set bracket; typedef map hyper2; \end{verbatim} \noindent In the \code{STL}, a map object stores keys and associated values in whatever order the software considers to be most propitious. This allows faster access and modification times but the order in which the maps, and indeed the elements of a set, are stored is not defined. In the case of likelihood functions such as Equation~\ref{hyper2likelihood}, this does not affect the value of the expressions, as multiplication and addition are associative and commutative operations. The \pkg{hyper2} package follows \code{disordR} discipline \citep{hankin2022_disordR}. \section{The package in use} \begin{table} \centering \begin{tabular}{|ccc|c|}\hline Topalov & Anand & Karpov & total\\ \hline 22 & 13 & - & 35\\ - & 23 & 12 & 35\\ 8 & - & 10 & 18 \\ \hline 30 & 36 & 22 & 88 \\ \hline \end{tabular} \caption{Results of 88 chess matches \label{chess} (dataset \code{chess} in the \pkg{aylmer} package) between three Grandmasters; entries show number of games won up to 2001 (draws are discarded). Topalov beats Anand 22-13; Anand beats Karpov 23-12; and Karpov beats Topalov 10-8} \end{table} Consider the \code{Chess} dataset of the \pkg{hyperdirchlet} package, in which matches between three chess players are tabulated (table~\ref{chess}). The Bradley-Terry model~\citep{bradley1952} is appropriate here~\citep{caron2012}, and the \pkg{hyper2} package provides a likelihood function for the strengths of the players, $p_1,p_2,p_3$ with~$p_1+p_2+p_3=1$. A likelihood function might be \[ \frac{p_1^{30}p_2^{36}p_3^{22}}{ \left(p_1+p_2\right)^{35} \left(p_2+p_3\right)^{35} \left(p_1+p_3\right)^{18} } \] Using the \pkg{hyper2} package, the \proglang{R} idiom to create this likelihood function is as follows: <>= chess <- hyper2() chess["Topalov"] <- 30 chess["Anand" ] <- 36 chess["Karpov" ] <- 22 chess[c("Topalov","Anand" )] <- -35 chess[c("Anand","Karpov" )] <- -35 chess[c("Karpov","Topalov")] <- -18 chess @ The internal representation of \code{hyper2} objects is that of a \proglang{map}, an associative array implemented as part of the standard template library (\proglang{STL}) of \proglang{C++}. It is thus an unordered collection of (\emph{key}, \emph{value}) pairs. In this case the \emph{value} is the power and the \emph{key} is the content of the bracket, which is a subset of $\left\lbrace\mbox{Topalov},\mbox{Anand},\mbox{Karpov}\right\rbrace$. The \proglang{stl} provides a \proglang{set} class for efficient manipulation of such objects, and this is used in the package. One side-effect of using this system is that the order of the bracket-power key-value pairs is not preserved; above, note how the terms appear in an essentially random order. Also note that the use of the \proglang{set} class means that the \proglang{R} idiom is insensitive to the order of the terms within a bracket: <>= chess[c("Karpov","Topalov")] chess[c("Topalov","Karpov")] @ The package can calculate log-likelihoods: <>= loglik(c(1/3,1/3),chess) @ [the first argument of function \code{loglik()} is a vector of length 2, third element of $\boldp$ being the ``fillup'' value~\citep{aitchison1986}]; the gradient of the log-likelihood is given by function \code{gradient()}: <>= gradient(chess,c(1/3,1/3)) @ Such functionality allows the rapid location of the maximum likelihood estimate for~$\boldp$: <>= maxp(chess) @ \section{Men's single sculling in the 2016 Summer Olympic Games}%' In this section, I will take results from the 2016 Summer Olympic Games and create a likelihood function for the finishing order in Men's single sculling. In Olympic sculling, up to six individual competitors race a small boat called a scull over a course of length 2\,km; the object is to cross the finishing line first. Note that actual timings are irrelevant, given the model, as the sufficient statistic is the order in which competitors cross the finishing line. The 2016 Summer Olympics is a case in point: the gold and silver medallists finished less than~5 milliseconds apart, corresponding to a lead of~$\sim 2.5\,\mathrm{cm}$. We may use a generalization of the Bradley-Terry model due to~\citet{luce1959}, in which the probability of competitor~$i$ winning in a field of~$j=1,\ldots, n$ is \[ \frac{p_i}{p_1+\cdots +p_n}.\] However, there is information in the whole of the finishing order, not just the first across the line. Once the winner has been identified, then the runner-up may plausibly be considered to be the winner among the remaining competitors; and so on down the finishing order. Without loss of generality, if the order of finishing were~$1,2,3,4,5,6$, then a suitable likelihood function would be, following~\cite{plackett1975}: \begin{equation}\label{competitors_1_to_6_likelihood} \frac{p_1}{p_1+p_2+p_3+p_4+p_5+p_6}\cdot \frac{p_2}{p_2+p_3+p_4+p_5+p_6}\cdot \frac{p_3}{p_3+p_4+p_5+p_6}\cdot \frac{p_4}{p_4+p_5+p_6}\cdot \frac{p_5}{p_5+p_6}\cdot \frac{p_6}{p_6} \end{equation} We may represent the result of Heat~1 as \[ \mathrm{fournier}\succ \mathrm{cabrera}\succ \mathrm{bhokanal}\succ \mathrm{saensuk}\succ \mathrm{kelmelis}\succ \mathrm{teilemb} \] (a full list of the finishing order for all~25 events is given in the package as \code{rowing.txt}). Incorporating the information from Heat~1 into a likelihood function corresponding to equation~\ref{competitors_1_to_6_likelihood} is straightforward using the \code{rankvec\_likelihood()} function: <<2016_olympics_heat1_first>>= heat1 <- c("fournier", "cabrera", "bhokanal", "saensuk", "kelmelis", "teilemb") H <- rankvec_likelihood(heat1) H @ (variable \code{heat1} shows the finishing order for Heat~1: Fournier came first, then Cabrera, etc). We can add the information from heat~2 easily: <<2016_olympics_heat1_first>>= heat2 <- c("drysdale", "molnar", "esquivel", "garcia", "khafaji", "monasterio") H <- H + rankvec_likelihood(heat2) head(H) @ Again observe that object \code{H} includes its terms in no apparent order. Although it would be possible to incorporate information from subsequent heats in a similar manner, the package includes a ready-made dataset, \code{rowing}: <>= head(rowing) # see rowing.Rd @ Finding the maximum likelihood estimate for the parameter~$p_\mathrm{banna},\ldots,p_\mathrm{zambrano}$ is straightforward using the \code{maxp()} function, provided with the package~(Figure~\ref{maxp_sculls2016}). The optimization routine has access to derivatives which means that the estimate is found very quickly. \begin{figure}[htbp] \begin{center} <>= dotchart(rowing_maxp) @ \caption{Maximum likelihood estimate for the strengths of the~32 competitors\label{maxp_sculls2016} in the Men's singles sculls in the 2016 Summer Olympics } \end{center} \end{figure} Figure~\ref{maxp_sculls2016} shows very directly that the competitor with the highest strength is Drysdale, the gold medallist for this event. The bronze and silver medallists were Synek and Martin respectively, whose estimated strengths were second and third highest in the field. \section{MasterChef Australia} MasterChef Australia is a game show in which amateur cooks compete for a title~\citep{wikipedia_masterchef2017}. From a statistical perspective the contest is interesting because the typical show format is to identify the \emph{weakest} player, who is then eliminated from the competition. Here, results from MasterChef Australia Series~6~\citep{wikipedia_masterchef_australia_series6} will be analysed; an extended discussion of the data used is given in the package at~\code{masterchef.Rd}. We wish to make inferences about the contestants' generalized Bradley-Terry strengths~$p_1,\ldots,p_n$, $\sum p_i=1$. One informative event was a team challenge in which the contestants were split randomly into two teams, red and blue: <>= team_red <- c("Jamie","Tracy","Ben","Amy","Renae","Georgia") team_blue <- c("Brent","Laura","Emelia","Colin","Kira","Tash") @ We may represent the fact that the red team won as \begin{equation}\label{redteamwon} \left\{ \mbox{Jamie}+\mbox{Tracy}+\mbox{Ben}+ \mbox{Amy}+\mbox{Renae}+\mbox{Georgia} \right\} \succ \left\{ \mbox{Brent}+\mbox{Laura}+\mbox{Emelia}+ \mbox{Colin}+\mbox{Kira}+\mbox{Tash} \right\} \end{equation} The likelihood function for the observation given in equation~\ref{redteamwon} would be \begin{equation}\label{redbluelikelihood} \frac{p_\mathrm{Jamie}+p_\mathrm{Tracy}+p_\mathrm{Ben}+p_\mathrm{Amy}+p_\mathrm{Renae}+p_\mathrm{Georgia}}{ p_\mathrm{Jamie}+p_\mathrm{Tracy}+p_\mathrm{Ben}+p_\mathrm{Amy}+p_\mathrm{Renae}+p_\mathrm{Georgia}+p_\mathrm{Brent}+p_\mathrm{Laura}+p_\mathrm{Emelia}+p_\mathrm{Colin}+p_\mathrm{Kira}+p_\mathrm{Tash}} \end{equation} To generate a likelihood function in \proglang{R}, we need to set up a \code{hyper2} object with appropriate contestants: <>= H <- hyper2(pnames = c( "Amy", "Ben", "Brent", "Colin", "Emelia", "Georgia", "Jamie", "Kira", "Laura", "Renae", "Sarah", "Tash", "Tracy")) H @ Object \code{H} is a uniform likelihood function. The package \proglang{R} idiom for incorporating likelihood from Equation~\ref{redbluelikelihood} is straightforward and natural: <>= H[team_red] <- +1 H[c(team_red,team_blue)] <- -1 H @ (Sarah did not take part). The above idiom makes it possible to define likelihood functions for observations that have a peculiar probability structure, and I give two examples below. One event involved eight competitors who were split randomly into four teams of two. The show format was specified in advance as follows: The teams were to be judged, and placed in order. The two top teams were to be declared safe, and the other two teams sent to an elimination trial from which an individual winner and loser were identified, the loser being obliged to leave the competition. The format for this event is also typical in MasterChef. The observation was that Laura and Jamie's team won, followed by Emelia and Amy, then Brent and Tracy. Ben and Renae's team came last: \begin{equation} \left\{{\mathrm{Laura} + \mathrm{Jamie}}\right\}\succ \left\{{\mathrm{Emelia} + \mathrm{Amy} }\right\}\succ \left\{{\mathrm{Brent} + \mathrm{Tracy}}\right\}\succ \left\{{\mathrm{Ben} + \mathrm{Renae}}\right\} \end{equation} A plausible likelihood function can be generated using the standard assumption~\citep{hankin2010} that the competitive strength of a team is the sum of the strengths of its members. We can then combine this assumption with the order statistic technique of~\cite{plackett1975}: \begin{align} \frac{p_\mathrm{Laura}+p_\mathrm{Jamie} }{ p_\mathrm{Laura} + p_\mathrm{Jamie} + p_\mathrm{Emelia} + p_\mathrm{Amy} + p_\mathrm{Brent} + p_\mathrm{Tracy} + p_\mathrm{Ben} + p_\mathrm{Renae} }\cdot\nonumber\\ \frac{ p_\mathrm{Emelia} + p_\mathrm{Amy} }{ p_\mathrm{Emelia} + p_\mathrm{Amy} + p_\mathrm{Brent} + p_\mathrm{Tracy} + p_\mathrm{Ben} + p_\mathrm{Renae} }\cdot \frac{ p_\mathrm{Brent} + p_\mathrm{Tracy} }{ p_\mathrm{Brent} + p_\mathrm{Tracy} + p_\mathrm{Ben} + p_\mathrm{Renae} } \end{align} We would like to incorporate information from this observation into object~\code{H}, which is a likelihood function for the two-team challenge discussed above. The corresponding \proglang{R} idiom is natural: <>= blue <- c("Laura","Jamie") # first yellow <- c("Emelia","Amy") # second green <- c("Brent","Tracy") # third red <- c("Ben","Renae") # fourth H[blue] <- 1 H[c(blue,yellow,green,red)] <- -1 H[yellow] <- 1 H[c(yellow,green,red)] <- -1 H[green] <- 1 H[c(green,red)] <- -1 H @ We may incorporate subsequent observations relating to the elimination trial among the four competitors comprising the losing two teams. The observation was that Laura won, and Renae came last, being eliminated. We might write \begin{equation}\label{renae_eliminated} \left\{ \mbox{Laura} \right\} \succ \left\{{\mbox{Brent},\mbox{Tracey},\mbox{Ben}}\right\} \succ \left\{\mbox{Renae}\right\}, \end{equation} which indicates that Laura came first, then Brent/Tracey/Ben in some order, then Renae came last. For this observation a likelihood function, following~\cite{plackett1975}, might be \begin{align} \mathcal{L}\left({p_1,p_2,p_3,p_4,p_5}\right) &= \Prob\left({ p_1\succ p_2\succ p_3\succ p_4\succ p_5 \cup p_1\succ p_2\succ p_4\succ p_3\succ p_5 \cup\ldots }\right)\\ \label{ggrlmath} &=\Prob\left({\bigcup_{\left[{abc}\right]} p_1\succ p_a\succ p_b\succ p_c\succ p_5 }\right)\\ \nonumber &= \frac{p_1}{p_1+p_2+p_3+p_4+p_5}\cdot\frac{p_2}{p_2+p_3+p_4+p_5}\cdot\frac{p_3}{p_3+p_4+p_5}\cdot\frac{p_4}{p_4+p_5}\\ \nonumber &{\qquad+}\frac{p_1}{p_1+p_2+p_4+p_3+p_5}\cdot\frac{p_2}{p_2+p_4+p_3+p_5}\cdot\frac{p_4}{p_4+p_3+p_5}\cdot\frac{p_3}{p_3+p_5}\\ \nonumber &{\qquad{}+}\frac{p_1}{p_1+p_3+p_2+p_4+p_5}\cdot\frac{p_3}{p_3+p_2+p_4+p_5}\cdot\frac{p_2}{p_2+p_4+p_5}\cdot\frac{p_4}{p_4+p_5}\\ \nonumber &{\qquad{}+}\cdots \end{align} where Laura's strength is shown as~$p_1$ etc for brevity. The \proglang{R} idiom is as follows: <<>>= L <- ggrl(H, winner = "Laura", btm4 = c("Brent", "Tracy","Ben"), eliminated = "Renae") @ Arguments to \code{ggrl()} are disjoint subsets of the players, the subsets themselves being passed in competition order from best to worst. Object \code{L} includes information from the team challenge (via first argument \code{H}) and the elimination results. It is a list of length~$3!=6$ objects of class \code{hyper2}, each of which gives a \citeauthor{luce1959} likelihood function for a consistent total ordering of the competitors. A final example (taken from MasterChef series 8, week 10) is given as a generalization of the \citeauthor{luce1959} likelihood. The format was as follows. Eight contestants were split randomly into four teams of two, the top two teams being declared safe. Note that the likelihood interpretation differs from the previous team challenge, in which the observation was an unambiguous team ranking: here, there is only a partial ranking of the teams and one might expect this observation to be less informative. Without loss of generality, the result may be represented as \begin{equation} \left\{{p_1+p_2,p_3+p_4}\right\}\succ \left\{{p_5+p_6,p_7+p_8}\right\} \end{equation} and a likelihood function on~$p_1,\ldots p_8$ for this observation might be \begin{align} \mathcal{L}\left( {p_1,\ldots, p_8} \right) &= \Prob\left({ \left\{ {p_1+p_2} \right\}\succ \left\{ {p_3+p_4} \right\}\succ \left\{ {p_5+p_6} \right\}\succ \left\{ {p_7+p_8} \right\} \cup\vphantom{\bigcup} }\right. \nonumber\\ &\qquad \left.{ \left\{ {p_1+p_2} \right\}\succ \left\{ {p_3+p_4} \right\}\succ \left\{ {p_7+p_8} \right\}\succ \left\{ {p_5+p_6} \right\}\cup }\right.\nonumber\\ &\qquad \left.{ \left\{ {p_3+p_4} \right\}\succ \left\{ {p_1+p_2} \right\}\succ \left\{ {p_5+p_6} \right\}\succ \left\{ {p_7+p_8} \right\}\cup }\right.\nonumber\\ &\qquad \left.{ \left\{ {p_3+p_4} \right\}\succ \left\{ {p_1+p_2} \right\}\succ \left\{ {p_5+p_6} \right\}\succ \left\{ {p_7+p_8} \right\}\vphantom{\bigcup} }\right)\\ &=\frac{p_1+p_2}{p_1+p_2+p_3+p_4+p_5+p_6+p_7+p_8}\cdot \frac{p_3+p_4}{p_3+p_4+p_5+p_6+p_7+p_8}\cdot\frac{p_5+p_6}{p_5+p_6+p_7+p_8}\nonumber\\ &{}\qquad+\cdots+\nonumber\\ &{}\frac{p_3+p_4}{p_3+p_4+p_1+p_2+p_7+p_8+p_5+p_6}\cdot \frac{p_1+p_2}{p_3+p_4+p_7+p_8+p_5+p_6}\cdot\frac{p_7+p_8}{p_7+p_8+p_5+p_6}\nonumber\\ \nonumber \end{align} % \end{align} \subsection{Maximum likelihood estimation} The package provides an overall likelihood function for all informative judgements in the series on the final 13 players in object \code{masterchef\_series6}, documented at \code{masterchef.Rd}. We may assess a number of related hypotheses using the package. The first step is to calculate the likelihood for the hypothesis that all players are of equal strength: <>= data("masterchef") n <- 13 # 13 players equal_strengths <- rep(1/n,n-1) like_series(equal_strengths, masterchef) # see masterchef.Rd @ The strengths of the 13 players may be estimated using standard maximum likelihood techniques. This requires constrained optimization in order to prevent the search from passing through inadmissible points in p-space: <>= UI <- rbind(diag(n-1),-1) # p_i >= 0 CI <- c(rep(0,n-1),-1) # p_1+...+p_{n-1} <= 1 constrOptim( # maxp() does not work for masterchef_series6 theta = equal_strengths, # starting point for optimization f = function(p){-like_series(p,masterchef)}, # see masterchef.Rd ui=UI, ci=CI, grad=NULL) @ \begin{figure}[htbp] \begin{center} <>= masterchef_maxp dotchart(masterchef_maxp) @ \caption{Maximum likelihood estimate for the strengths of the top~13 competitors\label{masterchef6} in Series~6 of \emph{MasterChef Australia} } \end{center} \end{figure} The resulting maximum likelihood estimate, \code{masterchef_maxp} in the package, is shown pictorially in Figure~\ref{masterchef6}. The support at the precalculated evaluate is <>= like_series(indep(masterchef_maxp), masterchef) @ and this allows us to test the hypothesis of equal player strengths: by Wilks's theorem, $-2\log\Lambda$ has an asymptotic null distribution of~$\chi^2_{12}$. This corresponds to a $p$-value of <>= pchisq(2*(78.7-66.2),df=12,lower.tail=FALSE) @ showing that the observations do constitute evidence for differential player strengths. Figure~\ref{masterchef6} suggests that Laura, the runner-up, is actually a stronger competitor than the winner, Brent. We may assess this statistically by finding the maximum likelihood for~${\mathbf p}$, subject to the constraint that~$p_\mathrm{Laura}\leqslant p_\mathrm{Brent}$: <>= UI <- rbind(UI,c(0,0,1,0,0,0,0,0,-1,0,0,0,0)) # Brent >= Laura CI <- c(CI,0) ans2 <- constrOptim( # maxp() does not work for masterchef_series6 theta = equal_strengths, f = function(p){-like_series(p,masterchef_series6)}, # see masterchef.Rd grad=NULL, ui = UI, ci=CI) @ Object \code{maxp_masterchef6_constrained} in the package is the result of the above optimization, at which point the likelihood is <<>>= like_series(indep(masterchef_constrained_maxp), masterchef) @ The two estimates differ by about~$1.18$, less than the two-units-of-support criterion of~\citet{edwards1992}; alternatively, one may observe that the likelihood ratio is not in the tail region of its asymptotic distribution ($\chi^2_1$) as the $p$-value is about 0.12. This shows that there is no strong evidence for Laura's competitive strength being higher than that of Brent. Similar techniques can be used to give a profile likelihood function; the resulting support interval for Laura's strength is~$\left[{0.145,0.465}\right]$, which does not include~$\frac{1}{13}\simeq 0.077$, the mean player strength. However, further work would be needed to make statistically robust inferences from these findings. Even given equal competitive ability, one would expect the player with the highest estimated Bradley-Terry strength to have an elevated support interval; and it is not clear to what extent Wilks, being asymptotic, is relevant to the data at hand. \section{Conclusions} Several generalization of Bradley-Terry strengths are appropriate to describe competitive situations in which order statistics are sufficient. The \code{hyper2} package is introduced, providing a suite of functionality for generalizations of the partial rank analysis of~\cite{critchlow1985}. The software admits natural \proglang{R} idiom for translating commonly occurring observations into a likelihood function. The package is used to calculate maximum likelihood estimates for generalized Bradley-Terry strengths in two competitive situations: Olympic rowing, and \emph{MasterChef Australia}. The estimates for the competitors' strengths are plausible; and several meaningful statistical hypotheses are assessed quantitatively. \bibliography{hyper2} \end{document}