---
title: "Triangle Distribution Math"
author: "Rob Carnell"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Triangle Distribution Math}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
header-includes:
  - \usepackage{amsmath}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Triangle Notation

```{r triangle_drawing, echo=FALSE, fig.width=5, fig.height=4}
plot(c(-1,5), c(0,4), type = "n", xlab = "x", ylab = "f(x)", axes = FALSE)
lines(c(0,1), c(0,3), col = "red")
lines(c(1,4), c(3,0), col = "red")
axis(2, at = c(0,3), labels = c("0","h"), las = 2)
axis(1, at = c(0,1,4), labels = c("a", "c", "b"))
box()
```

- $a$ = minimum
- $b$ = maximum
- $c$ = mode
- $h$ = density at the mode = $\frac{2}{b-a}$

## Triangle PDF

$$f(x) =
\left\{
	\begin{array}{ll}
		\frac{h}{c-a}(x-a)  & \mbox{if } a \leq x \leq c \\
		\frac{h}{c-b}(x-b) & \mbox{if } c < x \leq b \\
		0 & \mbox{otherwise}
	\end{array}
\right.\ \ \ \ (1)$$

Integrating the pdf in equation (1) to solve for h

$$\int f(x) dx = 1$$

$$\frac{h}{c-a}\int_{a}^{c} (x-a) dx + \frac{h}{c-b} \int_{c}^{b} (x-b) dx = \frac{h(b-a)}{2}$$

$$h=\frac{2}{b-a}\ \ \ \ (2)$$

Substituting back into equation (1),

$$f(x) =
\left\{
	\begin{array}{ll}
		\frac{2}{(b-a)(c-a)}(x-a)  & \mbox{if } a \leq x \leq c \\
		\frac{2}{(b-a)(c-b)}(x-b) & \mbox{if } c < x \leq b \\
		0 & \mbox{otherwise}
	\end{array}
\right.\ \ \ \ (3)$$

## Triangle Mean

Integrating equation (3) to find $E(x)$,

$$E(X)=\int xf(x)dx = \frac{h}{c-a}\int_a^c (x^2-ax) dx + 
\frac{h}{c-b}\int_c^b (x^2-bx) dx$$

$$=\frac{a+b+c}{3}\ \ \ \ (3)$$

## Triangle Variance

$$V(X) = E(X^2) - \big(E(X)\big)^2 = \int x^2f(x)dx- \bigg(\frac{a+b+c}{3}\bigg)^2$$

$$=\frac{h}{c-a}\int_{a}^{c} x^2(x-a) dx + \frac{h}{c-b} \int_{c}^{b} x^2(x-b) dx- \bigg(\frac{a+b+c}{3}\bigg)^2$$

$$=\frac{a^2+b^2+c^2-ab-ac-bc}{18}$$

## Logarithmic Triangle distribution

Define:

$$a_l=\log_{\phi}(a),\ \ b_l=\log_{\phi}(b),\ \ c_l=\log_{\phi}(c),\ \ h=\frac{2}{b_l-a_l}, \ \ \phi = \mbox{log base}$$

$$f(z) = 
\left\{
  \begin{array}{ll}
    \frac{h}{c_l - a_l}(z - a_l) & \mbox{if } a_l \leq z \leq c_l \\
    \frac{h}{c_l - b_l}(z - b_l) & \mbox{if } c_l < z \leq b_l \\
    0 & \mbox{otherwise}
  \end{array}
\right.\ \ \ \ (4)$$

However,

$$E({\phi}^z) \neq {\phi}^{E(z)}\ \ \ \ (5)$$

Therefore, transforming...

$$Y={\phi}^Z$$

$$Z=\log_{\phi}(Y)$$

$$w(y)=\log_{\phi}(y)$$

$$w'(y)=\frac{dz}{dy} = \frac{1}{y\log({\phi})}$$

$$g(y)=f(w(y))w'(y)$$

$$g(y) = 
\left\{
  \begin{array}{ll}
    \frac{2}{(c_l-a_l)(b_l-a_l)\log({\phi})}\frac{log_{\phi}(y) - a_l}{y} & \mbox{if } 0 < a \leq y \leq c \\
    \frac{2}{(c_l-b_l)(b_l-a_l)\log({\phi})}\frac{log_{\phi}(y) - b_l}{y} & \mbox{if } c < y \leq b \\
    0 & \mbox{otherwise}
  \end{array}
\right.\ \ \ \ (5)$$

Define:

$$\beta_1=\frac{2}{(c_l-a_l)(b_l-a_l)}$$

$$\beta_2=\frac{2}{(c_l-b_l)(b_l-a_l)}$$

Finding the CDF, 

$$G(y)=\int_{-\infty}^y g(y)dy$$

$$\mbox{for}\ a \leq y \leq c,\ \ G(y) = \frac{\beta_1}{\log({\phi})} \int_a^y \frac{\log(y)}{y\log({\phi})}-\frac{a_l}{y}dy$$

$$=\beta_1 \bigg[\frac{\log_{\phi}^2(y)}{2} - a_l \log_{\phi}(y) - \frac{a_l^2}{2} + a_l^2\bigg]$$

$$\mbox{for}\ c < y \leq b,\ \ G(y) = G(c) + \frac{\beta_2}{\log({\phi})} \int_c^y \frac{\log(y)}{y\log({\phi})} - \frac{b_l}{y}dy$$

$$=G(c) + \beta_2 \bigg[\frac{\log_{\phi}^2(y)}{2} - b_l \log_{\phi}(y) - \frac{c_l^2}{2} + b_l c_l\bigg]$$

Checking that the CDF is 1 at b,

$$G(b) = \frac{c_l^2 - 2a_l c_l + a_l^2}{(c_l-a_l)(b_l-a_l)} + \frac{-b_l^2-c_l^2+2b_lc_l}{(c_l-b_l)(b_l-a_l)}$$

$$= \frac{c_l-a_l}{b_l-a_l} + \frac{-(c_l-b_l)}{b_l-a_l} = 1$$

Now calculating $E(y)$,

$$E(y) = \int y\ g(y)\ dy$$

$$=\frac{\beta_1}{\log({\phi})} \int_a^c \bigg[\frac{\log(y)}{\log({\phi})} - a_l\bigg]dy + 
\frac{\beta_2}{\log({\phi})} \int_c^b \bigg[\frac{\log(y)}{\log({\phi})} - b_l\bigg]dy$$

$$=\frac{c\beta_1}{\log^2({\phi})} \bigg[\log(c) - 1 - \log(a) + \frac{a}{c} \bigg] + \frac{c\beta_2}{\log^2({\phi})} \bigg[\frac{-b}{c} - \log(c) + 1 + \log(b) \bigg]$$

## Method of Moments

$$E(x) = \frac{a + b + c}{3}$$

$$c = 3E(x) - a - b = 3E(x) - \min(x) - \max(x)$$

## Maximum Likelihood Estimation

The procedure for maximum likelihood estimation involves maximizing the 
likelihood with respect to $c$ for a fixed $a$ and $b$,
followed by minimizing the negative log likelihood with respect to $a$ and $b$
for a fixed $c$.  

###  Maximizing the Likelihood with respect to $c$

This discussion follows the results from [Samuel Kotz and Johan Rene van Dorp. Beyond Beta](https://doi.org/10.1142/5720)

For the purposes of this section, with a fixed $a$ and $b$, the sample can be easily
rescaled to $a=0$ and $b=1$.  This section will proceed on $[0,1]$ with the mode at $0 \le c \le 1$

$$w(x) = 
\left\{
  \begin{array}{ll}
    \frac{2x}{c} & \mbox{if } 0 \le x \lt c \\
    \frac{2(1-x)}{1-c} & \mbox{if } c \le x \leq 1 \\
    0 & \mbox{otherwise}
  \end{array}
\right.$$

$$L(x|c) = \prod_{i}^{n} w(x|c)$$

Assume that the sample is ordered into order statistics $X_{(1)} \lt \dots \lt X_{(n)}$.  Also, note
that $X_{(r)} \le c \lt X_{(r+1)}$.  In other words, the mode falls between the $r^{th}$ and $r+1$ order statistics.

$$L(x|c) = \prod_{i=1}^{r} \frac{2x_{(i)}}{c} \prod_{i=r+1}^{n} \frac{2(1-x_{(i)})}{1-c} = \frac{2^n \prod_{i=1}^{r} x_{(i)} \prod_{i=r+1}^{n} (1-x_{(i)})}{c^r(1-c)^{n-r}}$$

To maximize the likelihood, we can first maximize with respect to $r$ and then locate $c$ between
the $r^{th}$ and $r+1$ order statistics.  For notation purposes, also define $X_{(0)} = 0$ and $X_{(n+1)} = 1$.

$$\large \max_{0 \le c \le 1} L(x|c) = \max_{r \ \epsilon \ (0,\dots,n)} \ \ \max_{x_{(r)} \le c \le x_{(r+1)}} \ \ L(x|c)$$

#### Case 1.  $c$ is between the first and second to last order statistic $r \ \epsilon \ (1, \dots, n-1)$

Noticing that maximizing the likelihood is equivalent to minimizing the denominator:

$$\large \max L(x|c) = \max_{r \ \epsilon \ (1,\dots,n-1)} \ \ \min_{x_{(r)} \le c \le x_{(r+1)}} \ \ c^r(1-c)^{n-r}$$

Since $c^r(1-c)^{n-r}$ is unimodal with respect to $c$, it should be sufficient to test the end points
of an interval to find the minimum on the interval

$$\large = \max_{r \ \epsilon \ (1,\dots,n-1)} \ \ \min_{c \ \epsilon \ (x_{(r)},\ \ x_{(r+1)})} \ \ c^r(1-c)^{n-r}$$

Therefore, for this case, it is sufficient to test the likelihood using $c$ at each of the sampled points and find the largest.

##### Side note on $z=c^r(1-c)^{n-r}$ being unimodal

$$\frac{dz}{dc} = rc^{(r-1)}(1-c)^{n-r} + c^r(n-r)(1-c)^{n-r-1}(-1) = c^{(r-1)}(1-c)^{n-r-1}(r - cn)$$

$\frac{dz}{dc} = 0$ at $c=0,\ 1,\ \frac{r}{n}$.  At $0 < c < \frac{r}{n}$, $z$ is positive, and at $\frac{r}{n} < c < 1$, $z$ is negative.
Therefore, $z$ is unimodal on $(0,1)$.

#### Case 2.  $c$ is between 0 and the first order statistic $r = 0$

$$\large \max L(x|c) = \max_{0 \le c \le x_{(1)}} \prod_{i=1}^{n} \frac{1-x_{(i)}}{1-c} = \prod_{i=1}^{n} \frac{1-x_{(i)}}{1-x_{(1)}}$$

Choosing the largest endpoint in the interval, creates the smallest denominator, and the largest likelihood.

Therefore, for this case, it is sufficient to test the likelihood using $c$ at the first sampled point.

#### Case 3.  $c$ is between the last order statistic $r = n$ and 1

$$\large \max L(x|c) = \max_{x_{(n)} \le c \le 1} \prod_{i=1}^{n} \frac{x_{(i)}}{c} = \prod_{i=1}^{n} \frac{x_{(i)}}{x_{(n)}}$$

Choosing the smallest option in the denominator creates the largest likelihood.  Again, it is sufficient
to test the likelihood using $c$ at the largest sample point.

#### All Cases

For all cases, it is sufficient to compute the sample likelihood using $c$ equal
to each of the samples, and choosing the largest likelihood from the $n$ options
to find the corresponding $c$.  This calculation is performed with a fixed $a$ and $b$,
so the test must be performed iteratively as $a$ and $b$ are separately optimized.

### Negative Log Likelihood

$$nLL = -\log(L) = -\log\left(\prod_i^n f(x_i)\right)$$

$$ = - \sum_i^n \log\left(f(x_i)\right) = - \sum_{i: \ a \le x_i \lt c}^{n_1} \log\left(f(x_i)\right) - \sum_{i: \ c \le x_i \le b}^{n_2} \log\left(f(x_i)\right)$$

where $n = n_1 + n_2$

#### Case 1 $a = c \lt b$

$$ nLL = - \sum_{i}^{n} \log(2) + \log(b-x_i) - \log(b-a) - \log(b-c)$$

$$ = -n\log(2) + n\log(b-a) + n \log(b-c) - \sum_{i}^{n} \log(b-x_i)$$

#### Case 2 $a \lt c = b$

$$ = - \sum_{i}^{n} \log(2) + \log(x_i - a) - \log(b-a) - \log(c-a) $$

$$ = -n\log(2) + n\log(b-a) + n\log(c-a) - \sum_{i}^{n} \log(x_i - a)$$

#### Case 3 a \lt c \lt b

$$ = - \sum_{i: \ a \lt x_i \lt c}^{n_1} \log(2) + \log(x_i - a) - \log(b-a) - \log(c-a) - \sum_{i: \ c \le x_i \lt b}^{n_2} \log(2) + \log(b-x_i) - \log(b-a) - \log(b-c)$$

$$ = -n\log(2) + n\log(b-a) + n_1\log(c-a) + n_2 \log(b-c) - \sum_{i: \ a \lt x_i \lt c}^{n_1} \log(x_i - a) - \sum_{i: \ c \le x_i \lt b}^{n_2} \log(b-x_i)$$

### Gradient of the negative Log Likelihood Given $c$:

The negative log likelihood is not differentiable with respect to $c$ because the limits of the sum ($n_1$ and $n_2$) are
functions of $c$.  There the gradient and hessian are derived as if $c$ is fixed.

#### Case 1 $a = c \lt b$

$$\frac{\partial nLL}{\partial a} = - \frac{n}{b-a}$$

$$\frac{\partial nLL}{\partial b} = \frac{n}{b-a} + \frac{n}{b-c} - \sum_i^{n} \frac{1}{b-x_i}$$

#### Case 2 $a \lt c = b$

$$\frac{\partial nLL}{\partial a} = - \frac{n}{b-a} - \frac{n}{c-a} + \sum_i^{n} \frac{1}{x_i - a}$$

$$\frac{\partial nLL}{\partial b} = \frac{n}{b-a}$$

#### Case 3 a \lt c \lt b

$$\frac{\partial nLL}{\partial a} = - \frac{n}{b-a} - \frac{n_1}{c-a} + \sum_i^{n_1} \frac{1}{x_i - a}$$

$$\frac{\partial nLL}{\partial b} = \frac{n}{b-a} + \frac{n_2}{b-c} - \sum_i^{n_2} \frac{1}{b-x_i}$$

### Hessian of the negative Log Likelihood Given $c$:

#### Case 1 $a = c \lt b$

$$\frac{\partial^2nLL}{\partial a^2} = - \frac{n}{(b-a)^2}$$

$$\frac{\partial^2 nLL}{\partial b^2} = -\frac{n}{(b-a)^2} - \frac{n}{(b-c)^2} + \sum_i^{n} \frac{1}{(b-x_i)^2}$$

$$\frac{\partial^2 nLL}{\partial a\partial b} = \frac{\partial^2 nLL}{\partial b\partial a} = - \frac{n}{(b-a)^2}$$

#### Case 2 $a \lt c = b$

$$\frac{\partial^2 nLL}{\partial a^2} = - \frac{n}{(b-a)^2} - \frac{n}{(c-a)^2} + \sum_i^{n} \frac{1}{(x_i - a)^2}$$

$$\frac{\partial^2 nLL}{\partial b^2} = - \frac{n}{(b-a)^2}$$

$$\frac{\partial^2 nLL}{\partial a\partial b} = \frac{\partial^2 nLL}{\partial b\partial a} = - \frac{n}{(b-a)^2}$$

#### Case 3 $a \lt c \lt b$

$$\frac{\partial^2 nLL}{\partial a^2} = - \frac{n}{(b-a)^2} - \frac{n_1}{(c-a)^2} + \sum_i^{n_1} \frac{1}{(x_i - a)^2}$$


$$\frac{\partial ^2 nLL}{\partial b^2} = -\frac{n}{(b-a)^2} - \frac{n_2}{(b-c)^2} + \sum_i^{n_2} \frac{1}{(b-x_i)^2}$$

$$\frac{\partial ^2 nLL}{\partial a\partial b} = \frac{\partial ^2 nLL}{\partial b\partial a} = - \frac{n}{(b-a)^2}$$

### MLE Variance Co-variance

For the optimization of $(a,b)$ given $c$, we can use the inverse of the hessian
of the negative log likelihood for an estimate of the covariance matrix of $\hat{a}$ 
and $\hat{b}$.  For the variance in $\hat{c}$, we use the variance of the $r^{th}$ order
statistic which corresponds to $c$.  The covariance of $(a,b)$ and $c$ is not
computed because the negative log likelihood is not differentiable with respect to $c$.

Let $H$ denote the Hessian matrix, and let $H^{-1}[1,1]$ be the $V(\hat{a})$,
$H^{-1}[2,2]$ be the $V(\hat{b})$, and $H^{-1}[1,2] = H^{-1}[2,1]$ be the 
$Cov(\hat{a}, \hat{b})$.  Then,

$$ V([\hat{a}, \hat{b}, \hat{c}]) =   
\begin{bmatrix} 
   H^{-1}[1,1] & H^{-1}[1,2] & 0  \\
   H^{-1}[2,1] & H^{-1}[2,2] & 0  \\
   0 & 0 & V(\hat{c})  \\
\end{bmatrix} 
$$

#### $r^{th}$ order statistic

$$f(x_{(r)}) = \frac{n!}{(r-1)!(n-r)!} f(x) [F(x)]^{(r-1)}[1-F(x)]^{(n-r)}$$

A closed form solution to $V(X_{(r)})$ is not easily obtainable for the triangle,
so numerical integration is used with $f(x)$ as `dtriangle` and $F(x)$ as `ptriangle`.

$$V\left(X_{(r)}\right) = \int_a^b x^2 f(x_{(r)}) dx - \left[\int_a^b x f(x_{(r)}) dx \right]^2$$