---
title: "Boolean Algebra of Hypothesis Tests"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Boolean Algebra of Hypothesis Tests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hypothesize)
```

## The Idea

What if hypothesis tests formed an algebra? In SICP terms, we want three things: **primitive expressions** (the tests themselves), **means of combination** (ways to build compound tests from simpler ones), and **means of abstraction** (ways to transform and generalize tests). The `hypothesize` package provides all three, and in v0.11.0, the combinators form a genuine Boolean algebra.

This vignette develops the algebra from first principles.

## The Trinity: Three Ways to Test a Hypothesis

Statistical theory gives us three fundamental likelihood-based tests. They approach the same question from different angles and are asymptotically equivalent, but each requires different information:

| Test | What it needs | When to use |
|------|---------------|-------------|
| **Wald** | MLE + standard error | You already fit the full model |
| **LRT** | Log-likelihoods of both models | You can fit both models |
| **Score** | Score + Fisher info at null only | The null model is cheap, the alternative is expensive |

The score test completes the trinity:

```{r}
# Setup: testing whether a Poisson rate equals 5
# Observed: 60 events in 10 time units
# MLE: lambda_hat = 6, se = sqrt(6/10)

# Wald: evaluate at the MLE
w <- wald_test(estimate = 6, se = sqrt(6/10), null_value = 5)

# Score: evaluate at the null
# Score function at lambda0 = 5: n*xbar/lambda0 - n = 60/5 - 10 = 2
# Fisher info at lambda0 = 5: n/lambda0 = 10/5 = 2
s <- score_test(score = 2, fisher_info = 2)

# LRT: evaluate at both
# loglik(lambda) = sum(x)*log(lambda) - n*lambda
ll_null <- 60 * log(5) - 10 * 5
ll_alt <- 60 * log(6) - 10 * 6
l <- lrt(null_loglik = ll_null, alt_loglik = ll_alt, dof = 1)

# All three give similar (not identical) answers:
data.frame(
  test = c("Wald", "Score", "LRT"),
  statistic = c(test_stat(w), test_stat(s), test_stat(l)),
  p_value = round(c(pval(w), pval(s), pval(l)), 4)
)
```

The three tests agree asymptotically but can diverge in finite samples. This is expected and well understood — the choice between them depends on what information is available, not on which gives the "right" answer.

## Boolean Algebra: AND, OR, NOT

The heart of v0.11.0 is a Boolean algebra over hypothesis tests. Just as propositional logic has AND, OR, and NOT, we can combine tests with the same operations.

### NOT: The Complement

The complement of a test rejects when the original *fails* to reject:

```{r}
# A significant Wald test
w <- wald_test(estimate = 3.0, se = 1.0)
pval(w)  # small — rejects H0

# Its complement
cw <- complement_test(w)
pval(cw)  # large — fails to reject
```

The algebra is clean: double complement is the identity.

```{r}
pval(complement_test(complement_test(w)))
pval(w)
```

And the class hierarchy is preserved — a complemented Wald test is simultaneously a `complemented_test`, a `wald_test`, and a `hypothesis_test`:

```{r}
class(cw)
```

**Connection to equivalence testing.** If a Wald test asks "is the parameter different from the null?", its complement asks "is the parameter *close to* the null?" This is exactly the logic of equivalence testing. In bioequivalence studies, you don't want to show a drug is *different* — you want to show it's *the same* (within tolerance).
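
As a quick numeric illustration, assuming only the fact (derived later in this vignette) that the complement maps a p-value $p$ to $1 - p$, here is the effect in base R, with no package functions involved. An estimate sitting close to its null produces a large two-sided Wald p-value, and the complement turns that into a small one:

```{r}
# Wald z-test of H0: theta = 0, with an estimate very close to the null
z <- 0.1 / 1.0            # estimate / se
p <- 2 * pnorm(-abs(z))   # ~0.92: no evidence of a difference
p_complement <- 1 - p     # ~0.08: modest evidence of closeness
c(original = p, complement = p_complement)
```

This is only the raw arithmetic of the complement; a defensible equivalence claim also needs a tolerance margin, which is where the intersection test below comes in.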
### AND: The Intersection

The intersection test rejects only when *all* component tests reject:

```{r}
# Both must be significant
intersection_test(0.01, 0.03)  # both < 0.05
is_significant_at(intersection_test(0.01, 0.03), 0.05)

# One non-significant component blocks rejection
intersection_test(0.01, 0.80)
is_significant_at(intersection_test(0.01, 0.80), 0.05)
```

The p-value is simply $\max(p_1, \ldots, p_k)$ — the intersection rejects at level $\alpha$ if and only if every component rejects, which happens if and only if the largest p-value is below $\alpha$. This is the intersection-union test (IUT; Berger, 1982). No multiplicity correction is needed — the max operation is inherently conservative.

**Use case: bioequivalence.** Bioequivalence requires showing a drug's effect is both "not too low" AND "not too high":

```{r}
# Drug effect estimate: 1.05, with SE = 0.08
# Must show: effect > 0.8 AND effect < 1.25
t_lower <- wald_test(estimate = 1.05, se = 0.08, null_value = 0.8)
t_upper <- wald_test(estimate = 1.05, se = 0.08, null_value = 1.25)
bioequiv <- intersection_test(t_lower, t_upper)
is_significant_at(bioequiv, 0.05)
```

### OR: The Union (via De Morgan)

The union test rejects when *any* component test rejects. But here's the beautiful part: we don't implement it directly. Instead, we *define* it through De Morgan's law:

$$\text{OR}(t_1, \ldots, t_k) = \text{NOT}(\text{AND}(\text{NOT}(t_1), \ldots, \text{NOT}(t_k)))$$

The implementation IS the theorem:

```{r, eval=FALSE}
# This is (essentially) the actual source code of union_test:
union_test <- function(...) {
  tests <- list(...)
  complement_test(
    do.call(intersection_test, lapply(tests, complement_test))
  )
}
```

Let's verify that De Morgan's law holds:

```{r}
# Three p-values
p1 <- 0.03; p2 <- 0.15; p3 <- 0.07

# Direct union
ut <- union_test(p1, p2, p3)

# Manual De Morgan construction
tests <- list(
  hypothesis_test(stat = 0, p.value = p1, dof = 1),
  hypothesis_test(stat = 0, p.value = p2, dof = 1),
  hypothesis_test(stat = 0, p.value = p3, dof = 1)
)
dm <- complement_test(
  do.call(intersection_test, lapply(tests, complement_test))
)

# Identical
c(union = pval(ut), de_morgan = pval(dm))
```

The resulting p-value is $\min(p_1, \ldots, p_k)$. This falls out algebraically: complement maps $p \to 1-p$, the intersection takes the max, and the outer complement maps back. So:

$$1 - \max(1-p_1, \ldots, 1-p_k) = \min(p_1, \ldots, p_k)$$

**A note on multiplicity.** The uncorrected $\min(p)$ is anti-conservative. If you need to control the family-wise error rate, apply `adjust_pval()` to the component tests before combining. The raw union test is appropriate for screening or exploratory analysis where you want to detect *any* signal.
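
The min-p identity can also be checked with base R alone, writing each Boolean operation as a plain function on p-values (hypothetical helper names; nothing here comes from the package):

```{r}
# De Morgan on raw p-values: NOT is 1 - p, AND is max
p <- c(0.03, 0.15, 0.07)
not_p <- function(p) 1 - p
and_p <- function(p) max(p)
union_p <- not_p(and_p(not_p(p)))      # NOT(AND(NOT(p_i)))
c(de_morgan = union_p, direct_min = min(p))  # both 0.03
```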
### The Full Algebra

The three operations satisfy the axioms of a Boolean algebra:

| Property | Statement |
|----------|-----------|
| Involution | `complement(complement(t)) = t` |
| De Morgan 1 | `union(a, b) = complement(intersection(complement(a), complement(b)))` |
| De Morgan 2 | `intersection(a, b) = complement(union(complement(a), complement(b)))` |

And they compose with the rest of the package — intersection and union tests are `hypothesis_test` objects, so they work with `fisher_combine()`, `adjust_pval()`, `pval()`, and everything else:

```{r}
# Compose: take the union, then combine with another test
ut <- union_test(0.01, 0.05)
combined <- fisher_combine(ut, wald_test(estimate = 2, se = 1))
pval(combined)
```

## Test-Confidence Duality

There is a deep connection between hypothesis tests and confidence intervals: a $(1-\alpha)$ confidence set contains exactly those null values that would *not* be rejected at level $\alpha$.

The `invert_test()` function makes this duality operational. It takes a test constructor — any function that maps a null value to a `hypothesis_test` — and returns the set of non-rejected values:

```{r}
# Invert a Wald test into a confidence interval
cs <- invert_test(
  test_fn = function(theta) wald_test(estimate = 2.5, se = 0.8, null_value = theta),
  grid = seq(0, 5, by = 0.01)
)
cs
```

This agrees with the analytical confidence interval (up to grid resolution):

```{r}
# Analytical CI from the Wald test
confint(wald_test(estimate = 2.5, se = 0.8))

# Numerical CI from test inversion
c(lower = lower(cs), upper = upper(cs))
```

The power of `invert_test()` is generality. The analytical `confint()` methods only work for specific test types (Wald, z-test).
But `invert_test()` works with *any* test — including user-defined ones:

```{r}
# A custom test: reject if |observation - theta| > 2
# (This has no analytical CI, but invert_test handles it)
my_test <- function(theta) {
  obs <- 5.0
  hypothesis_test(
    stat = (obs - theta)^2,
    p.value = if (abs(obs - theta) > 2) 0.01 else 0.5,
    dof = 1
  )
}

cs <- invert_test(my_test, grid = seq(0, 10, by = 0.1))
c(lower = lower(cs), upper = upper(cs))
```

This is the most SICP-like function in the package. It takes a *function* as input (the test constructor) and returns a structured result (the confidence set). It works because of the abstraction barrier: `invert_test()` only needs `pval()`, so any test that implements the `hypothesis_test` interface gets confidence sets for free.

## Multivariate Tests

Both `wald_test()` and `score_test()` are polymorphic — they accept scalar or vector inputs and dispatch accordingly. The multivariate Wald test uses the quadratic form:

$$W = (\hat\theta - \theta_0)^\top \Sigma^{-1} (\hat\theta - \theta_0) \sim \chi^2_k$$

```{r}
# Test two parameters jointly
est <- c(2.0, 3.0)
V <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)  # correlated
w_joint <- wald_test(estimate = est, vcov = V, null_value = c(0, 0))
w_joint
```

When parameters are independent (diagonal covariance), the multivariate statistic decomposes into a sum of univariate statistics:

```{r}
# Independent parameters: diagonal vcov
se1 <- 0.8; se2 <- 1.2
V_diag <- diag(c(se1^2, se2^2))
w_multi <- wald_test(estimate = c(2.0, 3.0), vcov = V_diag)

w1 <- wald_test(estimate = 2.0, se = se1)
w2 <- wald_test(estimate = 3.0, se = se2)

# The joint statistic equals the sum of individual statistics
c(joint = test_stat(w_multi), sum = test_stat(w1) + test_stat(w2))
```

This decomposition breaks down when parameters are correlated — that's when you need the full `vcov` matrix. The off-diagonal entries capture the dependence structure that individual standard errors miss.
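
To see what the quadratic form is doing, it can be evaluated by hand in base R, using the same numbers as the correlated example above and no `hypothesize` functions at all:

```{r}
# Joint Wald statistic for est = c(2, 3) against null = c(0, 0)
est   <- c(2.0, 3.0)
null0 <- c(0, 0)
V     <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)

d <- est - null0
W <- drop(t(d) %*% solve(V) %*% d)         # quadratic form, ~10.33
p <- pchisq(W, df = 2, lower.tail = FALSE)  # chi-square with k = 2 dof
c(W = W, p = p)
```

With the positive 0.3 correlation, the off-diagonal cross terms make $W \approx 10.33$, noticeably smaller than the sum $4 + 9 = 13$ that a diagonal covariance would give.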
## The Design as an Algebra

Stepping back, the package implements a small algebra with clear structure:

```
Primitives:   z_test, wald_test, lrt, score_test
Combinators:  fisher_combine, intersection_test, union_test
Transformers: adjust_pval, complement_test
Duality:      invert_test, confint
Accessors:    pval, test_stat, dof, is_significant_at, lower, upper
```

Each layer corresponds to an SICP principle:

| Layer | SICP Principle | What it does |
|-------|----------------|--------------|
| Primitives | Primitive expressions | The four ways to test a hypothesis |
| Combinators | Means of combination | Build compound tests (closure property) |
| Transformers | Means of abstraction | Transform tests into new tests |
| Duality | Power of the abstraction barrier | Any test becomes a confidence set |

The package has 18 exported functions. That's deliberately small. The goal is not to provide every test, but to provide the algebraic primitives from which users can compose arbitrary test logic. A well-chosen basis is more powerful than an exhaustive catalog.