With analysis finally online, back to finishing ms.

Residual detail is how to deal with the analysis of the contingency table of expression patterns for each species. I posted this question to CrossValidated:

```
For a square matrix, is it appropriate to use a chi-squared distribution when each level of the variables are assumed to have the same overall frequency?
Specifically, I'm analyzing a dataset of the number of genes that have increased expression in an experimental treatment in two related species. My data look like this, with species 1 on the columns and species 2 on the rows:
<pre><code> Low Intermediate High
Low 2594 163 405
Intermediate 1350 558 155
High 467 65 322
</code></pre>
I *a priori* expect each class to have been the same in the common ancestor of the species (off-diagonals = 0). That is:
<pre><code> Low Intermediate High
Low 3786 0 0
Intermediate 0 1425 0
High 0 0 868
</code></pre>
My question is which of the off-diagonal cells have diverged more than expected by chance.
As a simple first pass, I've modified the standard `chisq.test` (in R) to use the *overall* total for each class (Low, Intermediate, High) rather than the marginal total for each class (assuming species independent...which they aren't).
<!-- language: lang-r -->
# data
d <- matrix(c(2594L, 1350L, 467L, 163L, 558L, 65L, 405L, 155L, 322L), nrow=3, ncol=3)
# row and column sums
rs <- rowSums(d)
cs <- colSums(d)
# grand mean for each class
gm <- (rs + cs) / sum(d * 2)
Ec <- outer(gm, gm, "*") * sum(d)
where `Ec` is the Expected value for each cell using the grand mean for each class.
Is it reasonable to use a chisq distribution to determine if the observed values deviate from the expected values by more than chance?
<!-- language: lang-r -->
Ec.chistat <- sum((data-Ec)^2 / Ec)
pchisq(Ec.chistat, df=(nrow(data)-1) * (ncol(data)-1), lower.tail = FALSE)
I realize I could probably use a GLM for this, but it's convenient to keep in table format to directly address which of the off-diagonals have increased more than expected by chance.
Note: for comparison, the standard chisq assuming independence of variable would be:
<!-- language: lang-r -->
rs <- rowSums(d)
cs <- colSums(d)
n <- sum(d)
(E <- outer(rs, cs, "*")/n)
chistat <- sum((d - E) ^ 2 / E)
pchisq(chistat, df=(nrow(d)-1) * (ncol(d)-1), lower.tail = FALSE)
```

And quickly received a response recommending McNemar’s paired odds ratio test, with the caveat that it only works for *binary* outcomes. Some Wikipedia revealed that there are at least two generalizations of the McNemar test to square tables with more than two observations. This SAS paper.

Found two R packages that include functions for Stuart-Maxwell or Bhapkar’s test:

Can I connect these to something like the `chisq.posthoc`

test from the polytomous package? Or just report overall lack of marginal homogeneity and then pick out the cells with the greatest deviation?

This work is licensed under a Creative Commons Attribution 4.0 International License.