This is a rant that I wanted to rant about for years about the terriftying guts of model matrix and its applications. I came accross three incidences where I have to understand (*hack*) model matrix for specific application and when I am developing/working on a new method. The three cases are the following:

- Contrast (I still have no clue how it works)
- Structural Imputation (NAs)
- Random Effect Matrices (Z)

I (like 50% (more like 99.9% humans)) like to assume I know what I am doing and stick a dataset in a blackbox, press a button and close my eyes and hope the blackbox can read my mind or do something smart/sensible. Blackboxes usually does something reasonable and often *more correct* than user have in mind because these are tools developed by very smart people.

Here is the part that bugs me how users donâ€™t appreicate how complicated these blackboxes are and take them for granted. Developing software is hard and smart people spend many hours banging on their heads to create these software! As a young developers, it is dangerous to assume/ignore/breeze through the details. Mathematically speaking, everything is straight forward and never pay much attention to *model matrices*. Here is a simple example of a very simple concept taught in an intro to stats course or experimental design course.

This is a short example of what I mean model matrices can get confusing and hairy. Letâ€™s create a dataset of species in different sites and we are intested in *modelling* the species-site, site-species interaction. What does the model matrix look like?

```
library(lme4)
library(Matrix)
library(dplyr)
```

```
dd <- expand.grid(sp=paste("sp",1:3,sep=""),site=paste("site",1:2,sep=""),rep=1)
dd$site <- factor(dd$site)
dd$sp <- factor(dd$sp)
print(dd)
```

```
## sp site rep
## 1 sp1 site1 1
## 2 sp2 site1 1
## 3 sp3 site1 1
## 4 sp1 site2 1
## 5 sp2 site2 1
## 6 sp3 site2 1
```

`print(image(with(dd,fac2sparse(interaction(site,sp)))))`

`print(with(dd,fac2sparse(interaction(site,sp))))`

```
## 6 x 6 sparse Matrix of class "dgCMatrix"
##
## site1.sp1 1 . . . . .
## site2.sp1 . . . 1 . .
## site1.sp2 . 1 . . . .
## site2.sp2 . . . . 1 .
## site1.sp3 . . 1 . . .
## site2.sp3 . . . . . 1
```

`print(image(with(dd,fac2sparse(interaction(sp,site)))))`

`print(with(dd,fac2sparse(interaction(sp,site))))`

```
## 6 x 6 sparse Matrix of class "dgCMatrix"
##
## sp1.site1 1 . . . . .
## sp2.site1 . 1 . . . .
## sp3.site1 . . 1 . . .
## sp1.site2 . . . 1 . .
## sp2.site2 . . . . 1 .
## sp3.site2 . . . . . 1
```

As you can see, it matters how you set it up. Now letâ€™s rearrange the data and sort them differently.

```
## sp site rep
## 1 sp1 site1 1
## 2 sp1 site2 1
## 3 sp2 site1 1
## 4 sp2 site2 1
## 5 sp3 site1 1
## 6 sp3 site2 1
```

```
## 6 x 6 sparse Matrix of class "dgCMatrix"
##
## site1.sp1 1 . . . . .
## site2.sp1 . 1 . . . .
## site1.sp2 . . 1 . . .
## site2.sp2 . . . 1 . .
## site1.sp3 . . . . 1 .
## site2.sp3 . . . . . 1
```

```
## 6 x 6 sparse Matrix of class "dgCMatrix"
##
## sp1.site1 1 . . . . .
## sp2.site1 . . 1 . . .
## sp3.site1 . . . . 1 .
## sp1.site2 . 1 . . . .
## sp2.site2 . . . 1 . .
## sp3.site2 . . . . . 1
```

Again, they look different *and* from p1 and p2. But they are *identical* somehow.

The point is, it gets very hairy when you try to do more complicated stuff like nested random effects with both main effects.