Schur complements

6 minute read

Published:

Schur complements are quantities that arise often in linear algebra in the context of block matrix inversion. Here, we review the basics and show an application in statistics.

LDU decomposition

Consider a block matrix $M$:

\[M = \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix}\]

where $A$ is $p \times p$, $B$ is $p \times q$, $C$ is $q \times p$, and $D$ is $q \times q$. Suppose we were interested in inverting $M$. One way would be to try to invert $M$ directly without capitalizing on its block structure. However, as we’ll see, we can find a more clever way to find $M^{-1}$.

To start, let’s perform an LDU decomposition on $M$. If we right-multiply $M$ by

\(L = \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix},\) we get

\begin{align} ML &= \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix} \\ &= \begin{bmatrix} A - BD^{-1} C & B \\ 0 & D \end{bmatrix} \\ \end{align}

This further decomposes as

\begin{align} \begin{bmatrix} A - BD^{-1} C & B \\ 0 & D \end{bmatrix} &= \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix} \\ \end{align}

Thus, we can rewrite $M$ as

\[M = \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ D^{-1} C & I_q \\ \end{bmatrix}.\]

Notice that at this point, we have decomposed $M$ into to a upper-diagonal matrix, a diagonal matrix, and an lower-diagonal matrix.

Inverting $M$

Recall that for two matrices $A$ and $B$, the inverse of their product can be written as

\[(AB)^{-1} = B^{-1} A^{-1}.\]

For three matrices, we then have

\[(ABC)^{-1} = ((AB)C)^{-1} = C^{-1} (AB)^{-1} = C^{-1} B^{-1} A^{-1}.\]

Then, the inverse of $M$ can be written as

\[M^{-1} = \begin{bmatrix} I_p & 0 \\ D^{-1} C & I_q \\ \end{bmatrix}^{-1} \begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix}^{-1} \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}^{-1}.\]

For the first and third matrices, we have fairly simple inverses (just negate the lower-left and upper-right blocks, respecitively):

\[\begin{bmatrix} I_p & 0 \\ D^{-1} C & I_q \\ \end{bmatrix}^{-1} = \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix}\]

and

\[\begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}^{-1} = \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix}.\]

For the middle matrix, the inverse is simply another block diagonal matrix with each block inverted:

\[\begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1} C)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix}\]

Plugging in and simplifying, we have

\begin{align} M^{-1} &= \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix} \begin{bmatrix} (A - BD^{-1} C)^{-1} & 0 \ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \\ &= \begin{bmatrix} (A - BD^{-1} C)^{-1} & 0 \\ -D^{-1} C (A - BD^{-1} C)^{-1} & D^{-1} \\ \end{bmatrix} \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \\ &= \begin{bmatrix} (A - BD^{-1} C)^{-1} & (A - BD^{-1} C)^{-1} BD^{-1} \\ -D^{-1} C (A - BD^{-1} C)^{-1} & -D^{-1} C (A - BD^{-1} C)^{-1} BD^{-1} + D^{-1} \\ \end{bmatrix} \\ \end{align}

Notice that to get the inverse of $M$, we now only need the the inverse of $D$ and the inverse of another quantity, $(A - BD^{-1} C)^{-1}$. This second quantity is known as the Schur complement.

Multivariate Guassians

To illustrate the usefulness and prevalence of Schur complements, let’s take a look at an application of them in statistics.

Consider two Gaussian random vectors $\mathbf{X}$ and $\mathbf{Y}$ of length $p$ and $q$, respectively, where we assume for the sake of simplicity that their means are 0:

\begin{align} \mathbf{X} &\sim \mathcal{N}_p(\mathbf{0}, \boldsymbol{\Sigma}_X) \\ \mathbf{Y} &\sim \mathcal{N}_q(\mathbf{0}, \boldsymbol{\Sigma}_Y). \\ \end{align}

Their joint distribution is then

\[(\mathbf{X}, \mathbf{Y}) \sim \mathcal{N}_{p + q}(\mathbf{0}, \boldsymbol{\Sigma})\]

where $\boldsymbol{\Sigma}$ has a block structure:

\[\boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{XX} & \boldsymbol{\Sigma}_{XY}^\top \\ \boldsymbol{\Sigma}_{XY} & \boldsymbol{\Sigma}_{YY} \\ \end{bmatrix}.\]

Denote the precision matrix as $\boldsymbol{\Omega} = \boldsymbol{\Sigma}^{-1}$, and give it a similar block structure:

\[\boldsymbol{\Omega} = \begin{bmatrix} \boldsymbol{\Omega}_{XX} & \boldsymbol{\Omega}_{XY}^\top \\ \boldsymbol{\Omega}_{XY} & \boldsymbol{\Omega}_{YY} \\ \end{bmatrix}.\]

Using the Schur complement result above, we already know that

\[\boldsymbol{\Omega}_{XX} = (\boldsymbol{\Sigma}_{XX} - \boldsymbol{\Sigma}_{XY}^\top \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{XY})^{-1}\]

Suppose we’re interested in the conditional distribution of $\mathbf{X} | \mathbf{Y} = \mathbf{y}$. Then we can write the conditional density as

\begin{align} f(\mathbf{x} | \mathbf{y}) &= 2\pi^{-p/2} |\boldsymbol{\Omega}|^{1/2} \exp\left( -\frac12 \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}^\top \begin{bmatrix} \boldsymbol{\Omega}_{XX} & \boldsymbol{\Omega}_{XY}^\top \ \boldsymbol{\Omega}_{XY} & \boldsymbol{\Omega}_{YY} \ \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \right) \\ &\propto \exp\left( -\frac12 \begin{bmatrix} \mathbf{x}^\top \boldsymbol{\Omega}_{XX} + \mathbf{y}^\top \boldsymbol{\Omega}_{XY} \\ \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top + \mathbf{y}^\top \boldsymbol{\Omega}_{YY} \end{bmatrix}^\top \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \right) \\ &\propto \exp\left( -\frac12 (\mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} + \mathbf{y}^\top \boldsymbol{\Omega}_{XY} \mathbf{x} + \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y} + \mathbf{y}^\top \boldsymbol{\Omega}_{YY} \mathbf{y}) \right) \\ \end{align}

Ignoring the terms that don’t depend on $\mathbf{x}$, we have

\begin{align} f(\mathbf{x} | \mathbf{y}) &\propto \exp\left( -\frac12 (\mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} + \mathbf{y}^\top \boldsymbol{\Omega}_{XY} \mathbf{x} + \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y}) \right) \\ &\propto \exp\left( -\frac12 \mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} - \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y} \right) \\ \end{align}

Putting this into a form that allows us to read off the covariance, we have

\begin{align} \exp\left( -\frac12 \mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} - \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y} \right) &= \exp\left( -\frac12 (\mathbf{x} - \boldsymbol{\Omega}_{XX}^{-1} \boldsymbol{\Omega}_{XY} \mathbf{y})^\top \boldsymbol{\Omega}_{XX} (\mathbf{x} - \boldsymbol{\Omega}_{XX}^{-1} \boldsymbol{\Omega}_{XY} \mathbf{y}) \right). \\ \end{align}

Now, we can see that the covariance of $\mathbf{X} | \mathbf{Y} = \mathbf{y}$ is $\boldsymbol{\Omega}_{XX} = (\boldsymbol{\Sigma}_{XX} - \boldsymbol{\Sigma}_{XY}^\top \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{XY})^{-1}$.

References