Schur complements are quantities that arise often in linear algebra in the context of block matrix inversion. Here, we review the basics and show an application in statistics.
Consider a block matrix $M$:
\[M = \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix}\]where $A$ is $p \times p$, $B$ is $p \times q$, $C$ is $q \times p$, and $D$ is $q \times q$. Suppose we want to invert $M$. One way would be to invert $M$ directly, without capitalizing on its block structure. However, as we’ll see, there is a cleverer way to compute $M^{-1}$.
To start, let’s perform a block UDL decomposition on $M$, assuming $D$ is invertible. If we right-multiply $M$ by
\(L = \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix},\) we get
\begin{align} ML &= \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix} \\ &= \begin{bmatrix} A - BD^{-1} C & B \\ 0 & D \end{bmatrix} \\ \end{align}
This further decomposes as
\begin{align} \begin{bmatrix} A - BD^{-1} C & B \\ 0 & D \end{bmatrix} &= \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix} \\ \end{align}
Thus, we can rewrite $M$ as
\[M = \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ D^{-1} C & I_q \\ \end{bmatrix}.\]Notice that at this point, we have decomposed $M$ into a block upper triangular matrix, a block diagonal matrix, and a block lower triangular matrix.
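As a quick numerical sanity check (not part of the derivation; the dimensions $p = 3$ and $q = 2$ are arbitrary), we can verify this factorization in NumPy. A random square $D$ is invertible with probability one:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2

# Arbitrary random blocks; a random square D is invertible
# with probability 1.
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q))
M = np.block([[A, B], [C, D]])

Dinv = np.linalg.inv(D)
S = A - B @ Dinv @ C  # the middle block (named the Schur complement below)

U = np.block([[np.eye(p), B @ Dinv], [np.zeros((q, p)), np.eye(q)]])
Dblk = np.block([[S, np.zeros((p, q))], [np.zeros((q, p)), D]])
L = np.block([[np.eye(p), np.zeros((p, q))], [Dinv @ C, np.eye(q)]])

# M should equal the product of the three factors above.
assert np.allclose(M, U @ Dblk @ L)
```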
Recall that for two matrices $A$ and $B$, the inverse of their product can be written as
\[(AB)^{-1} = B^{-1} A^{-1}.\]For three matrices, we then have
\[(ABC)^{-1} = ((AB)C)^{-1} = C^{-1} (AB)^{-1} = C^{-1} B^{-1} A^{-1}.\]
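This, too, is easy to check numerically (a quick sketch with random matrices, which are invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

# Inverting a product reverses the order of the inverses.
assert np.allclose(np.linalg.inv(A @ B @ C),
                   np.linalg.inv(C) @ np.linalg.inv(B) @ np.linalg.inv(A))
```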
Then, the inverse of $M$ can be written as
\[M^{-1} = \begin{bmatrix} I_p & 0 \\ D^{-1} C & I_q \\ \end{bmatrix}^{-1} \begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix}^{-1} \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}^{-1}.\]For the first and third matrices, we have fairly simple inverses (just negate the lower-left and upper-right blocks, respectively):
\[\begin{bmatrix} I_p & 0 \\ D^{-1} C & I_q \\ \end{bmatrix}^{-1} = \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix}\]and
\[\begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix}^{-1} = \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix}.\]For the middle matrix, the inverse is simply another block diagonal matrix with each block inverted:
\[\begin{bmatrix} A - BD^{-1} C & 0 \\ 0 & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1} C)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix}\]Plugging in and simplifying, we have
\begin{align} M^{-1} &= \begin{bmatrix} I_p & 0 \\ -D^{-1} C & I_q \\ \end{bmatrix} \begin{bmatrix} (A - BD^{-1} C)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \\ &= \begin{bmatrix} (A - BD^{-1} C)^{-1} & 0 \\ -D^{-1} C (A - BD^{-1} C)^{-1} & D^{-1} \\ \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \\ &= \begin{bmatrix} (A - BD^{-1} C)^{-1} & -(A - BD^{-1} C)^{-1} BD^{-1} \\ -D^{-1} C (A - BD^{-1} C)^{-1} & D^{-1} C (A - BD^{-1} C)^{-1} BD^{-1} + D^{-1} \\ \end{bmatrix} \\ \end{align}
Notice that to get the inverse of $M$, we now only need the inverse of $D$ and the inverse of another quantity, $A - BD^{-1} C$. This second quantity is known as the Schur complement of $D$ in $M$.
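In other words, rather than inverting the full $(p + q) \times (p + q)$ matrix, we only invert the $q \times q$ block $D$ and the $p \times p$ Schur complement. Here is a small NumPy check of the final formula, again on arbitrary random blocks:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 2
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q))
M = np.block([[A, B], [C, D]])

Dinv = np.linalg.inv(D)
Sinv = np.linalg.inv(A - B @ Dinv @ C)  # inverse of the Schur complement

# Block formula for the inverse: only D and the Schur complement
# are ever inverted.
Minv = np.block([
    [Sinv, -Sinv @ B @ Dinv],
    [-Dinv @ C @ Sinv, Dinv + Dinv @ C @ Sinv @ B @ Dinv],
])
assert np.allclose(Minv, np.linalg.inv(M))
```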
To illustrate the usefulness and prevalence of Schur complements, let’s take a look at an application of them in statistics.
Consider two jointly Gaussian random vectors $\mathbf{X}$ and $\mathbf{Y}$ of length $p$ and $q$, respectively, where we assume for the sake of simplicity that their means are $\mathbf{0}$:
\begin{align} \mathbf{X} &\sim \mathcal{N}_p(\mathbf{0}, \boldsymbol{\Sigma}_{XX}) \\ \mathbf{Y} &\sim \mathcal{N}_q(\mathbf{0}, \boldsymbol{\Sigma}_{YY}). \\ \end{align}
Their joint distribution is then
\[(\mathbf{X}, \mathbf{Y}) \sim \mathcal{N}_{p + q}(\mathbf{0}, \boldsymbol{\Sigma})\]where $\boldsymbol{\Sigma}$ has a block structure:
\[\boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{XX} & \boldsymbol{\Sigma}_{XY}^\top \\ \boldsymbol{\Sigma}_{XY} & \boldsymbol{\Sigma}_{YY} \\ \end{bmatrix}.\]Denote the precision matrix as $\boldsymbol{\Omega} = \boldsymbol{\Sigma}^{-1}$, and give it a similar block structure:
\[\boldsymbol{\Omega} = \begin{bmatrix} \boldsymbol{\Omega}_{XX} & \boldsymbol{\Omega}_{XY}^\top \\ \boldsymbol{\Omega}_{XY} & \boldsymbol{\Omega}_{YY} \\ \end{bmatrix}.\]Using the Schur complement result above, we already know that
\[\boldsymbol{\Omega}_{XX} = (\boldsymbol{\Sigma}_{XX} - \boldsymbol{\Sigma}_{XY}^\top \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{XY})^{-1}.\]
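We can confirm this numerically as well. The sketch below builds an arbitrary positive definite covariance matrix, slices out its blocks following the blocking above (so $\boldsymbol{\Sigma}_{XY}$ is the bottom-left block), and compares the top-left block of the precision matrix to the inverse of the Schur complement:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 2

# An arbitrary positive definite covariance matrix.
G = rng.standard_normal((p + q, p + q))
Sigma = G @ G.T + (p + q) * np.eye(p + q)

# Blocks, following the blocking in the text.
Sxx, Syy = Sigma[:p, :p], Sigma[p:, p:]
Sxy = Sigma[p:, :p]  # bottom-left (q x p) block

Omega = np.linalg.inv(Sigma)
Oxx = Omega[:p, :p]

# Top-left block of the precision = inverse Schur complement of Sigma_YY.
assert np.allclose(
    Oxx, np.linalg.inv(Sxx - Sxy.T @ np.linalg.inv(Syy) @ Sxy))
```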
Suppose we’re interested in the conditional distribution of $\mathbf{X} | \mathbf{Y} = \mathbf{y}$. Since the conditional density is proportional to the joint density as a function of $\mathbf{x}$, we can write
\begin{align} f(\mathbf{x} | \mathbf{y}) &\propto f(\mathbf{x}, \mathbf{y}) = (2\pi)^{-(p+q)/2} |\boldsymbol{\Omega}|^{1/2} \exp\left( -\frac12 \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}^\top \begin{bmatrix} \boldsymbol{\Omega}_{XX} & \boldsymbol{\Omega}_{XY}^\top \\ \boldsymbol{\Omega}_{XY} & \boldsymbol{\Omega}_{YY} \\ \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \right) \\ &\propto \exp\left( -\frac12 \begin{bmatrix} \mathbf{x}^\top \boldsymbol{\Omega}_{XX} + \mathbf{y}^\top \boldsymbol{\Omega}_{XY} \\ \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top + \mathbf{y}^\top \boldsymbol{\Omega}_{YY} \end{bmatrix}^\top \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \right) \\ &\propto \exp\left( -\frac12 (\mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} + \mathbf{y}^\top \boldsymbol{\Omega}_{XY} \mathbf{x} + \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y} + \mathbf{y}^\top \boldsymbol{\Omega}_{YY} \mathbf{y}) \right) \\ \end{align}
Ignoring the term that doesn’t depend on $\mathbf{x}$, and noting that the cross terms are scalars with $\mathbf{y}^\top \boldsymbol{\Omega}_{XY} \mathbf{x} = \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y}$, we have
\begin{align} f(\mathbf{x} | \mathbf{y}) &\propto \exp\left( -\frac12 (\mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} + \mathbf{y}^\top \boldsymbol{\Omega}_{XY} \mathbf{x} + \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y}) \right) \\ &\propto \exp\left( -\frac12 \mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} - \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y} \right) \\ \end{align}
Completing the square in $\mathbf{x}$ puts this into a form that allows us to read off the covariance:
\begin{align} \exp\left( -\frac12 \mathbf{x}^\top \boldsymbol{\Omega}_{XX} \mathbf{x} - \mathbf{x}^\top \boldsymbol{\Omega}_{XY}^\top \mathbf{y} \right) &\propto \exp\left( -\frac12 (\mathbf{x} + \boldsymbol{\Omega}_{XX}^{-1} \boldsymbol{\Omega}_{XY}^\top \mathbf{y})^\top \boldsymbol{\Omega}_{XX} (\mathbf{x} + \boldsymbol{\Omega}_{XX}^{-1} \boldsymbol{\Omega}_{XY}^\top \mathbf{y}) \right), \\ \end{align}
where the proportionality absorbs a factor that does not depend on $\mathbf{x}$.
Now, we can see that the precision of $\mathbf{X} | \mathbf{Y} = \mathbf{y}$ is $\boldsymbol{\Omega}_{XX}$, so its covariance is $\boldsymbol{\Omega}_{XX}^{-1} = \boldsymbol{\Sigma}_{XX} - \boldsymbol{\Sigma}_{XY}^\top \boldsymbol{\Sigma}_{YY}^{-1} \boldsymbol{\Sigma}_{XY}$: exactly the Schur complement of $\boldsymbol{\Sigma}_{YY}$ in $\boldsymbol{\Sigma}$. Its mean, read off the same completed square, is $-\boldsymbol{\Omega}_{XX}^{-1} \boldsymbol{\Omega}_{XY}^\top \mathbf{y}$.
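As a final sanity check, the snippet below (same setup as the previous one) verifies that the conditional covariance is the Schur complement, and that the mean $-\boldsymbol{\Omega}_{XX}^{-1} \boldsymbol{\Omega}_{XY}^\top \mathbf{y}$ agrees with the familiar regression form $\boldsymbol{\Sigma}_{XY}^\top \boldsymbol{\Sigma}_{YY}^{-1} \mathbf{y}$ of the conditional mean:

```python
import numpy as np

rng = np.random.default_rng(4)
p, q = 3, 2
G = rng.standard_normal((p + q, p + q))
Sigma = G @ G.T + (p + q) * np.eye(p + q)  # arbitrary covariance
Sxx, Syy = Sigma[:p, :p], Sigma[p:, p:]
Sxy = Sigma[p:, :p]  # bottom-left block, as in the text

Omega = np.linalg.inv(Sigma)
Oxx, Oxy = Omega[:p, :p], Omega[p:, :p]  # blocked like Sigma

# Conditional covariance of X | Y = y is the Schur complement.
cond_cov = Sxx - Sxy.T @ np.linalg.inv(Syy) @ Sxy
assert np.allclose(cond_cov, np.linalg.inv(Oxx))

# The mean read off the completed square matches the regression form.
y = rng.standard_normal(q)
assert np.allclose(-np.linalg.inv(Oxx) @ Oxy.T @ y,
                   Sxy.T @ np.linalg.inv(Syy) @ y)
```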