Here, we discuss two distributions which arise as scale mixtures of normals: the Laplace and the Student-$t$.
Mixture models are typically first introduced in the context of discrete mixtures. For example, the Gaussian mixture model (GMM) is often the canonical mixture model. In particular, if we assume $x$ follows a GMM with $K$ mixture components, we can write the model as follows. \begin{align} p(x) = \sum\limits_{k=1}^K \pi_k \mathcal{N}(x; \mu_k, \sigma_k) \\ \end{align} where $\pi_1, \dots, \pi_K$ are the mixture weights that must satisfy $\sum_k \pi_k = 1$.
However, we can also consider continuous mixtures. Consider, for example, a continuous mixture of Gaussians defined by the following hierarchical model: \begin{align} x &\sim \mathcal{N}(\mu, \sigma^2) \\ \mu &\sim \mathcal{N}(0, \tau^2) \\ \end{align} Now, we can write the marginal distribution of $x$: \begin{equation} p(x) = \int_{-\infty}^\infty p(x | \mu) p(\mu) d\mu \end{equation} We call this a continuous mixture of Gaussians. Notice that this mixture has a similar form to the discrete mixture above. In particular, $p(\mu)$ here plays a similar role as ${\pi_k}_{k=1}^K$ above as the “mixing distribution”.
In the Gaussian setting, we can compute this marginal in closed form (see appendix for full derivation) to get: \begin{equation} x \sim \mathcal{N}(0, \tau^2 + \sigma^2) \end{equation} Intuitively, we can think of the Gaussian prior on $\mu$ as adding extra variability to $x$, as compared to a model with a fixed $\mu$.
Here, the mixing distribution was specified for the mean parameter $\mu$, but we can also specify it for the variance $\sigma^2$, as we’ll see next.
Consider the following hierarchical model: \begin{align} x &\sim \mathcal{N}(\mu, \sigma^2) \\ \sigma^2 &\sim p(\sigma^2) \\ \end{align} Assuming a constant $\mu$ for now, the marginal distribution of $x$ is then \begin{equation} p(x) = \int_0^\infty p(x | \sigma^2) p(\sigma^2) d\sigma^2 \end{equation} Below, we consider two choices for $p(\sigma^2)$ and discuss the implications for the marginal density of $x$.
Consider placing an exponential prior on $\sigma^2$ such that $\sigma^2 \sim \text{Exp}(2 \lambda^2)$. Recall that the PDF of the exponential distribution is \begin{equation} p(\sigma^2; \lambda) = 2 \lambda^2 \exp(-2 \lambda^2 \sigma^2) \end{equation} We can then compute the marginal of $x$ as follows:
\begin{align} p(x) &= \int_0^\infty p(x | \sigma^2) p(\sigma^2) d\sigma^2 \\ &= \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{1}{2\sigma^2} x^2 \right) 2\lambda^2 \exp(-2 \lambda^2 \sigma^2) d\sigma^2 \\ &= 2 \lambda^2 \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -2 \lambda^2 \sigma^2 - \frac{1}{2\sigma^2} x^2 \right) d\sigma^2 \\ &= 2 \lambda^2 \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{2 \lambda^2}{\sigma^2} \left( (\sigma^2)^2 + \frac{1}{4 \lambda^2} x^2 \right) \right) d\sigma^2 \\ &= 2 \lambda^2 \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{ 2\lambda^2}{\sigma^2} \left( (\sigma^2 - \frac{1}{2 \lambda} x)^2 + 2 \sigma^2 \frac{1}{2 \lambda} x \right) \right) d\sigma^2 \\ &= 2 \lambda^2 \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{2\lambda^2}{\sigma^2} \left( (\sigma^2 - \frac{x}{2 \lambda})^2 \right) - 2 \lambda x \right) d\sigma^2 \\ &= 2 \lambda^2 \exp(- 2\lambda x) \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{2\lambda^2 (x / 2\lambda)^2 }{\sigma^2 (x / 2\lambda)^2} \left( \sigma^2 - \frac{x}{2\lambda} \right)^2 \right) d\sigma^2 \\ &= 2 \lambda^2 \exp(- 2\lambda x) \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{2\lambda^2 \frac{x^2}{4\lambda^2} }{\sigma^2 \frac{x^2}{4\lambda^2}} \left( \sigma^2 - \frac{x}{2\lambda} \right)^2 \right) d\sigma^2 \\ &= 2 \lambda^2 \exp(- 2\lambda x) \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{x^2 }{2\sigma^2 (x / 2\lambda)^2} \left( \sigma^2 - \frac{x}{2\lambda} \right)^2 \right) d\sigma^2 \\ &= 2 \lambda^2 \exp(- 2\lambda x) \frac{1}{x} \int_0^\infty \sigma^2 \sqrt{\frac{x^2}{ 2\pi \sigma^3}} \exp \left( -\frac{x^2 }{2\sigma^2 (x / 2\lambda)^2} \left( \sigma^2 - \frac{x}{2\lambda} \right)^2 \right) d\sigma^2 \\ \end{align} Now let $\tau := x^2$ and $\mu = \frac{x}{2\lambda}$. \begin{align} p(x) = 2 \lambda^2 \exp(- 2\lambda x) \frac{1}{x} \int_0^\infty \sigma^2 \sqrt{\frac{\tau}{ 2\pi \sigma^3}} \exp \left( -\frac{\tau }{2\sigma^2 \mu^2} \left( \sigma^2 - \mu \right)^2 \right) d\sigma^2 \\ \end{align}
Recognizing the integrand (past the first $\sigma^2$) as a inverse-Gaussian distribution with mean $\mu$ and scale parameter $\tau$, we can notice that the integral is an expectation of an inverse-Gaussian-distributed random variable $\sigma^2$. Thus, \begin{align} p(x) &= 2 \lambda^2 \exp(- 2\lambda x) \frac{1}{x} \mathbb{E}[\sigma^2] \\ &= 2 \lambda^2 \exp(- 2\lambda x) \frac{1}{x} \mu \\ &= 2 \lambda^2 \exp(- 2\lambda x) \frac{1}{x} \frac{x}{2 \lambda} \\ &= \lambda \exp(- 2\lambda x) \\ \end{align}
Notice that this has the form of a Laplace distribution with scale parameter $b = \frac{1}{2\lambda}$. To be precise, I should have started using $|x|$ instead of $x$ starting in line $(14)$. Inserting that now, we have a real Laplace density: \(p(x) = \lambda \exp(- 2\lambda |x|).\)
Laplace priors (or, equivalently, a scale mixture of normals with exponential mixing distribution) are often used as a Bayesian analogue to the LASSO. Intuitively, the correspondence with the LASSO can be seen in the log-likelihood, which will penalize large values of $|x|$.
Consider the following hierarchical model: \begin{align} x &\sim \mathcal{N}(0, \sigma^2) \\ \sigma^2 &\sim \text{Inv-Gamma}(\nu / 2, \nu / 2) \\ \end{align} We can again find the marginal distribution of $\sigma^2$: \begin{align} p(x) &= \int_0^\infty p(x | \sigma^2) p(\sigma^2) d\sigma^2 \\ &= \int_0^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( \frac{-x^2}{2\sigma^2} x^2 \right) \frac{(\nu / 2)^{\nu / 2}}{\Gamma(\nu / 2)} (\sigma^2)^{-\nu / 2 - 1} \exp\left( -\frac{\nu / 2}{\sigma^2} \right) d\sigma^2 \\ &= \frac{(\nu / 2)^{\nu / 2}}{\Gamma(\nu / 2) \sqrt{2 \pi}} \int_0^\infty \exp \left( -\frac{ x^2 + \nu}{2\sigma^2} \right) (\sigma^2)^{-\frac{\nu + 1}{2} - 1} d\sigma^2 \\ &= \frac{(\nu / 2)^{\nu / 2}}{\Gamma(\nu / 2) \sqrt{2 \pi}} \Gamma\left(\frac{\nu + 1}{2}\right) \left(\frac{x^2 + \nu}{2}\right)^{-\frac{\nu + 1}{2}} \\ \end{align} With a little more algebra, this can be put in the form of a Student-t distribution with $\nu$ degrees of freedom.
\begin{align} p(x) &= \int_{-\infty}^\infty p(x | \mu) p(\mu) d\mu \\ &= \int_{-\infty}^\infty \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right) \frac{1}{\tau \sqrt{2\pi}} \exp \left( -\frac{1}{2\tau^2} \mu^2 \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2\sigma^2} (x - \mu)^2 - \frac{1}{2\tau^2} \mu^2 \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{1}{\sigma^2} x^2 - \frac{1}{\sigma^2} 2x\mu + \frac{1}{\sigma^2} \mu^2 + \frac{1}{\tau^2} \mu^2\right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left( \left(\frac{1}{\sigma^2} + \frac{1}{\tau^2}\right) \mu^2 - \frac{1}{\sigma^2} 2x\mu + \frac{1}{\sigma^2} x^2 \right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left( \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \mu^2 - \frac{1}{\sigma^2} 2x\mu + \frac{1}{\sigma^2} x^2 \right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left( \mu^2 - \frac{\tau^2}{\tau^2 + \sigma^2} 2x\mu + \frac{\tau^2}{\tau^2 + \sigma^2} x^2 \right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left( \mu^2 - \frac{\tau^2}{\tau^2 + \sigma^2} 2x\mu + \frac{\tau^2}{\tau^2 + \sigma^2} x^2 \right) + \left(\left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) - \left(\left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left( \mu^2 - \frac{\tau^2}{\tau^2 + \sigma^2} 2x\mu + \left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 \right) - \left(\left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left( \left(\mu^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x \right)^2 - \left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) \right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\mu^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x \right)^2\right) \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) \int_{-\infty}^\infty \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\mu^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x \right)^2\right) d\mu \\ &= \frac{1}{\sigma \tau 2\pi} \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\frac{\tau^2}{\tau^2 + \sigma^2} x\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right) \sqrt{2\pi} \sqrt{\frac{\sigma^2 \tau^2}{\tau^2 + \sigma^2}} \\ &= \frac{1}{\sqrt{2\pi (\tau^2 + \sigma^2)}} \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\left(\frac{\tau^2}{\tau^2 + \sigma^2}\right)^2 x^2 - \frac{\tau^2}{\tau^2 + \sigma^2} x^2\right)\right) \\ &= \frac{1}{\sqrt{2\pi (\tau^2 + \sigma^2)}} \exp \left( -\frac{1}{2} \left(\frac{\tau^2 + \sigma^2}{\sigma^2 \tau^2}\right) \left(\left(\left(\frac{\tau^2}{\tau^2 + \sigma^2}\right)^2 - \frac{\tau^2}{\tau^2 + \sigma^2}\right) x^2 \right)\right) \\ &= \frac{1}{\sqrt{2\pi (\tau^2 + \sigma^2)}} \exp \left( -\frac{1}{2 (\tau^2 + \sigma^2)} x^2 \right) \\ \end{align}