
Andy Jones

Moment-generating functions

While probability distributions are most commonly defined by their probability density functions (PDFs) and cumulative distribution functions (CDFs), there exist other characterizations as well. One of those is the moment-generating function, which we explore in this post.

Defining moment-generating functions

Consider a random variable X. Its moment-generating function (MGF) is defined by

$$M_X(t) = \mathbb{E}\left[e^{tX}\right]. \tag{1}$$

A key property of the MGF – and the one that gives the function its name – is that its derivatives with respect to t are equal to the distribution’s moments. In other words, by differentiating the MGF w.r.t. t, we “generate” the distribution’s moments.

To see this property, it is instructive to inspect the Taylor expansion of the MGF around t=0:

$$\begin{aligned}
M_X(t) &= M_X(0) + t M_X'(0) + \frac{1}{2!} t^2 M_X''(0) + \frac{1}{3!} t^3 M_X'''(0) + \cdots \\
&= \mathbb{E}\left[e^{0 \cdot X}\right] + \mathbb{E}\left[tX e^{0 \cdot X}\right] + \frac{1}{2!}\mathbb{E}\left[t^2 X^2 e^{0 \cdot X}\right] + \frac{1}{3!}\mathbb{E}\left[t^3 X^3 e^{0 \cdot X}\right] + \cdots \\
&= 1 + t\,\mathbb{E}[X] + \frac{1}{2!} t^2\,\mathbb{E}[X^2] + \frac{1}{3!} t^3\,\mathbb{E}[X^3] + \cdots,
\end{aligned}$$

where we have used the fact that all derivatives of $e^t$ with respect to $t$ are equal to $e^t$ (so that $\frac{d^n}{dt^n} e^{tX} = X^n e^{tX}$) and the fact that $t$ is deterministic and can be pulled out of the expectation.

We can now clearly see that the nth derivative of the MGF with respect to t evaluated at t=0 is equal to the nth moment.
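That is,

$$\frac{d^n}{dt^n} M_X(t)\,\Bigg|_{t=0} = \mathbb{E}\left[X^n e^{tX}\right]\Bigg|_{t=0} = \mathbb{E}[X^n],$$

and the first few moments are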

$$m_0 = 1, \qquad m_1 = \mathbb{E}[X], \qquad m_2 = \mathbb{E}[X^2], \qquad m_3 = \mathbb{E}[X^3].$$

Importantly, these raw moments are in general not the same as the distribution's central moments (the variance and the quantities underlying skewness and kurtosis, for example). Recall that the $n$th central moment is given by:

$$\mathbb{E}\left[(X - \mathbb{E}[X])^n\right].$$

Of course, when $\mathbb{E}[X] = 0$, the moments and central moments coincide, but this will not be true in general.
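For example, the second central moment (the variance) can always be recovered from the first two raw moments:

$$\mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = m_2 - m_1^2.$$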

We now demonstrate the MGF through a series of examples.

Example: Discrete distribution

Consider a random variable $X$ drawn from a discrete probability distribution across $K$ states, where the probability of state $k$ is denoted $p_k$. The code and plot below show an example with $K=5$ states.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2)
K = 5
states = np.arange(1, K + 1)   # states x_k = 1, ..., K
ps = np.random.uniform(size=K)
ps = ps / ps.sum()             # normalize so the probabilities sum to 1
plt.bar(states, ps)
plt.show()
Discrete distribution across K=5 states.

In this case, the MGF is relatively straightforward to calculate by directly plugging into Equation 1:

$$M_X(t) = \mathbb{E}\left[e^{tX}\right] = \sum_{k=1}^{K} p_k e^{t x_k}.$$

If the distribution is uniform (i.e., $p_1 = p_2 = \cdots = p_K = 1/K$), then the MGF can be further simplified as a geometric series.
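For instance, if the states are $x_k = k$ for $k = 1, \dots, K$ (as in the example above), the uniform-case MGF is a finite geometric series:

$$M_X(t) = \frac{1}{K}\sum_{k=1}^{K} e^{tk} = \frac{e^t\left(e^{tK} - 1\right)}{K\left(e^t - 1\right)}, \qquad t \neq 0.$$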

Let’s visualize the MGF for the example discrete distribution above with $K=5$ states. Below we plot $M_X(t)$ for $t \in [-1, 1]$.
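The original post shows only the resulting figure; a minimal sketch of how this curve could be produced, reusing `states`, `ps`, and the imports from the code above, is:

ts = np.linspace(-1, 1, 401)                                  # grid of t values
MXt = (np.exp(states * ts.reshape(-1, 1)) * ps).sum(axis=1)   # M_X(t) = sum_k p_k e^{t x_k}
plt.plot(ts, MXt)
plt.xlabel("t")
plt.ylabel("$M_X(t)$")
plt.show()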

MGF for the discrete distribution.

Intuitively, we can think of this distribution’s first moment as the slope of this curve at $t=0$. The distribution’s second moment is given by the curve’s second derivative (its curvature) at $t=0$, and so on. More precisely, the moments are easily calculated from the derivatives of the MGF:

$$\begin{aligned}
\frac{d}{dt} M_X(t) &= \sum_{k=1}^{K} p_k x_k e^{t x_k}, \\
\frac{d^2}{dt^2} M_X(t) &= \sum_{k=1}^{K} p_k x_k^2 e^{t x_k}, \\
\frac{d^3}{dt^3} M_X(t) &= \sum_{k=1}^{K} p_k x_k^3 e^{t x_k}.
\end{aligned}$$

Recall that the moments are equal to the evaluation of these derivatives at t=0. We visualize each of these derivatives below, where the left and right panels show the derivatives on the original and log scales, respectively:

Let’s do some sanity checks to make sure that the empirical derivatives of the MGF are equal to those obtained by directly evaluating the moments. Below, we plot the first three derivatives of the MGF as red lines.

To compute an empirical estimate of each moment, we use the NumPy function np.gradient applied to the MGF. To estimate the nth moment, we apply np.gradient recursively n times. In the plots below, the horizontal blue lines show the empirical estimate of each derivative at t=0. For example, the code below computes an empirical estimate of the first derivative:

lims = [-1e0, 1e0]
ts = np.linspace(lims[0], lims[1], 401)
MXt = (np.exp(states * ts.reshape(-1, 1)) * ps).sum(1) # MGF
d1 = np.gradient(MXt, ts)                              # d/dt
m1 = d1[ts == 0]                                       # First moment at t=0
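The higher-order estimates are not shown explicitly in the original post; a sketch of the same idea, applying np.gradient recursively and comparing against the exact sums $\sum_k p_k x_k^n$, might look like this:

d2 = np.gradient(d1, ts)               # second derivative of the MGF
d3 = np.gradient(d2, ts)               # third derivative of the MGF
m2, m3 = d2[ts == 0], d3[ts == 0]      # second and third moments at t=0
print(m1, (ps * states).sum())         # empirical vs. exact E[X]
print(m2, (ps * states**2).sum())      # empirical vs. exact E[X^2]
print(m3, (ps * states**3).sum())      # empirical vs. exact E[X^3]

Because each application of np.gradient introduces additional finite-difference error, the higher-order estimates agree with the exact moments only approximately.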

The vertical gray lines show t=0, which is where we expect the blue and red lines to intersect (demonstrating agreement between the theoretical and empirical versions).

Example: Normal distribution

Consider a Gaussian random variable $X \sim \mathcal{N}(\mu, \sigma^2)$. Recall that multiplying $X$ by a scalar $t$ results in another Gaussian random variable with scaled mean and variance:

$$tX \sim \mathcal{N}(t\mu, t^2\sigma^2).$$

Further, recall that exponentiating a Gaussian random variable results in a random variable with a log-normal distribution. That is,

$$e^{tX} \sim \log\mathcal{N}(t\mu, t^2\sigma^2).$$

These transformations of X are shown visually below. Multiplication by t scales the mean and variance, and exponentiating drastically changes the shape of the distribution.

In this case, the MGF corresponds with the mean of the log-normal random variable. By the distribution’s basic properties, the mean of the log-normal distribution is given by

$$\mathbb{E}\left[e^{tX}\right] = \exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\}.$$
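As a quick numerical sanity check (not part of the original post), we can compare a Monte Carlo estimate of $\mathbb{E}[e^{tX}]$ against this closed form; the values of mu, sigma, and t below are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t = 1.5, 0.8, 0.3                          # illustrative values
x = rng.normal(mu, sigma, size=1_000_000)             # samples from N(mu, sigma^2)
mc_estimate = np.exp(t * x).mean()                    # Monte Carlo estimate of E[e^{tX}]
closed_form = np.exp(t * mu + 0.5 * t**2 * sigma**2)  # exp{t*mu + t^2*sigma^2/2}
print(mc_estimate, closed_form)                       # should agree closely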

We can then compute the Gaussian distribution’s moments from the derivatives of the MGF. For the first moment, we recover the distribution’s mean μ, which should intuitively make sense:

$$m_1 = (\mu + \sigma^2 t)\,\exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\}\,\Bigg|_{t=0} = \mu.$$

For the second moment, we differentiate again (using the product rule):

$$m_2 = \left[(\mu + \sigma^2 t)^2 \exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\} + \sigma^2 \exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\}\right]\Bigg|_{t=0} = \mu^2 + \sigma^2.$$

Note that when the mean is zero ($\mu = 0$), the second raw moment reduces to the variance $\sigma^2$, which is the second central moment.

Calculation of the third moment follows a similar logic, again using the product rule:

$$m_3 = \left[(\mu + \sigma^2 t)^3 \exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\} + 2\sigma^2 (\mu + \sigma^2 t) \exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\} + \sigma^2 (\mu + \sigma^2 t) \exp\left\{t\mu + \frac{1}{2} t^2 \sigma^2\right\}\right]\Bigg|_{t=0} = \mu^3 + 3\mu\sigma^2.$$

Again, notice that when the mean is zero ($\mu = 0$), the third moment vanishes; this recovers the fact that the Gaussian distribution’s third central moment is always zero, reflecting its symmetry and lack of skew.
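These raw-moment formulas can also be checked by simulation; the sketch below (with arbitrary illustrative parameters, not from the original post) compares sample moments against $\mu^2 + \sigma^2$ and $\mu^3 + 3\mu\sigma^2$.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.5, 0.8                                 # illustrative values
x = rng.normal(mu, sigma, size=1_000_000)            # samples from N(mu, sigma^2)
print((x**2).mean(), mu**2 + sigma**2)               # second moment vs. mu^2 + sigma^2
print((x**3).mean(), mu**3 + 3 * mu * sigma**2)      # third moment vs. mu^3 + 3*mu*sigma^2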
