Stein’s Method

Normal Deviate 2013-12-20

I have mentioned Stein’s method in passing a few times on this blog. Today I want to discuss it in a bit more detail.

1. What Is Stein’s Method?

Stein’s method, due to Charles Stein, is actually quite old, going back to 1972. But there has been a great deal of interest in the method lately. As with many things, Stein was ahead of his time.

Stein’s method is actually a collection of methods for showing that the distribution {F_n} of some random variable {X_n} is close to Normal. What makes Stein’s method so powerful is that it can be used in a variety of situations (including cases with dependence) and it tells you how far {F_n} is from Normal.

I will follow Chen, Goldstein and Shao (2011) quite closely.

2. The Idea

Let {X_1,\ldots, X_n\in \mathbb{R}} be iid with mean 0 and variance 1. Let

\displaystyle  X = \sqrt{n}\ \overline{X}_n =\frac{1}{\sqrt{n}}\sum_i X_i

and let {Z\sim N(0,1)}. We want to bound

\displaystyle  \Delta_n = \sup_z \Bigl| \mathbb{P}(X \leq z) - \mathbb{P}( Z \leq z)\Bigr| = \sup_z \Bigl| \mathbb{P}(X \leq z) - \Phi(z)\Bigr|

where {\Phi} is the cdf of a standard Normal. For simplicity, we will assume here

\displaystyle  |X_i| \leq B < \infty

but this is not needed. The key idea is this:

\displaystyle  \mathbb{E}[Y f(Y)] = \mathbb{E}[f'(Y)]

for all smooth {f} if and only if {Y\sim N(0,1)}. This suggests the following idea: if we can show that {\mathbb{E}[Y f(Y)-f'(Y)]} is close to 0, then {Y} should be almost Normal.
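
A quick Monte Carlo check of this characterization (a minimal sketch in Python; the test function {\sin}, the comparison distribution, and the sample size are my own arbitrary choices, not from the post): for {Z\sim N(0,1)} the quantity {\mathbb{E}[Z f(Z) - f'(Z)]} is essentially 0, while for a standardized non-Normal variable it is not.

    import numpy as np

    rng = np.random.default_rng(0)
    n_sim = 1_000_000

    def stein_gap(y, f, fprime):
        # Estimate E[Y f(Y) - f'(Y)]; this is ~0 when Y is (close to) N(0,1).
        return np.mean(y * f(y) - fprime(y))

    f, fprime = np.sin, np.cos

    z = rng.standard_normal(n_sim)          # exactly N(0,1)
    y = rng.exponential(size=n_sim) - 1.0   # mean 0, variance 1, but not Normal

    print(stein_gap(z, f, fprime))          # close to 0
    print(stein_gap(y, f, fprime))          # noticeably different from 0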

More precisely, let {Z\sim N(0,1)} and let {h} be any function such that {\mathbb{E}|h(Z)| < \infty}. The Stein function {f} associated with {h} is a function satisfying

\displaystyle  f'(x) - x f(x) = h(x) - \mathbb{E}[h(Z)].

It then follows that

\displaystyle  \mathbb{E}[h(X)] - \mathbb{E}[h(Z)]= \mathbb{E}[f'(X) - Xf(X)]

and showing that {X} is close to normal amounts to showing that {\mathbb{E}[f'(X) - Xf(X)]} is small.

Is there such an {f}? In fact, letting {\mu = \mathbb{E}[h(Z)]}, the Stein function is

\displaystyle  f(x) =f_h(x)= e^{x^2/2}\int_{-\infty}^x[ h(y) - \mu] e^{-y^2/2} dy. \ \ \ \ \ (1)

More precisely, {f_h} is the unique solution to the above equation subject to the side condition {\lim_{x\rightarrow\pm\infty} e^{-x^2/2}f(x) = 0}.
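
As a sanity check on (1) (a minimal sketch; the test function {h(x)=x^2} and the helper name stein_solution are my own illustrative choices, not from the post): for {h(x) = x^2} we have {\mu = \mathbb{E}[h(Z)] = 1}, and integrating by parts in (1) gives {f_h(x) = -x}, which indeed satisfies {f'(x) - x f(x) = x^2 - 1}. The same can be verified numerically:

    import numpy as np
    from scipy.integrate import quad

    def stein_solution(h, mu, x):
        # f_h(x) = exp(x^2/2) * int_{-inf}^x (h(y) - mu) exp(-y^2/2) dy   -- equation (1)
        integral, _ = quad(lambda y: (h(y) - mu) * np.exp(-y**2 / 2), -np.inf, x)
        return np.exp(x**2 / 2) * integral

    h, mu = (lambda y: y**2), 1.0        # E[h(Z)] = E[Z^2] = 1 for Z ~ N(0,1)

    for x in [-2.0, -0.5, 0.0, 1.0, 2.5]:
        print(x, stein_solution(h, mu, x), -x)   # the two values agree: f_h(x) = -x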

Let’s be more specific. Choose any {z\in \mathbb{R}} and let {h(x) = I(x\leq z) - \Phi(z)}. Let {f_z} denote the Stein function for {h}; thus {f_z} satisfies

\displaystyle  f_z'(x)-xf_z(x) = I(x\leq z) - \Phi(z).

The unique bounded solution to this equation is

\displaystyle  f(x)\equiv f_z(x) = \begin{cases} \sqrt{2\pi}e^{x^2/2}\Phi(x)[1-\Phi(z)] & x\leq z\\ \sqrt{2\pi}e^{x^2/2}\Phi(z)[1-\Phi(x)] & x> z. \end{cases}

The function {f_z} has the following properties:

\displaystyle  \Bigl| (x+a)f_z(x+a) - (x+b)f_z(x+b) \Bigr| \leq ( |x| + c)(|a|+|b|)

where {c = \sqrt{2\pi}/4}. Also,

\displaystyle  ||f_z||_\infty \leq \sqrt{\frac{\pi}{2}},\ \ \ ||f'_z||_\infty \leq 2.
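
Here is a numerical check of the closed form and of these bounds (a minimal sketch; the grids of {x} and {z} values and the finite-difference derivative are my own choices):

    import numpy as np
    from scipy.stats import norm

    def f_z(x, z):
        # The bounded solution of f'(x) - x f(x) = I(x <= z) - Phi(z).
        below = np.sqrt(2 * np.pi) * np.exp(x**2 / 2) * norm.cdf(x) * (1 - norm.cdf(z))
        above = np.sqrt(2 * np.pi) * np.exp(x**2 / 2) * norm.cdf(z) * (1 - norm.cdf(x))
        return np.where(x <= z, below, above)

    x = np.linspace(-6, 6, 24001)
    dx = x[1] - x[0]
    for z in [-1.5, 0.0, 0.7, 2.0]:
        f = f_z(x, z)
        df = np.gradient(f, dx)                                    # numerical derivative of f_z
        resid = df - x * f - ((x <= z).astype(float) - norm.cdf(z))
        away = np.abs(x - z) > 0.01                                # avoid the kink at x = z
        print(z,
              np.abs(resid[away]).max(),                           # Stein equation residual: small
              np.abs(f).max() <= np.sqrt(np.pi / 2),               # sup-norm bound on f_z
              np.abs(df[away]).max() <= 2.0)                       # sup-norm bound on f_z'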

Let {{\cal F} = \{f_z:\ z\in \mathbb{R}\}}. From the equation {f'(x) - x f(x) = h(x) - \mathbb{E}[h(Z)]} it follows that

\displaystyle  \mathbb{P}(X \leq z) - \mathbb{P}(Z \leq z) = \mathbb{E}[f_z'(X) - Xf_z(X)]

and so

\displaystyle  \Delta_n = \sup_z \Bigl| \mathbb{P}(X \leq z) - \mathbb{P}( Z \leq z)\Bigr| \leq \sup_{f\in {\cal F}} \Bigl| \mathbb{E}[ f'(X) - X f(X)]\Bigr|.

We have thus reduced the problem of bounding {\Delta_n} to the problem of bounding {\sup_{f\in {\cal F}} \Bigl| \mathbb{E}[ f'(X) - X f(X)]\Bigr|}.

3. Zero Bias Coupling

As I said, Stein’s method is really a collection of methods. One of the easiest to explain is “zero-bias coupling.”

Let {\mu_3 = \mathbb{E}[|X_i|^3]}. Recall that {X_1,\ldots,X_n} are iid, mean 0, variance 1. Let

\displaystyle  X = \sqrt{n}\ \overline{X}_n = \frac{1}{\sqrt{n}}\sum_i X_i = \sum_i \xi_i

where

\displaystyle  \xi_i = \frac{1}{\sqrt{n}} X_i.

For each {i} define the leave-one-out quantity

\displaystyle  X^i = X - \xi_i

which plays a crucial role. Note that {X^i} is independent of {X_i}. We also make use of the following simple fact: for any {y} and any {a \geq 0}, {\Phi(y+a) - \Phi(y)\leq a/\sqrt{2\pi}}, since the density of the Normal is bounded above by {1/\sqrt{2\pi}}.

Recall that {X\sim N(0,\sigma^2)} if and only if

\displaystyle  \sigma^2 \mathbb{E}[f'(X)] = \mathbb{E}[X f(X)]

for all absolutely continuous functions {f} (for which the expectations exist). Inspired by this, Goldstein and Reinert (1997) introduced the following definition. Let {X} be any mean 0 random variable with variance {\sigma^2}. Say that {X^*} has the {X}-zero bias distribution if

\displaystyle  \sigma^2 \mathbb{E} [f'(X^*)] = \mathbb{E}[X f(X)]

for all absolutely continuous functions {f} for which {\mathbb{E}| X f(X)|< \infty}. Zero-biasing is a transform that maps one random variable {X} into another random variable {X^*}. (More precisely, it maps the distribution of {X} into the distribution of {X^*}.) The Normal is the fixed point of this map. The following result shows that {X^*} exists and is unique.
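
As a quick illustration that the Normal is the fixed point (a minimal sketch; the evaluation points and sample size are my own choices): for {Z\sim N(0,1)}, {\mathbb{E}[Z I(Z>x)] = \phi(x)}, the standard Normal density, so the density {p^*} given in Theorem 1 below is again {\phi} and {Z^*\sim N(0,1)}.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    z = rng.standard_normal(2_000_000)

    for x in [-2.0, -0.5, 0.0, 1.0]:
        # E[Z I(Z > x)] equals the standard Normal density phi(x),
        # so zero-biasing maps N(0,1) to itself.
        print(x, np.mean(z * (z > x)), norm.pdf(x))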

Theorem 1 Let {X} be any mean 0 random variable with variance {\sigma^2}. There exists a unique distribution corresponding to a random variable {X^*} such that

\displaystyle  \sigma^2 \mathbb{E}[f'(X^*)] = \mathbb{E}[X f(X)] \ \ \ \ \ (2)

for all absolutely continuous functions {f} for which {\mathbb{E}| X f(X)|< \infty}. The distribution of {X^*} has density

\displaystyle  p^*(x) = \frac{\mathbb{E}[ X I(X>x)]}{\sigma^2} = -\frac{\mathbb{E}[ X I(X\leq x)]}{\sigma^2}.

Proof: It may be verified that {p^*(x) \geq 0} and integrates to 1. Let us verify that (2) holds. For simplicity, assume that {\sigma^2=1}. Given an absolutely continuous {f} there is a {g} such that {f(x) = f(0) + \int_0^x g}; since adding a constant to {f} changes neither side of (2) (because {\mathbb{E}[X]=0}), we may take {f(0)=0}, so {f(x) = \int_0^x g}. Then

\displaystyle  \begin{array}{rcl}  \int_0^\infty f'(u)\ p^*(u) du &=& \int_0^\infty f'(u)\ \mathbb{E}[X I(X>u)] du = \int_0^\infty g(u)\ \mathbb{E}[X I(X>u)] du\\ &=& \mathbb{E}\left[X \int_0^\infty g(u) I(X>u) du \right]= \mathbb{E}\left[X \int_{0}^{X\vee 0} g(u) du \right]= \mathbb{E}[X f(X)I(X\geq 0)]. \end{array}

Similarly, using {p^*(u) = -\mathbb{E}[X I(X\leq u)]}, we get {\int_{-\infty}^0 f'(u)\ p^*(u) du =\mathbb{E}[X f(X)I(X< 0)]}. Adding the two pieces gives {\mathbb{E}[f'(X^*)] = \int f'(u)\, p^*(u)\, du = \mathbb{E}[X f(X)]}, which is (2). \Box
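
For a concrete example (my own illustration, not from the post): if {X} is Rademacher, taking the values {\pm 1} with probability {1/2} each, then {p^*(x) = \mathbb{E}[X I(X>x)] = 1/2} for {-1 \leq x < 1}, so {X^*\sim {\rm Uniform}(-1,1)}. A Monte Carlo check of (2) with the test function {f(x)=\sin x}:

    import numpy as np

    rng = np.random.default_rng(1)
    n_sim = 2_000_000

    f, fprime = np.sin, np.cos                    # any absolutely continuous test function

    x = rng.choice([-1.0, 1.0], size=n_sim)       # X Rademacher: mean 0, variance 1
    x_star = rng.uniform(-1.0, 1.0, size=n_sim)   # its zero-bias distribution, Uniform(-1, 1)

    print(np.mean(x * f(x)))         # E[X f(X)]  = sin(1), about 0.84
    print(np.mean(fprime(x_star)))   # E[f'(X*)]  = sin(1), about 0.84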

Here is a way to construct {X^*} explicitly when dealing with a sum.

Lemma 2 Let {\xi_1,\ldots, \xi_n} be independent, mean 0 random variables, let {\sigma_i^2 = {\rm Var}(\xi_i)} and {\sigma^2 = \sum_i \sigma_i^2}, and let {W =\sum_i \xi_i}. Let {\xi_1^*,\ldots, \xi_n^*} be independent random variables, independent of {(\xi_1,\ldots,\xi_n)}, where {\xi_i^*} has the {\xi_i}-zero bias distribution. Define

\displaystyle  W^* = W - \xi_J + \xi_J^*

where {J} is independent of everything else and {\mathbb{P}(J=i)=\sigma_i^2/\sigma^2}. Then {W^*} has the {W}-zero bias distribution. In particular, suppose that {X_1,\ldots, X_n} have mean 0 and common variance, and let {W = \frac{1}{\sqrt{n}}\sum_i X_i}. Then the summands {\xi_i = X_i/\sqrt{n}} have equal variances, so {J} is uniform on {\{1,\ldots,n\}} and {W^* = W - \frac{1}{\sqrt{n}}X_J + \frac{1}{\sqrt{n}}X_J^*} has the {W}-zero bias distribution.
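
A simulation sketch of this construction, continuing the Rademacher example above (the test function, {n}, and the number of replications are my own arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    n, n_sim = 10, 1_000_000

    f, fprime = np.sin, np.cos

    x = rng.choice([-1.0, 1.0], size=(n_sim, n))            # rows of iid Rademacher variables
    w = x.sum(axis=1) / np.sqrt(n)                          # W = (1/sqrt n) sum_i X_i

    j = rng.integers(n, size=n_sim)                         # J uniform over the n coordinates
    x_j = x[np.arange(n_sim), j]
    x_j_star = rng.uniform(-1.0, 1.0, size=n_sim)           # zero-bias version of X_J: Uniform(-1,1)
    w_star = w - x_j / np.sqrt(n) + x_j_star / np.sqrt(n)   # W* = W - xi_J + xi_J*

    print(np.mean(w * f(w)))         # E[W f(W)]
    print(np.mean(fprime(w_star)))   # E[f'(W*)]  -- agrees up to Monte Carlo error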

Now we can prove a Central Limit Theorem using zero-biasing.

Theorem 3 Suppose that {|X_i| \leq B}. Let {Y\sim N(0,1)}. Then

\displaystyle  \sup_z \Bigl| \mathbb{P}(X \leq z) - \mathbb{P}( Y \leq z)\Bigr| \leq \frac{6 B}{\sqrt{n}}.

Proof: Let {X_1^*,\ldots, X_n^*} be independent random variables, independent of {X_1,\ldots,X_n}, where {X_i^*} is zero-bias for {X_i}. Let {J} be chosen uniformly at random from {\{1,\ldots, n\}}, independently of everything else, and let

\displaystyle  X^* = X - \frac{1}{\sqrt{n}}(X_J - X_J^*).

Then, by the last lemma, {X^*} is zero-bias for {X} and hence

\displaystyle  \mathbb{E}[ X f(X)] = \mathbb{E}[f'(X^*)] \ \ \ \ \ (3)

for all absolutely continuous {f}. Also, since the zero-bias density {p_i^*(x) = \mathbb{E}[X_i I(X_i > x)]} vanishes for {|x| > B}, we have {|X_i^*|\leq B} and hence

\displaystyle  |X^* - X| \leq \frac{2B}{\sqrt{n}} \equiv \delta. \ \ \ \ \ (4)

So,

\displaystyle  \begin{array}{rcl}  \mathbb{P}(X \leq z) - \mathbb{P}(Y \leq z) &\leq & \mathbb{P}(X \leq z) - \mathbb{P}(Y \leq z+\delta) +\mathbb{P}(Y \leq z+\delta) -\mathbb{P}(Y \leq z) \leq \mathbb{P}(X \leq z) - \mathbb{P}(Y \leq z+\delta) + \frac{\delta}{\sqrt{2\pi}}\\ &\leq & \mathbb{P}(X^* \leq z+\delta) - \mathbb{P}(Y \leq z+\delta) + \frac{\delta}{\sqrt{2\pi}} \leq \sup_z |\mathbb{P}(X^* \leq z+\delta) - \mathbb{P}(Y \leq z+\delta)| + \frac{\delta}{\sqrt{2\pi}}\\ &=& \sup_z |\mathbb{P}(X^* \leq z) - \mathbb{P}(Y \leq z)| + \frac{\delta}{\sqrt{2\pi}}. \end{array}

By a symmetric argument, we deduce that

\displaystyle  \sup_z |\mathbb{P}(X \leq z) - \mathbb{P}(Y \leq z)| \leq \sup_z |\mathbb{P}(X^* \leq z) - \mathbb{P}(Y \leq z)| + \frac{\delta}{\sqrt{2\pi}}.

Let {f=f_z}. From the properties of the Stein function given earlier we have

\displaystyle  \begin{array}{rcl}  \sup_z |\mathbb{P}(X^* \leq z) - \mathbb{P}(Y \leq z)| &\leq & \sup_{f\in {\cal F}}\Bigl|\mathbb{E}[ f'(X^*) - X^* f(X^*)]\Bigr|= \sup_{f\in {\cal F}}\Bigl|\mathbb{E}[ X f(X) - X^* f(X^*)]\Bigr|\\ &\leq & \mathbb{E}\left[ \left( |X| + \frac{\sqrt{2\pi}}{4}\right)\ |X^* - X| \right]\leq \delta \left(1 + \frac{\sqrt{2\pi}}{4}\right), \end{array}

since {\mathbb{E}|X| \leq \sqrt{\mathbb{E}[X^2]} = 1} and {|X^*-X|\leq \delta}.

Combining these inequalities we have

\displaystyle  \sup_z |\mathbb{P}(X \leq z) - \mathbb{P}(Y \leq z)| \leq \delta \left(1 + \frac{1}{\sqrt{2\pi}} + \frac{\sqrt{2\pi}}{4}\right)\leq 3\delta =\frac{6 B}{\sqrt{n}}.

\Box
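
As a closing sanity check (my own simulation sketch, not from the post; the Rademacher summands, for which {B=1}, the z-grid, and the sample sizes are arbitrary choices), the empirical Kolmogorov distance sits well below the {6B/\sqrt{n}} bound of Theorem 3:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    n_sim = 200_000
    z = np.linspace(-4, 4, 801)              # grid used to approximate the sup over z

    for n in [10, 100, 1000]:
        # X = (1/sqrt n) * (sum of n Rademacher variables), i.e. a standardized shifted binomial
        x = np.sort((2.0 * rng.binomial(n, 0.5, size=n_sim) - n) / np.sqrt(n))
        emp_cdf = np.searchsorted(x, z, side="right") / n_sim   # empirical P(X <= z)
        delta_n = np.abs(emp_cdf - norm.cdf(z)).max()           # approximate Kolmogorov distance
        print(n, delta_n, 6 / np.sqrt(n))                       # stays below 6B/sqrt(n) with B = 1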

4. Conclusion

This is just the tip of the iceberg. If you want to know more about Stein’s method, see the references below. I hope I have given you a brief hint of what it is all about.

References

Chen, L.H.Y., Goldstein, L. and Shao, Q.-M. (2011). Normal Approximation by Stein's Method. Springer.

Nourdin, I. and Peccati, G. (2012). Normal Approximations with Malliavin Calculus: From Stein's Method to Universality. Cambridge University Press.

Ross, N. (2011). Fundamentals of Stein's method. Probability Surveys, 8, 210-293.