A corollary of the ergodic theorem

What's new 2018-08-02

Let {(X,T,\mu)} be a measure-preserving system – a probability space {(X,\mu)} equipped with a measure-preserving translation {T} (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points {x,y} in this space as being “close” if {y = T^n x} for some {n} that is not too large; this allows one to distinguish between “local” structure at a point {x} (in which one only looks at nearby points {T^n x} for moderately large {n}) and “global” structure (in which one looks at the entire space {X}). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function {f: X \rightarrow {\bf R}} which is locally essentially constant in the sense that {f(Tx) = f(x)} for {\mu}-almost every {x}, is necessarily globally essentially constant in the sense that there is a constant {c} such that {f(x) = c} for {\mu}-almost every {x}. A basic consequence of ergodicity is the mean ergodic theorem: if {f \in L^2(X,\mu)}, then the averages {x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)} converge in {L^2} norm to the mean {\int_X f\ d\mu}. (The mean ergodic theorem also applies to other {L^p} spaces with {1 < p < \infty}, though it is usually proven first in the Hilbert space {L^2}.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that {\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)} converges to {\mu(E)^2} for any measurable set {E}.

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let {(X,T,\mu)} be an ergodic measure-preserving system, and let {f: X \rightarrow {\bf R}} be measurable. Suppose that

\displaystyle  \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)

for some {0 \leq \delta \leq 1}. Then there exists a constant {c} such that {f(x)=c} for {x} in a set of measure at least {\delta}.

Informally: if {f} is locally constant on pairs {x, T^n x} at least {\delta} of the time, then {f} is globally constant at least {\delta} of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take {f} to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing {f} with (say) the tangent function, we may assume without loss of generality that {f} is bounded. Let {k>0}, and partition {X} as {\bigcup_{m \in {\bf Z}} E_{m,k}}, where {E_{m,k}} is the level set

\displaystyle  E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.

For each {k}, only finitely many of the {E_{m,k}} are non-empty. By (1), one has

\displaystyle  \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.

Using the ergodic theorem, we conclude that

\displaystyle  \sum_m \mu( E_{m,k} )^2 \geq \delta.

On the other hand, {\sum_m \mu(E_{m,k}) = 1}. Thus there exists {m_k} such that {\mu(E_{m_k,k}) \geq \delta}, thus

\displaystyle  \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.

By the Bolzano-Weierstrass theorem, we may pass to a subsequence where {m_k 2^{-k}} converges to a limit {c}, then we have

\displaystyle  \mu( \{ x \in X: c-2^{-k} \leq f(x) \leq c+2^{-k} \}) \geq \delta

for infinitely many {k}, and hence

\displaystyle  \mu( \{ x \in X: f(x) = c \}) \geq \delta.

The claim follows. \Box