Recall the notion of convergence for real-valued sequences.
Recap (Convergent sequence): A sequence $(x_n)_{n\in\mathbb{N}}$ of real numbers converges to $\ell \in \mathbb{R}$ iff
$$\forall \varepsilon > 0 \;\exists N \in \mathbb{N} \;\forall n \geq N : |x_n - \ell| \leq \varepsilon,$$
and we write $\lim_{n\to\infty} x_n = \ell$.
How does the notion of convergence extend to random variables? Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random variables. Intuitively, $X_n$ is a random point in $\mathbb{R}$ and we want to express that "$X_n$ is close to $X$", where $X$ is possibly random as well. Formally, $X_n : \Omega \to \mathbb{R}$ is a function and there are several ways to define convergence towards another function $X : \Omega \to \mathbb{R}$. We list some possible definitions:
1. Pointwise convergence: $\forall \omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)$
2. Uniform convergence: $\lim_{n\to\infty} \sup_{\omega \in \Omega} |X_n(\omega) - X(\omega)| = 0$
3. Almost-sure convergence: $P\big(\{\lim_{n\to\infty} X_n = X\}\big) = 1$
4. Convergence in probability: $\forall \varepsilon > 0 : \lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$
5. Convergence in $L^p$: $\lim_{n\to\infty} E[|X_n - X|^p] = 0$
We note that 1. and 2. are not suitable for probability, because random variables are only defined almost surely. In this chapter, we study 3. and 4., while 5. will be the topic of a later chapter. In general, the study of convergence of random variables is related to functional analysis. To give sense to $X_n \to X$, we choose a functional space $S$ where $X_n$ and $X$ "live" and equip this space with a topology.
4.2 Almost Sure Convergence
Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be random variables. If we fix $\omega \in \Omega$, then $(X_n(\omega))_{n\in\mathbb{N}}$ is simply a sequence of real numbers and we know how to give sense to $\lim_{n\to\infty} X_n(\omega) = X(\omega)$. This equation means that the sequence converges in $\mathbb{R}$ and its limit is $X(\omega)$. The existence and the value of the limit generally depend on the underlying $\omega$. Now, we may consider the set of all $\omega$ for which this holds:
$$\Big\{\lim_{n\to\infty} X_n = X\Big\} = \Big\{\omega \in \Omega \;\Big|\; \lim_{n\to\infty} X_n(\omega) = X(\omega)\Big\}$$
When this event occurs almost-surely, we say that the sequence converges almost surely.
Definition 4.1 (Almost-sure convergence): Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be random variables. We say that $X_n$ converges to $X$ almost surely if
$$P\Big(\Big\{\lim_{n\to\infty} X_n = X\Big\}\Big) = 1$$
Note:
We also write $\lim_{n\to\infty} X_n \overset{a.s.}{=} X$ or $X_n \xrightarrow{a.s.} X$.
Note that $\lim_{n\to\infty} X_n \overset{a.s.}{=} X$ iff $\lim_{n\to\infty} |X_n - X| \overset{a.s.}{=} 0$.
Example (Deterministic sequence): Let $(a_n)_{n\in\mathbb{N}}$ and $\ell$ be real numbers and consider a sequence $(X_n)_{n\in\mathbb{N}}$ of deterministic random variables satisfying $X_n \overset{a.s.}{=} a_n$. Then $\lim_{n\to\infty} X_n \overset{a.s.}{=} \ell$ iff $\lim_{n\to\infty} a_n = \ell$.
Example (Dyadic approximation): Let $X$ be a non-negative real random variable. We define $X_n = \min\{n, 2^{-n} \lfloor 2^n X \rfloor\}$ for every $n \in \mathbb{N}$. Then we have $\lim_{n\to\infty} X_n \overset{a.s.}{=} X$.
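This convergence is in fact pointwise for every fixed value of $X(\omega)$; a minimal numerical sketch (the sample value `x` below is an arbitrary choice):

```python
import math

def dyadic_approx(x: float, n: int) -> float:
    """n-th dyadic approximation: x rounded down to a multiple of 2^-n, capped at n."""
    return min(n, math.floor(2**n * x) / 2**n)

x = math.pi  # an arbitrary non-negative sample value X(omega)
for n in [1, 5, 10, 20]:
    print(n, dyadic_approx(x, n))
# Once n >= x the cap is inactive, and then 0 <= x - X_n <= 2^-n, so X_n -> x.
```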
Example (Functional point of view): Let $\Omega = [0,1]$ and $P = \lambda$. Define $X(\omega) = \omega$ and $X_n(\omega) = \omega \cdot \mathbb{1}\{|\omega - \frac{1}{2}| \geq \frac{1}{2n}\}$. For all $\omega \in [0,1] \setminus \{\frac{1}{2}\}$ we have $\lim_{n\to\infty} X_n(\omega) = X(\omega)$. Since $P([0,1] \setminus \{\frac{1}{2}\}) = 1$, it follows that $X_n \xrightarrow{a.s.} X$. In other words, "we do not care what happens when $\omega = \frac{1}{2}$, because it does not happen".
Proposition 4.2 (Criterion for a.s. convergence): Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be random variables. If
$$\forall \varepsilon > 0 : \sum_{n\in\mathbb{N}} P(\{|X_n - X| \geq \varepsilon\}) < \infty,$$
then $X_n \xrightarrow{a.s.} X$.
Example (Minimum of uniform rvs): Let $(U_n)_{n\in\mathbb{N}} \overset{iid}{\sim} \mathcal{U}([0,1])$. For every $n \in \mathbb{N}$ define $X_n = \min\{U_i \mid i \leq n\}$. For $\varepsilon > 1$ we have $P(\{|X_n| \geq \varepsilon\}) = 0$, and for $\varepsilon \in (0,1]$ we have $P(\{|X_n| \geq \varepsilon\}) = P(\{U_1 \geq \varepsilon\})^n = (1-\varepsilon)^n$. Since $\sum_{n\in\mathbb{N}} (1-\varepsilon)^n < \infty$, the criterion applies and $X_n \xrightarrow{a.s.} 0$.
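A quick simulation of one sample path of this running minimum (the seed is an arbitrary choice, for reproducibility):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

# One sample path omega: X_n(omega) = min{U_1(omega), ..., U_n(omega)}.
path = []
current_min = float("inf")
for n in range(1, 10_001):
    current_min = min(current_min, random.random())
    path.append(current_min)

print(path[9], path[99], path[9999])  # X_10, X_100, X_10000: decreasing towards 0
```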
4.3 Convergence in Probability
Definition 4.3 (Convergence in probability): Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be random variables. We say that $X_n$ converges to $X$ in probability if
$$\forall \varepsilon > 0 : \lim_{n\to\infty} P(\{|X_n - X| \geq \varepsilon\}) = 0$$
Note: We also write $\lim_{n\to\infty} X_n \overset{P}{=} X$ or $X_n \xrightarrow{P} X$.
Proposition 4.4 (A.s. implies in probability): If $X_n \xrightarrow{a.s.} X$, then $X_n \xrightarrow{P} X$.
We now give two examples illustrating that convergence in probability does not generally imply almost-sure convergence. The first example considers a sequence of independent Bernoulli random variables with decreasing success probabilities.
Example (Bernoulli with decreasing success): Let $X_n \sim \mathrm{Ber}(\frac{1}{n})$ be independent. Trivially, for all $\varepsilon > 1$ we have $P(\{|X_n| \geq \varepsilon\}) = 0$, and for all $\varepsilon \in (0,1]$ we have
$$P(\{|X_n| \geq \varepsilon\}) = P(\{X_n = 1\}) = \frac{1}{n} \xrightarrow{n\to\infty} 0.$$
Hence $X_n \xrightarrow{P} 0$. However, $\sum_{n\in\mathbb{N}} P(\{X_n = 1\}) = \infty$ and since the events $\{X_n = 1\}$ are independent, by Borel-Cantelli II we have
$$\limsup_{n\to\infty} X_n \overset{a.s.}{=} 1 \;\implies\; (X_n(\omega))_{n\in\mathbb{N}} \text{ almost-surely does not converge to } 0 \;\implies\; X_n \not\xrightarrow{a.s.} 0.$$
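This dichotomy is easy to see in a simulation; a minimal sketch (seed and horizon are arbitrary choices):

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility

N = 100_000
# Times n at which X_n = 1, for one path of independent X_n ~ Ber(1/n).
ones = [n for n in range(1, N + 1) if random.random() < 1 / n]

# P(X_n = 1) = 1/n -> 0 (convergence in probability), yet sum 1/n diverges,
# so by Borel-Cantelli II ones keep occurring along the path:
# on average about log(N) of them, scattered over all scales.
print(len(ones), ones[:5])
```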
The second example also involves a sequence of Bernoulli random variables, but adopts a functional viewpoint by constructing a sequence of indicator functions on an explicit probability space Ω=[0,1].
Example (Typesetter): Let $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$ and $P = \lambda$. For $n \in \mathbb{N}$, write $n = 2^m + k$ with $m = \lfloor \log_2(n) \rfloor$ and $0 \leq k < 2^m$, and define
$$X_n = \mathbb{1}_{[k 2^{-m},\, (k+1) 2^{-m}]}.$$
For all $\varepsilon \in (0,1]$ (for $\varepsilon > 1$ the probability is $0$), we have
$$P(\{|X_n| \geq \varepsilon\}) = \frac{1}{2^{\lfloor \log_2(n) \rfloor}} < \frac{1}{2^{\log_2(n) - 1}} = \frac{2}{n} \xrightarrow{n\to\infty} 0,$$
but for every $\omega \in [0,1]$ we have $X_n(\omega) = 1$ for infinitely many $n$ (at least once per "generation" $m$) and $X_n(\omega) = 0$ for infinitely many $n$, thus $(X_n(\omega))_{n\in\mathbb{N}}$ almost-surely does not converge. Hence $X_n \xrightarrow{P} 0$ but $X_n \not\xrightarrow{a.s.} 0$.
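The sequence can be sketched directly (the helper name `typewriter` and the sample point are my own choices):

```python
def typewriter(n: int, omega: float) -> int:
    """X_n(omega): indicator of the dyadic interval [k*2^-m, (k+1)*2^-m],
    where n = 2^m + k with 0 <= k < 2^m."""
    m = n.bit_length() - 1  # m = floor(log2(n))
    k = n - 2**m
    return int(k / 2**m <= omega <= (k + 1) / 2**m)

omega = 0.3  # an arbitrary sample point
hits = [n for n in range(1, 1025) if typewriter(n, omega)]
# The interval containing omega shrinks (P(X_n = 1) -> 0), but every
# generation m contributes at least one n with X_n(omega) = 1.
print(hits)
```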
Note: Let $X$ be a random variable, then we introduce the notation $X \wedge m = \min\{X, m\}$ for the random variable that caps $X$ at $m \in \mathbb{R}$.
Theorem 4.5 (Characterisation of convergence in probability): Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be random variables. Then
$$X_n \xrightarrow{P} X \iff \lim_{n\to\infty} E[|X_n - X| \wedge 1] = 0$$
Note: $|X_n - X| \wedge 1$ is used because its expectation always exists. Indeed, $|X_n - X| \wedge 1$ takes values in $[0,1]$, hence $E[|X_n - X| \wedge 1]$ is well defined and lies in $[0,1]$.
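As an illustration, for the minimum-of-uniforms example above this expectation can be estimated by Monte Carlo; the exact value is $E[X_n \wedge 1] = E[X_n] = \frac{1}{n+1}$ (seed and trial count below are arbitrary choices):

```python
import random

random.seed(2)  # arbitrary seed, for reproducibility

def mc_capped_expectation(n: int, trials: int = 20_000) -> float:
    """Monte Carlo estimate of E[|X_n - X| ∧ 1] for X_n = min(U_1,...,U_n), X = 0."""
    total = 0.0
    for _ in range(trials):
        xn = min(random.random() for _ in range(n))
        total += min(xn, 1.0)  # the cap is inactive here since X_n <= 1 anyway
    return total / trials

for n in [1, 10, 100]:
    print(n, mc_capped_expectation(n))  # decreases towards 0, matching X_n ->P 0
```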
4.4 Converging Subsequence
Proposition 4.6 (Converging subsequence): Let $(X_n)_{n\in\mathbb{N}}$ and $X$ be random variables. If $X_n \xrightarrow{P} X$, then there exists a subsequence $(X_{n(k)})_{k\in\mathbb{N}}$ that converges almost-surely to $X$.
Example (Bernoulli with decreasing success): Let $X_n \sim \mathrm{Ber}(\frac{1}{n})$ be independent. Then the subsequence $(X_{k^2})_{k\in\mathbb{N}}$ converges to $0$ almost-surely. Indeed, $\sum_{k\in\mathbb{N}} P(\{X_{k^2} = 1\}) = \sum_{k\in\mathbb{N}} \frac{1}{k^2} < \infty$, so Proposition 4.2 applies.
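A quick simulation of this subsequence (seed arbitrary): since $\sum_k 1/k^2 < \infty$, only finitely many terms should equal $1$.

```python
import random

random.seed(3)  # arbitrary seed, for reproducibility

# Subsequence X_{k^2} ~ Ber(1/k^2): only finitely many ones (Borel-Cantelli I).
ones = [k for k in range(1, 100_001) if random.random() < 1 / k**2]
print(ones)  # a handful of small k; the path is eventually 0, so X_{k^2} -> 0 a.s.
```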
Example (Typesetter): For the typesetter example, the subsequence $(X_{2^k})_{k\in\mathbb{N}}$ converges to $0$ almost-surely. Indeed, $X_{2^k} = \mathbb{1}_{[0,\, 2^{-k}]}$, so for every $\omega \in (0,1]$ we have $X_{2^k}(\omega) = 0$ for all large $k$.
In summary, almost-sure convergence always implies convergence in probability, while convergence in probability only implies almost-sure convergence along a subsequence or if the convergence is “strong enough”.