In this and the following chapters, we consider a fixed, appropriate choice of probability space (Ω, F, P).
2.1 Measure-Theoretic Definition
Definition 2.1 (Random variable): A random variable is an F/B(R)-measurable map X: Ω → R.

Note: X is F/B(R)-measurable if ∀B ∈ B(R): X⁻¹(B) ∈ F.

The Borel σ-algebra B(R) is difficult to describe explicitly, so we often use the following criterion to prove that a mapping is a random variable.
Proposition 2.2 (Criterion for measurability): Let B ⊆ P(R) be a collection of sets such that σ(B) = B(R). Then X is a random variable if and only if ∀B ∈ B: X⁻¹(B) ∈ F.

Proof: The collection D = {B ⊆ R ∣ X⁻¹(B) ∈ F} is a σ-algebra, since taking preimages commutes with complements and countable unions. If X⁻¹(B) ∈ F for all B ∈ B, then B ⊆ D and hence B(R) = σ(B) ⊆ D, i.e. X is a random variable. The converse is immediate since B ⊆ B(R).
In particular, since the intervals (−∞, a] generate B(R), X is a random variable if and only if for every a ∈ R the set {ω ∈ Ω ∣ X(ω) ≤ a} lies in F. The notion of random variables can be generalized to arbitrary measurable spaces.
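The sublevel-set criterion can be illustrated on a finite space, where measurability can be checked exhaustively. The sketch below (our own toy example, not from the text) takes Ω = {0, 1, 2} with F = P(Ω) and verifies that every event {X ≤ a} is an element of F.

```python
from itertools import chain, combinations

# Toy example: Omega = {0, 1, 2} with F = P(Omega) (the power set).
# We check the criterion: X is measurable iff {omega : X(omega) <= a} is in F.

Omega = [0, 1, 2]
# Enumerate the power set of Omega as a list of sets.
F = [set(s) for s in chain.from_iterable(combinations(Omega, r) for r in range(4))]

def X(omega):          # an arbitrary map Omega -> R
    return omega ** 2

def preimage_leq(a):   # the event {X <= a} = X^{-1}((-inf, a])
    return {omega for omega in Omega if X(omega) <= a}

# Every sublevel event belongs to F, so X is F/B(R)-measurable.
print(all(preimage_leq(a) in F for a in [-1.0, 0.0, 0.5, 1.0, 4.0, 10.0]))  # True
```

With a coarser σ-algebra than P(Ω), some sublevel sets could fall outside F and the check would fail; this is exactly the situation examined again in Section 2.5.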
Definition 2.3 (General random variable): Let (E, E) be a measurable space. A random variable in E is an F/E-measurable map X: Ω → E.

Note: A random variable is the special case E = R, E = B(R) of a general random variable.

A random variable in Rn, i.e. on the measurable space (Rn, B(Rn)), is called a random vector. We may also consider random variables in R+.
Note: Let X be a general random variable. We use the following notational conventions:
- We write X ∈ B for X⁻¹(B)
- We write X ∉ B for X⁻¹(E∖B)
- We write X = x for X⁻¹({x})
- We write X ≠ x for X⁻¹(E∖{x})

Furthermore, if X is a random variable, i.e. E = R, we use the following notational conventions:
- We write X ≤ a for X⁻¹((−∞, a])
- We write X < a for X⁻¹((−∞, a))
- We write X ≥ a for X⁻¹([a, ∞))
- We write X > a for X⁻¹((a, ∞))
2.2 Law and Cdf
Definition 2.4 (Law of a random variable): Let X be a random variable. The law of X is the probability measure μX: B(R) → [0,1] on (R, B(R)) defined by μX(B) = P(X ∈ B).

Note: In measure-theoretic terms, μX = X#P is the push-forward measure of P by X.

Recap (Dirac measure): The Dirac measure on a measurable space (E, E) concentrated at e ∈ E is the probability measure δe: E → [0,1] defined by δe(B) = 1 if e ∈ B and δe(B) = 0 otherwise.

Example (Different rvs but same law)
: Consider the probability space (Ω, F, P) with Ω = {0,1}, F = P(Ω) and P(A) = #A/#Ω. The random variables X and Y defined by X(ω) = ω and Y(ω) = 1 − ω are different but follow the same law μX = μY = (1/2)δ0 + (1/2)δ1.

Note: We often define random variables through their law. This is done by writing
X ∼ □, where “∼” means “follows the distribution of” and □ can be one of three possible objects:
- A probability measure μ, which can either be defined directly, such as μ = (1/2)δ0 + (1/2)δ1, or be a predefined common law, such as μ = Ber(p).
- Another random variable Y, in which case we define X to follow the same law as Y, i.e. μX = μY.
- A distribution function, e.g. the cumulative distribution function F: R → [0,1]. Distribution functions are in one-to-one correspondence with probability measures.

Note that, in general, there exist multiple random variables that follow the same law given by □, i.e. X ∼ □ defines an equivalence class of random variables. Hence when writing “let X ∼ □” we mean any random variable in that equivalence class.
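The coin-flip example above can be checked by direct enumeration. The sketch below computes the push-forward laws of X(ω) = ω and Y(ω) = 1 − ω exactly (using rational arithmetic) and confirms that the maps differ pointwise while their laws coincide.

```python
from fractions import Fraction

# Omega = {0, 1} with the uniform measure P(A) = #A / #Omega.
Omega = [0, 1]
P = {omega: Fraction(1, 2) for omega in Omega}   # P({omega}) = 1/2

X = lambda omega: omega
Y = lambda omega: 1 - omega

def law(rv):
    """Push-forward measure: mu(x) = P(rv = x), as a dict over the support."""
    mu = {}
    for omega in Omega:
        mu[rv(omega)] = mu.get(rv(omega), Fraction(0)) + P[omega]
    return mu

print([X(w) == Y(w) for w in Omega])  # [False, False]: X and Y differ everywhere
print(law(X) == law(Y))               # True: mu_X = mu_Y = (1/2)delta_0 + (1/2)delta_1
```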
We now present two fundamental laws.

Definition 2.5 (Bernoulli random variable): Let p ∈ [0,1]. We say X is a Bernoulli random variable, X ∼ Ber(p), if and only if μX = Ber(p) = (1 − p)δ0 + pδ1.

Note: If X ∼ Ber(p), then P(X = 0) = 1 − p and P(X = 1) = p.

Recap (Lebesgue measure)
: The Lebesgue measure on the measurable space (R, B(R)) is the unique measure λ: B(R) → [0,∞] such that λ([a,b]) = b − a for all a ≤ b in R.

Note: We denote by λ∣E the restriction of the Lebesgue measure to the measurable space (E, B(E)).

Definition 2.6 (Uniform random variable): We say X is a uniform random variable, X ∼ U([0,1]), if and only if μX = U([0,1]) = λ∣[0,1].

Note: If X ∼ U([0,1]), then P(X ∈ [a,b]) = b − a for all a ≤ b in [0,1].

The law can be defined analogously for general random variables: if X is a general random variable in (E, E), then the law of X is the probability measure μX: E → [0,1] on (E, E) defined by μX(B) = P(X ∈ B).
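The two laws are linked: since P(U ≤ p) = p for U ∼ U([0,1]), the indicator X = 1{U ≤ p} is Ber(p)-distributed. A minimal sketch (function names are our own):

```python
import random

# Build a Bernoulli random variable from a uniform one: X = 1_{U <= p}.
# P(U <= p) = p for p in [0, 1], so X ~ Ber(p).

def bernoulli(p, rng):
    """One draw from Ber(p), constructed from a uniform draw."""
    return 1 if rng.random() <= p else 0

rng = random.Random(0)   # fixed seed so the experiment is reproducible
p = 0.3
n = 100_000
freq = sum(bernoulli(p, rng) for _ in range(n)) / n
print(abs(freq - p) < 0.01)  # empirical frequency of {X = 1} is close to p
```

This is a special case of the inverse transform sampling discussed below.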
Definition 2.7 (Cumulative distribution function): Let X be a random variable. The cumulative distribution function or cdf of X is the function FX: R → [0,1] defined by FX(a) = μX((−∞, a]).

Note: For a < b one has P(X ∈ (a,b]) = FX(b) − FX(a).

Theorem 2.8 (Characterization of the cdf): A function F: R → [0,1] is the cdf of some random variable X if and only if
- F is non-decreasing, i.e. F(a) ≤ F(b) for all a ≤ b
- F is right-continuous, i.e. lim_{h↓0} F(a+h) = F(a) for all a ∈ R
- lim_{a→−∞} F(a) = 0 and lim_{a→∞} F(a) = 1
Given a function F satisfying the properties listed in the theorem, we can always construct a probability space and a random variable X on it satisfying FX = F. The construction of such a random variable relies on the generalized inverse of F.
Definition 2.9 (Generalized inverse): Let F: R → [0,1] be non-decreasing and right-continuous with lim_{a→−∞} F(a) = 0 and lim_{a→∞} F(a) = 1. The generalized inverse of F is the mapping F⁻¹: (0,1) → R defined by F⁻¹(α) = inf{x ∈ R : F(x) ≥ α}.

Note: For every x ∈ R and α ∈ (0,1) we have F⁻¹(α) ≤ x if and only if α ≤ F(x).

Theorem 2.10 (Inverse transform sampling)
: Let F: R → [0,1] be non-decreasing and right-continuous with lim_{a→−∞} F(a) = 0 and lim_{a→∞} F(a) = 1. Let U ∼ U([0,1]). Then the random variable X = F⁻¹(U) has distribution function FX = F.

Note: Formally, there is an issue in the definition of X in the theorem above: U: Ω → [0,1], but F⁻¹ is only defined on (0,1), and 0, 1 ∉ (0,1). Nevertheless, P(U ∈ (0,1)) = 1, so we can fix the issue by defining X(ω) = F⁻¹(U(ω)) if U(ω) ∈ (0,1) and X(ω) = c otherwise, where the value c ∈ R can be chosen arbitrarily.
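Theorem 2.10 can be sketched concretely for a law whose generalized inverse is explicit. We pick the exponential law with rate 1 as an illustration (this choice is ours, not the text's): F(x) = 1 − exp(−x) for x ≥ 0, hence F⁻¹(α) = −log(1 − α) on (0,1).

```python
import math
import random

def F(x):
    """Cdf of the exponential law with rate 1."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_inv(alpha):
    """Generalized inverse of F on (0, 1): F_inv(alpha) = -log(1 - alpha)."""
    return -math.log(1.0 - alpha)

rng = random.Random(0)
n = 100_000
# X = F^{-1}(U) with U ~ U([0,1)); random() never returns 1, avoiding log(0).
samples = [F_inv(rng.random()) for _ in range(n)]

# The empirical cdf of the samples should be close to F.
for x in [0.5, 1.0, 2.0]:
    ecdf = sum(s <= x for s in samples) / n
    print(f"x={x}: ecdf={ecdf:.3f}, F(x)={F(x):.3f}")
```

The same recipe works for any cdf, with F⁻¹ computed numerically when no closed form exists.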
Theorem 2.11 (Cdf characterizes the law): For two random variables X and Y we have FX = FY if and only if μX = μY.

Note: Equivalently, X ∼ Y if and only if FX = FY.

Proposition 2.12 (Random vector as a collection of rvs)
: Let X1, …, Xn be random variables. The function X: Ω → Rn with X(ω) = [X1(ω), …, Xn(ω)]⊤ is F/B(Rn)-measurable.

Note: Recall that X is called a random vector.

Proposition 2.13 (Transformation of random variables): Let X1, …, Xn be random variables and ϕ: Rn → R be measurable. Then Y(ω) = ϕ(X(ω)) is a random variable.

Recall that if a function is continuous, then it is measurable.
Proposition 2.14 (Limits of rvs): Let (Xn)n∈N be a sequence of general random variables in R̄ = [−∞, ∞]. The following are also general random variables in R̄:
- sup_{n∈N} Xn
- inf_{n∈N} Xn
- limsup_{n→∞} Xn
- liminf_{n→∞} Xn
2.4 Pmf and Pdf
Definition 2.15 (Discrete random variable): Let S ⊆ R be finite or countable. A random variable X is said to be discrete with support S if μX(S) = 1 and ∀x ∈ S: μX({x}) > 0.

Definition 2.16 (Probability mass function): Let X be a discrete random variable with support S. The probability mass function or pmf of X is the function pX: S → [0,1] defined by pX(x) = μX({x}).

Note: Bernoulli random variables are discrete with support S = {0,1} and pmf pX(1) = p, pX(0) = 1 − p.

Recap (Absolute continuity)
pX(0)=1−p. Recap (Absolute continuity)
: Let
μ and
ν be two measures on
(R,B(R)). μ is said to be absolutely continuous w.r.t.
ν if for every
B∈B(R) with
ν(B)=0 it follows
μ(B)=0. Note: If
μ is absolutely continuous w.r.t.
ν we write
μ≪ν. Definition 2.17 (Absolutely continuous random variable)
: Let
X be a random variable.
X is said to be absolutely continuous if
μX≪λ. Note:- If X is absolutely continuous, then μX(x)=0 for all x∈R.
- Uniform random variables are continuous.
Definition 2.18 (Probability density function): Let X be an absolutely continuous random variable. The probability density function or pdf of X is the function fX: R → [0,∞] defined by

∀B ∈ B(R): μX(B) = ∫_B fX(x) dx.

Note:
- The pdf is unique up to null sets of the Lebesgue measure.
- A pdf fX satisfies ∫_{−∞}^{+∞} fX(x) dx = 1.
- We have FX(x) = ∫_{−∞}^{x} fX(ξ) dξ and, for λ-almost every x, (d/dx) FX(x) = fX(x).
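The relation FX(x) = ∫_{−∞}^{x} fX(ξ) dξ can be sketched numerically for X ∼ U([0,1]), whose pdf is fX = 1 on [0,1] and 0 elsewhere, so that FX(x) = x on [0,1]. The sketch below approximates the integral with a midpoint Riemann sum (our own illustrative choice of quadrature).

```python
def f_X(x):
    """Pdf of U([0,1]): 1 on [0,1], 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def F_X(x, steps=10_000):
    """Approximate F_X(x) = integral of f_X over (-inf, x].

    The pdf vanishes below 0, so it suffices to integrate over [0, x].
    """
    if x <= 0.0:
        return 0.0
    dx = x / steps
    return sum(f_X((k + 0.5) * dx) for k in range(steps)) * dx  # midpoint rule

for x in [0.25, 0.5, 1.0]:
    print(round(F_X(x), 4))  # ≈ x, since F_X(x) = x on [0,1]
```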
2.5 RV-Generated σ-Algebra
One may ask whether less information suffices to determine a random variable X, i.e. whether there exists a smaller σ-algebra G⊆F such that X:Ω→R is G/B(R)-measurable.
Definition 2.19 (Rv-generated σ-algebra): Let X be a random variable. The σ-algebra generated by X is σ(X) = {X ∈ B ∣ B ∈ B(R)}.

Note:
- σ(X) is a sub-σ-algebra of F.
- σ(X) is the smallest σ-algebra such that X is σ(X)/B(R)-measurable.
- An alternative definition is σ(X) = σ({X ≤ a ∣ a ∈ R}).
Definition 2.20 (Rvs-generated σ-algebra): Let (Xi)i∈I be a finite or countable sequence of random variables. The σ-algebra generated by (Xi)i∈I is σ((Xi)i∈I) = σ(⋃i∈I σ(Xi)).

Note: If the sequence is finite, i.e. X1, …, Xn, we write σ(X1, …, Xn).

Theorem 2.21 (Doob-Dynkin Factorization)
: Let Y and X1, …, Xn be random variables. Then Y is σ(X1, …, Xn)-measurable if and only if Y = φ(X1, …, Xn) for some measurable φ: Rn → R.

Note: The result generalizes to general random variables in R̄ or Rd, but not to arbitrary spaces E.

2.6 Independence
Definition 2.22 (Independence of rvs): Let (Xi)i∈I be a finite or countable sequence of random variables. We say that the random variables (Xi)i∈I are independent if the family of σ-algebras (σ(Xi))i∈I is independent.

Note:
- The events in σ(Xi) are of the form Xi ∈ Bi for Bi ∈ B(R). Hence, X1, …, Xn are independent if and only if for each choice of B1, …, Bn ∈ B(R) we have
  P(X1 ∈ B1, …, Xn ∈ Bn) = ∏_{i=1}^{n} P(Xi ∈ Bi)
- Equivalently, X1, …, Xn are independent if and only if for each choice of a1, …, an ∈ R we have
  P(X1 ≤ a1, …, Xn ≤ an) = ∏_{i=1}^{n} P(Xi ≤ ai)
- If X1, …, Xn are discrete random variables with supports S1, …, Sn ⊆ R, then X1, …, Xn are independent if and only if for each choice of ai ∈ Si we have
  P(X1 = a1, …, Xn = an) = ∏_{i=1}^{n} P(Xi = ai)
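The discrete criterion can be verified exhaustively on a finite example. The sketch below (our own toy setup) takes two fair coin flips as coordinate maps on Ω = {0,1}² with the uniform measure and checks that the joint pmf factorizes for every choice of values.

```python
from fractions import Fraction
from itertools import product

# Omega = {0,1}^2 with the uniform measure: each outcome has mass 1/4.
Omega = list(product([0, 1], repeat=2))
P = Fraction(1, len(Omega))

def prob(event):
    """P of the set of outcomes where `event` holds."""
    return sum(P for w in Omega if event(w))

X1 = lambda w: w[0]   # first coin
X2 = lambda w: w[1]   # second coin

# Check P(X1 = a1, X2 = a2) = P(X1 = a1) * P(X2 = a2) for all a1, a2.
ok = all(
    prob(lambda w: X1(w) == a1 and X2(w) == a2)
    == prob(lambda w: X1(w) == a1) * prob(lambda w: X2(w) == a2)
    for a1, a2 in product([0, 1], repeat=2)
)
print(ok)  # True: the joint pmf is the product of the marginals
```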
Proposition 2.23 (Grouping): Let (Xi)i∈I be a finite or countable sequence of independent random variables. For any partition K = {I1, I2, …} of I the family of σ-algebras (σ((Xi)i∈J))J∈K is independent.

Example (Grouping pairs): Let (Xn)n∈N be a sequence of independent random variables. Then the sequence (Yn)n∈N defined by Yn = X_{2n+1} + X_{2n+2} is independent.
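A finite instance of the grouping example can be checked exhaustively. The sketch below (our own toy setup, assuming four independent Ber(1/2) coordinates on Ω = {0,1}⁴) forms Y1 = X1 + X2 and Y2 = X3 + X4 from disjoint groups and verifies the discrete factorization criterion for all value pairs.

```python
from fractions import Fraction
from itertools import product

# Omega = {0,1}^4 with the uniform measure: four independent fair coins.
Omega = list(product([0, 1], repeat=4))
P = Fraction(1, len(Omega))   # each outcome has mass 1/16

def prob(event):
    """P of the set of outcomes where `event` holds."""
    return sum(P for w in Omega if event(w))

Y1 = lambda w: w[0] + w[1]    # built from the group {X1, X2}
Y2 = lambda w: w[2] + w[3]    # built from the disjoint group {X3, X4}

# Check P(Y1 = a, Y2 = b) = P(Y1 = a) * P(Y2 = b) for all a, b in {0, 1, 2}.
ok = all(
    prob(lambda w: Y1(w) == a and Y2(w) == b)
    == prob(lambda w: Y1(w) == a) * prob(lambda w: Y2(w) == b)
    for a, b in product([0, 1, 2], repeat=2)
)
print(ok)  # True: Y1 and Y2 are independent, as Proposition 2.23 predicts
```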