Definition 2.1 (Real random variable): A real random variable is a measurable map X:Ω→R.
Note: Concretely, X is measurable if ∀B∈B(R):X−1(B)∈F.
The Borel σ-algebra B(R) is difficult to describe explicitly, so we often use the following criterion to prove that a mapping is a random variable.
Proposition 2.2 (Criterion for Measurability): Let C⊂P(R) be a family of sets such that σ(C)=B(R). Then X is a random variable iff
∀C∈C:X−1(C)∈F
In particular, X is a random variable iff for every a∈R the set {X≤a} is measurable; indeed, the half-lines (−∞,a] generate B(R), since every open interval can be written as (a,b)=⋃_{n≥1}((−∞,b−1/n]∖(−∞,a]). We will mainly work with real random variables, but the notion can be generalized to arbitrary measurable spaces.
Definition 2.3 (General random variables): Let (E,E) be a measurable space. A random variable with values in E is a measurable map X:Ω→E.
Note: Concretely, X is measurable if ∀B∈E:X−1(B)∈F.
In particular, a random variable with values in Rn equipped with its Borel σ-algebra is called a random vector. We may also consider random variables with values in [0,∞].
Note: Unless stated otherwise, we will write “random variable” for real random variables.
2.2 Law of a Random Variable
Note: As a motivation, consider Ω={0,1}, F=P({0,1}) and P(A)=∣A∣/2 for every A⊂Ω. The random variables X and Y defined by X(ω)=ω and Y(ω)=1−ω are different as maps, but the random numbers they represent have the same properties. We would like to say that these two random variables have the same “probabilistic” properties.
Definition 2.4 (Law of a random variable): Let X be a random variable. The law of X is the probability measure μX on (R,B(R)) defined by
∀B∈B(R):μX(B)=P(X∈B)
Note: From the point of view of measure theory, μX=X#P is the push-forward measure of P by X.
Example (Different rvs but same law): In the motivation at the beginning of the section we have X≠Y but μX=μY=½δ0+½δ1.
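This can be made concrete in a few lines of Python. The sketch below (standard library only; the helper name law is our own illustrative choice) pushes P forward through X and Y on the two-point space and recovers the same measure twice.

```python
from fractions import Fraction

# Finite probability space: Omega = {0, 1} with the uniform measure P({w}) = 1/2.
omega = [0, 1]
P = {w: Fraction(1, 2) for w in omega}

def X(w): return w          # X(w) = w
def Y(w): return 1 - w      # Y(w) = 1 - w

def law(rv):
    """Push-forward measure on singletons: mu({x}) = P(rv = x)."""
    mu = {}
    for w, p in P.items():
        mu[rv(w)] = mu.get(rv(w), Fraction(0)) + p
    return mu

print(law(X))  # mass 1/2 at 0 and mass 1/2 at 1
print(law(Y))  # the same measure, even though X != Y as maps
```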
We often define random variables through their law, as in the two fundamental laws below.
Definition 2.5 (Bernoulli random variable): Let p∈[0,1], then X is a Bernoulli random variable, or X∼Ber(p), iff μX=(1−p)δ0+pδ1. An equivalent definition is
P(X=0)=1−p and P(X=1)=p
Definition 2.6 (Uniform random variable): The random variable X is uniform, or X∼U([0,1]), iff μX=λ∣[0,1] where λ∣[0,1] is the Lebesgue measure restricted to [0,1]. An equivalent definition is
∀a≤b∈[0,1]:P(X∈[a,b])=b−a
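Both definitions are easy to check by simulation. A minimal sketch (Python, with numpy as an assumed dependency): the empirical frequency of U∈[a,b] approaches b−a, and the indicator 1{U≤p} has law Ber(p) since P(U≤p)=p.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)   # samples of U ~ U([0,1])

# Uniform law: P(U in [a, b]) should be close to b - a.
a, b = 0.2, 0.7
print(np.mean((a <= u) & (u <= b)))        # ~0.5

# Bernoulli law: X = 1_{U <= p} satisfies P(X = 1) = P(U <= p) = p.
p = 0.3
x = (u <= p).astype(int)
print(x.mean())                            # ~0.3
```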
Note: The law can also be defined for a general random variable. If X takes values in a measurable space (E,E), the law of X is the probability measure μX on (E,E) defined by ∀B∈E:μX(B)=P(X∈B).
2.3 Cumulative Distribution Function
Definition 2.7 (Cumulative distribution function): Let X be a random variable. The cumulative distribution function or cdf of X is the map FX:R→[0,1] defined by
FX(a)=P(X≤a)
for every a∈R.
Example (cdf of a Bernoulli rv): For X∼Ber(p) we have FX(a)=0 for a<0, FX(a)=1−p for 0≤a<1 and FX(a)=1 for a≥1. Note that FX jumps at 0 and at 1 and is right-continuous.
Note: For a<b one has P(X∈(a,b])=FX(b)−FX(a), since {X≤b} is the disjoint union of {X≤a} and {a<X≤b}.
Theorem 2.8 (Properties of the cdf): Let X be a random variable with distribution function F=FX. Then
F is non-decreasing
F is right-continuous, i.e. ∀a∈R:limh↓0F(a+h)=F(a)
lima→−∞F(a)=0 and lima→+∞F(a)=1
The theorem above admits a converse: given a function F satisfying the properties listed in the theorem, we can always construct a probability space and a random variable X on it satisfying FX=F. The construction of such a random variable relies on the generalized inverse of F.
Definition 2.9 (Generalized inverse): Let F:R→[0,1] be non-decreasing, right-continuous, and satisfy lima→−∞F(a)=0 and lima→+∞F(a)=1. The generalized inverse of F is the mapping F−1:(0,1)→R defined by
∀α∈(0,1):F−1(α)=inf{x∈R:F(x)≥α}
Note: By definition of the infimum and using right continuity of F, we have for every x∈R and α∈(0,1) the relationship F−1(α)≤x⟺α≤F(x).
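As a worked example, for X∼Ber(p) the cdf is F(x)=0 for x<0, F(x)=1−p for 0≤x<1 and F(x)=1 for x≥1, so F−1(α)=0 for α≤1−p and F−1(α)=1 for α>1−p. The sketch below (plain Python; the finite grid standing in for the infimum over R is an illustrative shortcut) computes F−1 directly from the definition.

```python
p = 0.3

def F(x):
    # cdf of Ber(p): jumps of size 1-p at 0 and p at 1
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

def F_inv(alpha, grid):
    # F^{-1}(alpha) = inf{x : F(x) >= alpha}, with a finite grid in place of R
    return min(x for x in grid if F(x) >= alpha)

grid = [k / 100 for k in range(-100, 201)]  # covers [-1, 2]
print(F_inv(0.5, grid))  # 0.0, since 0.5 <= 1 - p = 0.7
print(F_inv(0.9, grid))  # 1.0, since 0.9 >  1 - p = 0.7
```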
Relying on this generalized inverse, the following theorem provides a way to construct a random variable with an arbitrary distribution function.
Theorem 2.10 (Inverse transform sampling): Let F:R→[0,1] be non-decreasing, right-continuous, and satisfy lima→−∞F(a)=0 and lima→+∞F(a)=1. Let U∼U([0,1]). Then the random variable X=F−1(U) has distribution FX=F.
Note: Formally, there is an issue in the definition of X in the aforementioned theorem: we have U:Ω→[0,1], whereas F−1:(0,1)→R with 0,1∉(0,1). Nevertheless, P(U∈(0,1))=1, so X is well defined on a set of probability 1. We can fix the issue by defining
X(ω) = F−1(U(ω)) if U(ω)∈(0,1), and X(ω) = 0 otherwise,
where the value 0 in the second case plays no role and could be replaced by any real number.
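The theorem is also the basis of a standard simulation technique. A sketch (Python, with numpy as an assumed dependency), using the exponential law because its cdf inverts in closed form: F(x)=1−e^{−λx} for x≥0 gives F−1(α)=−ln(1−α)/λ.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0

# F(x) = 1 - exp(-lam x) on x >= 0, hence F^{-1}(alpha) = -log(1 - alpha) / lam.
u = rng.uniform(0.0, 1.0, size=100_000)   # U ~ U([0,1])
x = -np.log(1.0 - u) / lam                # X = F^{-1}(U), should be Exp(lam)

# Empirical check of F_X = F at a few points.
for t in (0.1, 0.5, 1.0):
    print(np.mean(x <= t), 1.0 - np.exp(-lam * t))
```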
Theorem 2.11 (The cdf characterizes the law): For random variables X,Y we have FX=FY⟺μX=μY.
2.4 Transformation of Random Variables
Proposition 2.12 (Random vector is a collection of rvs): Let n≥1 and let X1,…,Xn be random variables. Then X:Ω→Rn with X(ω)=(X1(ω),…,Xn(ω)) is measurable and is called a random vector.
Proposition 2.13 (Transformation of random variables): Let n≥1,X1,…,Xn be random variables and ϕ:Rn→R be measurable, then
ϕ(X1,…,Xn):ω↦ϕ(X1(ω),…,Xn(ω))
is a random variable.
The statement above strengthens the idea that random variables can be manipulated like real numbers. Given deterministic elements x1,…,xn∈R we can construct a new element y=ϕ(x1,…,xn) via a mapping ϕ:Rn→R. Analogously, given random variables X1,…,Xn we can construct a new random variable Y=ϕ(X1,…,Xn), provided the mapping is measurable.
Recall that continuous maps are measurable. In particular, we have the following examples.
Example (abs, max and exp of random variable): If X is a real random variable, then ∣X∣,X+=max(X,0),X−=max(−X,0) and exp(X) are non-negative random variables.
Example (Sum of random variables): If X1,…,Xn are random variables, then Sn=X1+…+Xn is also a random variable.
Example (Distance between random variables): If X,Y are two random variables, then ∣X−Y∣ is also a random variable.
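For instance (a simulation sketch, with numpy as an assumed dependency), taking ϕ(x1,…,xn)=x1+…+xn and independent Xi∼Ber(p) produces Sn∼Bin(n,p), whose empirical mean is close to np.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 0.4

# n independent Ber(p) columns; phi = row-wise sum gives S_n ~ Bin(n, p).
X = (rng.uniform(size=(100_000, n)) <= p).astype(int)
S = X.sum(axis=1)
print(S.mean(), n * p)  # ~4.0 vs 4.0

# |X - Y| is again a random variable; here P(X1 != X2) = 2p(1 - p) = 0.48.
print(np.mean(np.abs(X[:, 0] - X[:, 1])))
```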
Some random variables can also be constructed as limits of random variables.
Proposition 2.14 (Limits of random variables): Let (Xn)n∈N be a sequence of random variables with values in R̄=R∪{−∞,+∞}. Then sup_{n∈N}Xn, inf_{n∈N}Xn, limsup_{n→∞}Xn and liminf_{n→∞}Xn are well-defined random variables with values in R̄.
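A quick illustration (sketch with numpy as an assumed dependency): the running maximum Mn = max(X1,…,Xn) of independent U([0,1]) variables is a random variable for each n, and each sample path increases towards sup_{n∈N}Xn = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(size=(5, 1000))                  # 5 sample paths of (X_n)
running_max = np.maximum.accumulate(x, axis=1)   # M_n = max(X_1, ..., X_n)

print(running_max[:, [9, 99, 999]])  # each path climbs towards 1
```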
2.5 Discrete Random Variables
Definition 2.15 (Discrete random variable): Let S⊂R be finite or countable. A random variable X is said to be discrete with support S if P(X∈S)=1 and ∀x∈S:P(X=x)>0.
Definition 2.16 (Probability mass function): Let X be a discrete random variable with support S. The probability mass function or pmf of X is the function pX:R→[0,1] defined by
pX(x) = P(X=x) if x∈S, and pX(x) = 0 if x∈R∖S.
Example (Some discrete random variables): Bernoulli, Binomial, Poisson and Geometric random variables are discrete.
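As a quick check (plain Python sketch; the Poisson pmf formula is taken as known), summing a pmf over its support recovers total mass 1, as P(X∈S)=1 requires.

```python
import math

# Poisson(lam) pmf on the support S = {0, 1, 2, ...}
lam = 3.0
def pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Truncating the countable sum at 100 terms already gives ~1.0 for lam = 3.
print(sum(pmf(k) for k in range(100)))
print(pmf(2))  # P(X = 2) = e^{-3} * 3^2 / 2!
```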
2.6 Density
Definition 2.17 (Density): Let X be a random variable. A probability density function or pdf of X is a Lebesgue-integrable function f:R→[0,∞) such that
∀A∈B(R):P(X∈A)=∫Af(x)dx
Intuitively, f(x)dx represents the probability of the infinitesimal interval [x,x+dx]:
f(x)dx=P(X∈[x,x+dx])
Note that this is mathematically imprecise and serves only as an intuition.
Note:
X admits a pdf iff its law μX is absolutely continuous w.r.t. the Lebesgue measure. If that is the case, the pdf f corresponds to the Radon-Nikodym derivative dμX/dλ.
Not all random variables admit a pdf. For instance, any discrete random variable does not admit a pdf.
When a random variable admits a pdf, this pdf is unique up to null sets.
A pdf f always satisfies ∫−∞+∞f(x)dx=1.
A random variable admitting a pdf is said to be continuous because it implies that FX is a continuous function.
If X admits a pdf, then it satisfies ∀x∈R:P(X=x)=0, since one can take A={x}.
Example (Some continuous random variables): Uniform, Exponential and Normal random variables are continuous.
Proposition 2.18 (From pdf to cdf): Let X be a continuous random variable with density fX. The distribution function can be calculated as the integral
FX(x)=∫−∞xfX(y)dy
Proposition 2.19 (From cdf to pdf): Let FX be continuous and piecewise continuously differentiable. Then X admits a pdf given by fX=FX′, well defined almost everywhere.
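Both directions can be checked numerically. A sketch (numpy as an assumed dependency) for the exponential pair fX(x)=λe^{−λx}, FX(x)=1−e^{−λx} on x≥0: integrating fX recovers FX, and differentiating FX recovers fX.

```python
import numpy as np

lam = 2.0
x = np.linspace(0.0, 5.0, 5001)
f = lam * np.exp(-lam * x)     # pdf of Exp(lam)
F = 1.0 - np.exp(-lam * x)     # cdf of Exp(lam)

# Proposition 2.18: cumulative trapezoid integral of f reproduces F.
F_from_f = np.concatenate([[0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(x))])
print(np.max(np.abs(F_from_f - F)))   # small numerical error

# Proposition 2.19: the numerical derivative of F reproduces f.
f_from_F = np.gradient(F, x)
print(np.max(np.abs(f_from_F - f)))   # modest error, dominated by the endpoints
```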
2.7 Generated σ-Algebra
One may ask whether less information suffices to determine a random variable X, i.e. whether there exists a smaller σ-algebra G⊂F such that X:Ω→R is G/B(R)-measurable.
Definition 2.20 (Measurable under G): Let X be a random variable. Let G⊂F be a σ-algebra. We say that X is G/B(R)-measurable if
∀B∈B(R):{X∈B}∈G
Definition 2.21 (σ-algebra generated by a rv): Let X be a random variable. The σ-algebra generated by X is defined by
σ(X)={{X∈A}∣A∈B(R)}
Note:
The σ-algebra σ(X) represents exactly the information of the values of X.
The σ-algebra σ(X) is such that X is σ(X)-measurable and it is the smallest σ-algebra with this property. Equivalently, X is G/B(R)-measurable iff σ(X)⊂G.
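On a finite space, σ(X) can be listed explicitly: its events are exactly the unions of level sets {X=x}. A minimal sketch (plain Python; the space Ω={1,2,3,4} and the two-valued X are toy assumptions):

```python
from itertools import combinations

omega = [1, 2, 3, 4]
X = {1: 0, 2: 0, 3: 1, 4: 1}   # X(1) = X(2) = 0 and X(3) = X(4) = 1

# Level sets {X = x} are the atoms of sigma(X).
atoms = {}
for w in omega:
    atoms.setdefault(X[w], set()).add(w)
atoms = list(atoms.values())    # [{1, 2}, {3, 4}]

# sigma(X) consists of all unions of atoms (including the empty union).
sigma_X = [frozenset().union(*combo)
           for r in range(len(atoms) + 1)
           for combo in combinations(atoms, r)]
print(sorted(sigma_X, key=len))  # empty set, {1,2}, {3,4}, and Omega
```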
Proposition 2.22 (Half-line definition of σ(X)): Let X be a random variable. We have
σ(X)=σ({{X≤a}∣a∈R})
Definition 2.23 (σ-algebra generated by multiple rvs): Let I be finite or countable. For every i∈I, let Xi be a random variable. The σ-algebra generated by this collection of random variables {Xi∣i∈I} is defined as
σ({Xi∣i∈I})=σ(⋃_{i∈I}σ(Xi))
Theorem 2.24 (Doob-Dynkin Factorization): Let n≥1, let Y,X1,…,Xn be random variables. Then Y is σ(X1,…,Xn)-measurable iff
Y=φ(X1,…,Xn)
for some measurable φ:Rn→R.
Note: The result generalizes to Y:Ω→R or Y:Ω→Rd but not to arbitrary image spaces E.
2.8 Independence
Definition 2.25 (Independence of rvs): Let I be finite or countable. For every i∈I let Xi be a random variable. We say that the random variables {Xi∣i∈I} are independent if the σ-algebras {σ(Xi)∣i∈I} are independent.
The events in σ(Xi) are the events of the form {Xi∈A} for A∈B(R). Hence, by definition, X1,…,Xn are independent iff
∀A1,…,An∈B(R):P(X1∈A1,…,Xn∈An)=∏_{i=1}^n P(Xi∈Ai)
Proposition 2.26 (Characterization for finitely many rvs): Let n∈N. The random variables X1,…,Xn are independent iff
∀a1,…,an∈R:P(X1≤a1,…,Xn≤an)=∏_{i=1}^n P(Xi≤ai)
Proposition 2.27 (Independence of discrete rvs): Let n∈N. The discrete random variables X1,…,Xn with supports included in a finite or countable set S are independent iff
∀a1,…,an∈S:P(X1=a1,…,Xn=an)=∏_{i=1}^n P(Xi=ai)
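As a sanity check (simulation sketch, numpy as an assumed dependency), for two independent Bernoulli variables the joint probabilities match the products of Proposition 2.27 up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Two independent Bernoulli variables built from independent uniforms.
p, q = 0.3, 0.6
X1 = (rng.uniform(size=n) <= p).astype(int)
X2 = (rng.uniform(size=n) <= q).astype(int)

for a1 in (0, 1):
    for a2 in (0, 1):
        joint = np.mean((X1 == a1) & (X2 == a2))
        product = np.mean(X1 == a1) * np.mean(X2 == a2)
        print(a1, a2, round(joint, 3), round(product, 3))  # approximately equal
```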
Note: A sequence of random variables (Xn)n∈N is independent iff ∀n∈N the random variables X1,…,Xn are independent.
Proposition 2.28 (Grouping): Let I be a finite or countable index set and {Xi∣i∈I} a collection of independent random variables. For any partition K={I1,I2,…} of I the σ-algebras
{σ({Xi∣i∈Ik})∣Ik∈K}
are independent.
Example (Grouping in even and odd): Let (Xn)n∈N be a sequence of independent random variables, then the sequence (Yn)n∈N defined by Yn=X_{2n+1}+X_{2n+2} is independent.