2. Random Variables

In this and the following chapters, we consider a fixed, appropriately chosen probability space $(\Omega, \mathcal{F}, \mathbb{P})$.

2.1 Measure-Theoretic Definition

Definition 2.1 (Random variable): A random variable is an $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable map $X : \Omega \to \mathbb{R}$.
Note: $X$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable if $\forall B \in \mathcal{B}(\mathbb{R}) : X^{-1}(B) \in \mathcal{F}$.

The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ is difficult to describe explicitly, and we often use the following criterion to prove that a mapping is a random variable.

Proposition 2.2 (Criterion for measurability): Let $\mathcal{C} \subseteq \mathcal{P}(\mathbb{R})$ be a collection of sets such that $\sigma(\mathcal{C}) = \mathcal{B}(\mathbb{R})$. Then $X$ is a random variable if and only if $\forall B \in \mathcal{C} : X^{-1}(B) \in \mathcal{F}$.
Proof: TODO

In particular, $X$ is a random variable if and only if for every $a \in \mathbb{R}$ the set $\{\omega \in \Omega \mid X(\omega) \leq a\}$ is measurable. The notion of a random variable can be generalized to arbitrary measurable spaces.

Definition 2.3 (General random variable): Let $(E, \mathcal{E})$ be a measurable space. A random variable in $E$ is an $\mathcal{F}/\mathcal{E}$-measurable map $X : \Omega \to E$.
Note: A random variable (Definition 2.1) is the special case $E = \mathbb{R}$, $\mathcal{E} = \mathcal{B}(\mathbb{R})$.

A random variable in $\mathbb{R}^n$, i.e. on the measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, is called a random vector. We may also consider random variables in $\mathbb{R}^{+}$.

Note: Let $X$ be a general random variable in $E$. We use the following notational conventions:
  • We write $X \in B$ for $X^{-1}(B)$
  • We write $X \notin B$ for $X^{-1}(E \setminus B)$
  • We write $X = x$ for $X^{-1}(\{x\})$
  • We write $X \neq x$ for $X^{-1}(E \setminus \{x\})$
Furthermore, if $X$ is a random variable, i.e. $E = \mathbb{R}$, we use the following notational conventions:
  • We write $X \leq a$ for $X^{-1}((-\infty, a])$
  • We write $X < a$ for $X^{-1}((-\infty, a))$
  • We write $X \geq a$ for $X^{-1}([a, \infty))$
  • We write $X > a$ for $X^{-1}((a, \infty))$

2.2 Law and Cdf

Definition 2.4 (Law of a random variable): Let $X$ be a random variable. The law of $X$ is the probability measure $\mu_X : \mathcal{B}(\mathbb{R}) \to [0,1]$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by $\mu_X(B) = \mathbb{P}(X \in B)$.
Note: In measure-theoretic terms, $\mu_X = X_{\#}\mathbb{P}$ is the push-forward measure of $\mathbb{P}$ by $X$.
Recap (Dirac measure): The Dirac measure on a measurable space $(E, \mathcal{E})$ concentrated at $e \in E$ is the probability measure $\delta_e : \mathcal{E} \to [0,1]$ defined by $$\delta_e(B) = \begin{cases} 1 & \text{if } e \in B \\ 0 & \text{otherwise} \end{cases}$$
Example (Different rvs but same law): Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with $\Omega = \{0,1\}$, $\mathcal{F} = \mathcal{P}(\Omega)$ and $\mathbb{P}(A) = \frac{\#A}{\#\Omega}$. The random variables $X$ and $Y$ defined by $X(\omega) = \omega$ and $Y(\omega) = 1 - \omega$ are different but follow the same law $\mu_X = \mu_Y = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$.
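This example can be checked mechanically. The following Python sketch (the names `law`, `Omega`, `P` are illustrative, not from the text) enumerates both laws on the finite space $\Omega = \{0,1\}$ with exact rational arithmetic:

```python
from fractions import Fraction

# Finite probability space: Omega = {0, 1} with the uniform measure.
Omega = [0, 1]
P = {omega: Fraction(1, 2) for omega in Omega}

# Two different random variables on Omega.
X = lambda omega: omega      # X(omega) = omega
Y = lambda omega: 1 - omega  # Y(omega) = 1 - omega

def law(Z):
    """Law of Z as a dict x -> mu_Z({x}) = P(Z = x)."""
    mu = {}
    for omega in Omega:
        mu[Z(omega)] = mu.get(Z(omega), Fraction(0)) + P[omega]
    return mu

# X and Y disagree pointwise ...
assert any(X(omega) != Y(omega) for omega in Omega)
# ... yet their laws coincide: mu_X = mu_Y = (1/2) delta_0 + (1/2) delta_1.
assert law(X) == law(Y) == {0: Fraction(1, 2), 1: Fraction(1, 2)}
```

The point of the check: equality of laws is equality of measures on $\mathcal{B}(\mathbb{R})$, which for finitely supported laws reduces to equality of the point masses.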
Note: We often define random variables through their law. This is done by writing $X \sim \square$, where "$\sim$" means "follows the distribution of" and $\square$ can be one of three possible objects:
  • A probability measure $\mu$, either defined directly, such as $\mu = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$, or via a predefined common law, such as $\mu = \mathrm{Ber}(p)$.
  • Another random variable $Y$, in which case we define $X$ to follow the same law as $Y$, i.e. $\mu_X = \mu_Y$.
  • A distribution function, e.g. the cumulative distribution function $F : \mathbb{R} \to [0,1]$. Distribution functions are in one-to-one correspondence with probability measures $\mu$.
Note that, in general, there exist multiple random variables following the same law given by $\square$, i.e. $X \sim \square$ defines an equivalence class of random variables. Hence, when writing "let $X \sim \square$", we mean any random variable in that equivalence class.

Two fundamental laws are presented.

Definition 2.5 (Bernoulli random variable): Let $p \in [0,1]$. We say $X$ is a Bernoulli random variable, $X \sim \mathrm{Ber}(p)$, if and only if $$\mu_X = \mathrm{Ber}(p) = (1-p)\,\delta_0 + p\,\delta_1$$
Note: If $X \sim \mathrm{Ber}(p)$, then $\mathbb{P}(X = 0) = 1 - p$ and $\mathbb{P}(X = 1) = p$.
Recap (Lebesgue measure): The Lebesgue measure on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is the unique measure $\lambda : \mathcal{B}(\mathbb{R}) \to [0,\infty]$ such that $\lambda([a,b]) = b - a$ for all $a \leq b$.
Note: We denote by $\lambda|_{E}$ the Lebesgue measure restricted to the measurable space $(E, \mathcal{B}(E))$.
Definition 2.6 (Uniform random variable): We say $X$ is a uniform random variable, $X \sim \mathrm{U}([0,1])$, if and only if $$\mu_X = \mathrm{U}([0,1]) = \lambda|_{[0,1]}$$
Note: If $X \sim \mathrm{U}([0,1])$, then $\mathbb{P}(X \in [a,b]) = b - a$ for all $0 \leq a \leq b \leq 1$.

The law can be defined analogously for general random variables: if $X$ is a general random variable in $(E, \mathcal{E})$, then the law of $X$ is the probability measure $\mu_X : \mathcal{E} \to [0,1]$ on $(E, \mathcal{E})$ defined by $\mu_X(B) = \mathbb{P}(X \in B)$.

Definition 2.7 (Cumulative distribution function): Let $X$ be a random variable. The cumulative distribution function or cdf of $X$ is the function $F_X : \mathbb{R} \to [0,1]$ defined by $F_X(a) = \mu_X((-\infty, a])$.
Note: For $a < b$ one has $\mathbb{P}(X \in (a,b]) = F_X(b) - F_X(a)$.
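As a quick worked example (not spelled out in the text): for $X \sim \mathrm{Ber}(p)$, the definition $F_X(a) = \mu_X((-\infty, a]) = (1-p)\,\delta_0((-\infty,a]) + p\,\delta_1((-\infty,a])$ yields the step function

```latex
F_X(a) =
\begin{cases}
0     & \text{if } a < 0 \\
1 - p & \text{if } 0 \leq a < 1 \\
1     & \text{if } a \geq 1
\end{cases}
```

with jumps of sizes $1-p$ at $0$ and $p$ at $1$, i.e. the jump of $F_X$ at $x$ equals the point mass $\mu_X(\{x\})$.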
Theorem 2.8 (Characterization of the cdf): A function $F : \mathbb{R} \to [0,1]$ is the cdf of some random variable $X$ if and only if
  • $F$ is non-decreasing, i.e. $F(a) \leq F(b)$ for all $a \leq b$
  • $F$ is right-continuous, i.e. $\lim_{h \downarrow 0} F(a+h) = F(a)$ for all $a \in \mathbb{R}$
  • $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$

Given a function $F$ satisfying the properties listed in the theorem, we can always construct a probability space and a random variable $X$ on it satisfying $F_X = F$. The construction of such a random variable relies on the generalized inverse of $F$.

Definition 2.9 (Generalized inverse): Let $F : \mathbb{R} \to [0,1]$ be non-decreasing, right-continuous, and satisfy $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$. The generalized inverse of $F$ is the mapping $F^{-1} : (0,1) \to \mathbb{R}$ defined by $F^{-1}(\alpha) = \inf\{x \in \mathbb{R} : F(x) \geq \alpha\}$.
Note: For every $x \in \mathbb{R}$ and $\alpha \in (0,1)$ we have $F^{-1}(\alpha) \leq x$ if and only if $\alpha \leq F(x)$.
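A short proof sketch of this equivalence, added here for completeness (it is the key step behind Theorem 2.10):

```latex
(\Leftarrow)\quad \alpha \leq F(x)
  \;\Rightarrow\; x \in \{y \in \mathbb{R} : F(y) \geq \alpha\}
  \;\Rightarrow\; F^{-1}(\alpha) = \inf\{y : F(y) \geq \alpha\} \leq x.
\\[4pt]
(\Rightarrow)\quad \text{By right-continuity of } F,\;
  F\big(F^{-1}(\alpha)\big) \geq \alpha.
  \text{ Hence } F^{-1}(\alpha) \leq x
  \;\Rightarrow\; F(x) \geq F\big(F^{-1}(\alpha)\big) \geq \alpha.
```

For the right-continuity step: choose $x_n \downarrow F^{-1}(\alpha)$ with $F(x_n) \geq \alpha$; then $F(F^{-1}(\alpha)) = \lim_n F(x_n) \geq \alpha$.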
Theorem 2.10 (Inverse transform sampling): Let $F : \mathbb{R} \to [0,1]$ be non-decreasing, right-continuous, and satisfy $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$. Let $U \sim \mathrm{U}([0,1])$. Then the random variable $X = F^{-1}(U)$ has distribution $F_X = F$.
Note: Formally, there is an issue in the definition of $X$ in the aforementioned theorem: $U : \Omega \to [0,1]$, but $F^{-1} : (0,1) \to \mathbb{R}$ with $0, 1 \notin (0,1)$. Nevertheless, we have $\mathbb{P}(U \in (0,1)) = 1$ and we can fix the issue by defining $$X(\omega) = \begin{cases} F^{-1}(U(\omega)) & \text{if } U(\omega) \in (0,1) \\ c & \text{otherwise} \end{cases}$$ where the value $c \in \mathbb{R}$ can be chosen arbitrarily.
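Inverse transform sampling is a practical simulation method. As an illustrative sketch (the exponential law is our choice of example, not from the text): for $F(x) = 1 - e^{-x}$ the generalized inverse has the closed form $F^{-1}(\alpha) = -\ln(1-\alpha)$, so applying it to uniform samples produces samples with cdf $F$:

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducibility

F = lambda x: 1 - math.exp(-x)      # target cdf (exponential with rate 1)
F_inv = lambda a: -math.log(1 - a)  # generalized inverse; here an ordinary inverse

# Inverse transform sampling: X = F^{-1}(U) with U ~ U([0,1]).
# rng.random() returns values in [0, 1), so log(0) never occurs.
n = 100_000
samples = [F_inv(rng.random()) for _ in range(n)]

# Empirical check: the fraction of samples <= 1 approximates F(1) = 1 - e^{-1}.
empirical = sum(x <= 1 for x in samples) / n
assert abs(empirical - F(1)) < 0.01
```

Note that $F$ here is continuous and strictly increasing on $[0,\infty)$, so $F^{-1}$ is a true inverse; for cdfs with jumps or flat pieces, the infimum in Definition 2.9 is what makes the construction work.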
Theorem 2.11 (Cdf characterizes the law): For two random variables $X$ and $Y$ we have $F_X = F_Y$ if and only if $\mu_X = \mu_Y$.
Note: Equivalently, $X \sim Y$ if and only if $F_X = F_Y$.

2.3 Transformations

Proposition 2.12 (Random vector as a collection of rvs): Let $X_1, \ldots, X_n$ be random variables. The function $X : \Omega \to \mathbb{R}^n$ with $X(\omega) = [X_1(\omega), \ldots, X_n(\omega)]^{\top}$ is $\mathcal{F}/\mathcal{B}(\mathbb{R}^n)$-measurable.
Note: Recall that $X$ is called a random vector.
Proposition 2.13 (Transformation of random variables): Let $X_1, \ldots, X_n$ be random variables and let $\phi : \mathbb{R}^n \to \mathbb{R}$ be measurable. Then $Y(\omega) = \phi(X(\omega))$ is a random variable.

Recall that if a function is continuous, then it is measurable.

Proposition 2.14 (Limits of rvs): Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of general random variables in $\overline{\mathbb{R}}$. The following are also general random variables in $\overline{\mathbb{R}}$:
  • $\sup_{n \in \mathbb{N}} X_n$
  • $\inf_{n \in \mathbb{N}} X_n$
  • $\limsup_{n \to \infty} X_n$
  • $\liminf_{n \to \infty} X_n$

2.4 Pmf and Pdf

Definition 2.15 (Discrete random variable): Let $S \subseteq \mathbb{R}$ be finite or countable. A random variable $X$ is said to be discrete with support $S$ if $\mu_X(S) = 1$ and $\forall x \in S : \mu_X(\{x\}) > 0$.
Definition 2.16 (Probability mass function): Let $X$ be a discrete random variable with support $S$. The probability mass function or pmf of $X$ is the function $p_X : S \to [0,1]$ defined by $p_X(x) = \mu_X(\{x\})$.
Note: For $p \in (0,1)$, Bernoulli random variables are discrete with support $S = \{0,1\}$ and pmf $p_X(1) = p$ and $p_X(0) = 1 - p$.
Recap (Absolute continuity): Let $\mu$ and $\nu$ be two measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. $\mu$ is said to be absolutely continuous w.r.t. $\nu$ if for every $B \in \mathcal{B}(\mathbb{R})$ with $\nu(B) = 0$ it follows that $\mu(B) = 0$.
Note: If $\mu$ is absolutely continuous w.r.t. $\nu$, we write $\mu \ll \nu$.
Definition 2.17 (Absolutely continuous random variable): Let $X$ be a random variable. $X$ is said to be absolutely continuous if $\mu_X \ll \lambda$.
Note:
  • If $X$ is absolutely continuous, then $\mu_X(\{x\}) = 0$ for all $x \in \mathbb{R}$.
  • Uniform random variables are absolutely continuous.
Definition 2.18 (Probability density function): Let $X$ be an absolutely continuous random variable. The probability density function or pdf of $X$ is the function $f_X : \mathbb{R} \to [0,\infty]$ defined by $$\forall B \in \mathcal{B}(\mathbb{R}) : \mu_X(B) = \int_{B} f_X(x) \, dx$$
Note:
  • The pdf is unique up to null sets of the Lebesgue measure.
  • A pdf $f_X$ satisfies $\int_{-\infty}^{+\infty} f_X(x) \, dx = 1$.
  • We have $F_X(x) = \int_{-\infty}^{x} f_X(\xi) \, d\xi$ and, for almost every $x$, $\frac{d}{dx} F_X(x) = f_X(x)$.
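These identities can be sanity-checked numerically. A minimal sketch with a hypothetical density of our own choosing (not from the text), $f(x) = 2x$ on $[0,1]$ with cdf $F(x) = x^2$, using a midpoint Riemann sum:

```python
# Hypothetical absolutely continuous law: pdf f(x) = 2x on [0, 1], 0 elsewhere.
# Its cdf is F(x) = x^2 for x in [0, 1].
f = lambda x: 2 * x if 0 <= x <= 1 else 0.0
F = lambda x: min(max(x, 0.0), 1.0) ** 2

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

# Total mass: the pdf integrates to 1 over the real line.
assert abs(integrate(f, 0, 1) - 1) < 1e-6
# F_X(x) = integral of f up to x: check at x = 0.5, where F(0.5) = 0.25.
assert abs(integrate(f, 0, 0.5) - F(0.5)) < 1e-6
```

The midpoint rule is exact for linear integrands, so the checks hold up to floating-point rounding.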

2.5 RV-Generated $\sigma$-Algebra

One may ask whether less information suffices to determine a random variable $X$, i.e. whether there exists a smaller $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ such that $X : \Omega \to \mathbb{R}$ is $\mathcal{G}/\mathcal{B}(\mathbb{R})$-measurable.

Definition 2.19 (Rv-generated $\sigma$-algebra): Let $X$ be a random variable. The $\sigma$-algebra generated by $X$ is $\sigma(X) = \{X \in B \mid B \in \mathcal{B}(\mathbb{R})\}$.
Note:
  • $\sigma(X)$ is a sub-$\sigma$-algebra of $\mathcal{F}$.
  • $\sigma(X)$ is the smallest $\sigma$-algebra such that $X$ is $\sigma(X)/\mathcal{B}(\mathbb{R})$-measurable.
  • An alternative definition is $\sigma(X) = \sigma(\{X \leq a \mid a \in \mathbb{R}\})$.
Definition 2.20 (Rvs-generated $\sigma$-algebra): Let $(X_i)_{i \in I}$ be a finite or countable sequence of random variables. The $\sigma$-algebra generated by $(X_i)_{i \in I}$ is $\sigma((X_i)_{i \in I}) = \sigma\left(\bigcup_{i \in I} \sigma(X_i)\right)$.
Note: If the sequence is finite, i.e. $X_1, \ldots, X_n$, we write $\sigma(X_1, \ldots, X_n)$.
Theorem 2.21 (Doob-Dynkin Factorization): Let $Y$ and $X_1, \ldots, X_n$ be random variables. Then $Y$ is $\sigma(X_1, \ldots, X_n)$-measurable if and only if $Y = \varphi(X_1, \ldots, X_n)$ for some measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$.
Note: The result generalizes to general random variables in $\overline{\mathbb{R}}$ or $\mathbb{R}^d$, but not to arbitrary spaces $E$.

2.6 Independence

Definition 2.22 (Independence of rvs): Let $(X_i)_{i \in I}$ be a finite or countable sequence of random variables. We say that the random variables $(X_i)_{i \in I}$ are independent if the family of $\sigma$-algebras $(\sigma(X_i))_{i \in I}$ is independent.
Note:
  • The events in $\sigma(X_i)$ are of the form $X_i \in B_i$ for $B_i \in \mathcal{B}(\mathbb{R})$. Hence, $X_1, \ldots, X_n$ are independent if and only if for each choice of $B_1, \ldots, B_n \in \mathcal{B}(\mathbb{R})$ we have $$\mathbb{P}(X_1 \in B_1, \ldots, X_n \in B_n) = \prod_{i=1}^{n} \mathbb{P}(X_i \in B_i)$$
  • Equivalently, $X_1, \ldots, X_n$ are independent if and only if for each choice of $a_1, \ldots, a_n \in \mathbb{R}$ we have $$\mathbb{P}(X_1 \leq a_1, \ldots, X_n \leq a_n) = \prod_{i=1}^{n} \mathbb{P}(X_i \leq a_i)$$
  • If $X_1, \ldots, X_n$ are discrete random variables with supports $S_1, \ldots, S_n \subseteq \mathbb{R}$, then $X_1, \ldots, X_n$ are independent if and only if for each choice of $a_i \in S_i$ we have $$\mathbb{P}(X_1 = a_1, \ldots, X_n = a_n) = \prod_{i=1}^{n} \mathbb{P}(X_i = a_i)$$
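The discrete factorization criterion can be verified exactly on a small product space. A sketch with our own toy model (two independent fair coin flips; all names illustrative), using exact fractions and full enumeration:

```python
from fractions import Fraction
from itertools import product

# Product space Omega = {0,1} x {0,1} with the uniform measure:
# a model of two independent fair coin flips.
Omega = list(product([0, 1], repeat=2))
P = {omega: Fraction(1, 4) for omega in Omega}

X = lambda omega: omega[0]  # first flip
Y = lambda omega: omega[1]  # second flip

def prob(event):
    """P(event) for an event given as a predicate on Omega."""
    return sum(P[omega] for omega in Omega if event(omega))

# Factorization criterion for discrete random variables:
# P(X = a, Y = b) = P(X = a) * P(Y = b) for all a, b in the supports.
for a, b in product([0, 1], repeat=2):
    joint = prob(lambda w: X(w) == a and Y(w) == b)
    assert joint == prob(lambda w: X(w) == a) * prob(lambda w: Y(w) == b)
```

Enumeration with `Fraction` makes the check exact rather than approximate, which is exactly what the criterion demands.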
Proposition 2.23 (Grouping): Let $(X_i)_{i \in I}$ be a finite or countable sequence of independent random variables. For any partition $K = \{I_1, I_2, \ldots\}$ of $I$, the family of $\sigma$-algebras $(\sigma((X_i)_{i \in J}))_{J \in K}$ is independent.
Example (Grouping pairs): Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of independent random variables. Then the sequence $(Y_n)_{n \in \mathbb{N}}$ defined by $Y_n = X_{2n+1} + X_{2n+2}$ is independent.