2. Random Variables

2.1 Random Variables

Definition 2.1 (Real random variable): A real random variable is a measurable map X:ΩRX : \Omega \to \R.
Note: Concretely, XX is measurable if BB(R):X1(B)F\forall \evB \in \borelB(\R) : X^{-1}(\evB) \in \sigmaF.

The Borel σ\sigma-algebra B(R)\borelB(\R) is difficult to describe and we often use the following criterion to prove that a mapping is a random variable.

Proposition 2.2 (Criterion for Measurability): Let CP(R)\setC \subset \powerset(\R) be a family of sets such that σ(C)=B(R)\sigma(\setC) = \borelB(\R), then XX is a random variable iff CC:X1(C)F \forall \evC \in \setC : X^{-1}(\evC) \in \sigmaF

In particular, XX is a random variable iff for every aRa \in \R the set {Xa}\set{X \leq a} is measurable. We will mainly work with real random variables, but the notion can be generalized to arbitrary measurable spaces.

Definition 2.3 (General random variables): Let (E,E)(\Epsilon, \setE) be a measurable space. A random variable with values in E\Epsilon is a measurable map X:ΩEX: \Omega \to \Epsilon.
Note: Concretely, XX is measurable if BE:X1(B)F\forall \evB \in \setE : X^{-1}(\evB) \in \sigmaF.

In particular, a random variable with values in Rn\R^n equipped with its Borel σ\sigma-algebra is called a random vector. We may also consider random variables with values in [0,][0,\infty].

Note: Unless stated otherwise, we will write “random variable” for real random variables.

2.2 Law of a Random Variable

Note: As a motivation consider Ω={0,1}\Omega = \set{0,1}, F=P({0,1})\sigmaF = \powerset(\set{0,1}), P=2\P = \frac{\abs{\cdot}}{2}. The random variables XX and YY defined by X(ω)=ωX(\omega) = \omega and Y(ω)=1ωY(\omega) = 1- \omega are different as functions on Ω\Omega, yet the random numbers they represent behave in exactly the same way: each takes the values 0 and 1 with probability 1/2. We would like to say that these two random variables have the same “probabilistic” properties.
Definition 2.4 (Law of a random variable): Let XX be a random variable. The law of XX is the probability measure μX\mu_X on (R,B(R))(\R, \borelB(\R)) defined by BB(R):μX(B)=P ⁣(XB) \forall \evB \in \borelB(\R) : \mu_X(\evB) = \probP{X \in \evB}
Note: From the point of view of measure theory, μX=X#P\mu_X = X \# \P is the push-forward measure of P\P by XX.
Example (Different rvs but same law): In the motivation at the beginning of the section we have XYX \neq Y but μX=μY=12δ0+12δ1\mu_X = \mu_Y = \frac12 \delta_0 + \frac12 \delta_1.

We often define random variables through their law, as in the two fundamental laws below.

Definition 2.5 (Bernoulli random variable): Let p[0,1]p \in [0,1], then XX is a Bernoulli random variable or XBer(p)X \sim \lawBer(p) iff μX=(1p)δ0+pδ1\mu_X = (1-p) \delta_0 + p \delta_1. An equivalent definition is P(X=0)=1pandP(X=1)=p \probP{X = 0} = 1 - p \qand \probP{X = 1} = p
Definition 2.6 (Uniform random variable): The random variable XX is uniform or XU([0,1])X \sim \lawU([0,1]) iff μX=λ[0,1]\mu_X = \lambda\on{[0,1]} where λ[0,1]\lambda\on{[0,1]} is the Lebesgue measure on [0,1][0,1]. An equivalent definition is ab[0,1]:P(X[a,b])=ba \forall a\leq b \in [0,1] : \probP{X \in [a,b]} = b-a
Note: The law of a random variable can also be defined for a general random variable. If XX is a general random variable, then the law of XX is the probability measure μX\mu_X on (E,E)(\Epsilon, \setE) defined by BE:μX(B)=P(XB)\forall \evB \in \setE : \mu_X(\evB) = \P(X\in \evB).

2.3 Cumulative Distribution Function

Definition 2.7 (Cumulative distribution function): Let XX be a random variable. The cumulative distribution function or cdf of XX is the map FX:R[0,1]F_X : \R \to [0,1] defined by FX(a)=P ⁣(Xa) F_X(a) = \probP{X \leq a} for every aRa \in \R.
Example (cdf of a uniform random variable): For X \sim \lawU([0,1]) one has F_X(a) = 0 for a < 0, F_X(a) = a for a \in [0,1], and F_X(a) = 1 for a > 1.
Note: For a<ba < b one has P ⁣(X(a,b])=FX(b)FX(a)\probP{X \in (a,b]} = F_X(b) - F_X(a), since \set{X \leq b} is the disjoint union of \set{X \leq a} and \set{X \in (a,b]}.
Theorem 2.8 (Properties of the cdf): Let XX be a random variable with distribution function F=FXF = F_X. Then
  • FF is non-decreasing
  • FF is right-continuous, i.e. aR:limh0F(a+h)=F(a)\forall a \in \R: \lim_{h \downarrow 0} F(a+h) = F(a)
  • limaF(a)=0\lim_{a \to -\infty} F(a) = 0 and limaF(a)=1\lim_{a \to +\infty} F(a) = 1
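Example (cdf of a Bernoulli random variable): As an illustration of these properties, let X \sim \lawBer(p) with p \in (0,1). Then F_X(a) = 0 for a < 0, F_X(a) = 1-p for 0 \leq a < 1 and F_X(a) = 1 for a \geq 1. The cdf is non-decreasing and right-continuous, but it is not continuous: it jumps by 1-p at 0 and by p at 1.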

The theorem above admits a converse: given a function FF satisfying the properties listed in the theorem, we can always construct a probability space and a random variable XX on it satisfying FX=FF_X = F. The construction of such a random variable relies on the generalized inverse of FF.

Definition 2.9 (Generalized inverse): Let F:R[0,1]F : \R \to [0,1] be non-decreasing, right-continuous, satisfying limaF(a)=0\lim_{a \to -\infty} F(a) = 0 and limaF(a)=1\lim_{a \to +\infty} F(a) = 1. The generalized inverse of FF is the mapping F1:(0,1)RF^{-1} : (0,1) \to \R defined by α(0,1):F1(α)=inf{xR:F(x)α} \forall \alpha \in (0,1) : F^{-1}(\alpha) = \inf\set{x \in \R : F(x) \geq \alpha}
Note: By definition of the infimum and using right continuity of FF, we have for every xRx \in \R and α(0,1)\alpha \in (0,1) the relationship F1(α)x    αF(x)F^{-1}(\alpha) \leq x \iff \alpha \leq F(x).

Relying on this generalized inverse, the following theorem provides a way to construct a random variable with an arbitrary distribution function.

Theorem 2.10 (Inverse transform sampling): Let F:R[0,1]F : \R \to [0,1] be non-decreasing, right-continuous, satisfying limaF(a)=0\lim_{a \to -\infty} F(a) = 0 and limaF(a)=1\lim_{a \to +\infty} F(a) = 1. Let UU([0,1])U \sim \lawU([0,1]). Then the random variable X=F1(U)X = F^{-1}(U) has distribution FX=FF_X = F.
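Note: The proof follows directly from the note after Definition 2.9: for every x \in \R we have F_X(x) = \probP{F^{-1}(U) \leq x} = \probP{U \leq F(x)} = F(x), where the last equality uses that UU is uniform on [0,1][0,1] and F(x)[0,1]F(x) \in [0,1].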
Note: Formally, there is an issue in the definition of XX in the aforementioned theorem. Note that we have U:Ω[0,1]U : \Omega \to [0,1] however F1:(0,1)RF^{-1} : (0,1) \to \R with 0,1∉(0,1)0,1 \not\in (0,1). Nevertheless, we have P ⁣(U(0,1))=1\probP{U \in (0,1)} = 1 and therefore XX is well defined on a set of probability 11. We can fix the issue by defining X(ω)={F1(U(ω))if U(ω)(0,1)0otherwise X(\omega) = \begin{cases} F^{-1}(U(\omega)) & \text{if } U(\omega) \in (0,1) \\ 0 & \text{otherwise} \end{cases} where the value 00 in the second case plays no role and could be replaced by any real number.
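Example (Sampling an exponential random variable): As a concrete illustration of the theorem, take F(x) = 1 - e^{-x} for x \geq 0 and F(x) = 0 for x < 0, which satisfies the assumptions above. For \alpha \in (0,1) one computes F^{-1}(\alpha) = \inf\set{x \in \R : 1 - e^{-x} \geq \alpha} = -\ln(1-\alpha), so for U \sim \lawU([0,1]) the random variable X = -\ln(1-U) has cdf F_X = F, i.e. X is exponential with parameter 1. Since 1-U \sim \lawU([0,1]) as well, one may equivalently take X = -\ln(U).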
Theorem 2.11 (The cdf characterizes the law): For random variables X,YX, Y we have FX=FY    μX=μYF_X = F_Y \iff \mu_X = \mu_Y.

2.4 Transformation of Random Variables

Proposition 2.12 (Random vector is a collection of rvs): Let n1n \geq 1, X1,,XnX_1, \ldots, X_n be random variables, then X:ΩRn\rvec X : \Omega \to \R^n with X(ω)=(X1(ω),,Xn(ω))\rvec X(\omega) = \pabig{X_1(\omega), \ldots, X_n(\omega)} is measurable and called a random vector.
Proposition 2.13 (Transformation of random variables): Let n1n \geq 1, X1,,XnX_1, \ldots, X_n be random variables and ϕ:RnR\phi : \R^n \to \R be measurable, then ϕ(X1,,Xn):ωϕ(X1(ω),,Xn(ω)) \phi(X_1, \ldots, X_n) : \omega \mapsto \phi\pabig{X_1(\omega), \ldots, X_n(\omega)} is a random variable.

The statement above strengthens the idea that random variables can be manipulated like real numbers. Given deterministic numbers x1,,xnRx_1, \ldots, x_n \in \R we can construct a new number y=ϕ(x1,,xn)y = \phi(x_1, \ldots, x_n) via a mapping ϕ:RnR\phi : \R^n \to \R. Analogously, given random variables X1,,XnX_1, \ldots, X_n we can construct a new random variable Y=ϕ(X1,,Xn)Y = \phi(X_1, \ldots, X_n) provided the mapping is measurable.

Recall that continuous maps are measurable. In particular, we have the following examples.

Example (abs, max and exp of random variable): If XX is a real random variable, then X\abs{X}, X+=max(X,0)X_{+} = \max(X,0), X=max(X,0)X_{-} = \max(-X,0) and exp(X)\exp(X) are non-negative random variables.
Example (Sum of random variables): If X1,,XnX_1, \ldots, X_n are random variables, then Sn=X1++XnS_n = X_1 + \ldots + X_n is also a random variable.
Example (Distance between random variables): If X,YX, Y are two random variables, then XY\abs{X-Y} is also a random variable.

Some random variables can also be constructed as limits of random variables.

Proposition 2.14 (Limits of random variables): Let (Xn)nN\sequence{X} be a sequence of random variables in R=R{,+}\Rext = \R \cup \set{-\infty, +\infty}, then supnNXn\sup_{n \in \N} X_n, infnNXn\inf_{n \in \N} X_n, lim supnXn\limsupinfty X_n and lim infnXn\liminfinfty X_n are well defined random variables in R\Rext.
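Note: For instance, measurability of the supremum follows from \set{\sup_{n \in \N} X_n \leq a} = \bigcap_{n \in \N} \set{X_n \leq a} \in \sigmaF for every a \in \R; the other cases reduce to this one, e.g. \limsupinfty X_n = \inf_{n \in \N} \sup_{m \geq n} X_m.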

2.5 Discrete Random Variables

Definition 2.15 (Discrete random variable): Let SR\evS \subset \R be finite or countable. A random variable XX is said to be discrete with support S\evS if P ⁣(XS)=1\probP{X \in \evS} = 1 and xS:P ⁣(X=x)>0\forall x \in \evS : \probP{X = x} > 0.
Definition 2.16 (Probability mass function): Let XX be a discrete random variable with support S\evS. The probability mass function or pmf of XX is the function pX:R[0,1]p_X : \R \to [0,1] defined by pX(x)={P(X=x)if xS0if xRS p_X(x) = \begin{cases} \probP{X = x} & \text{if } x \in \evS \\ 0 & \text{if } x \in \R \setminus \evS \end{cases}
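Note: The pmf determines the law of a discrete random variable: \mu_X = \sum_{x \in \evS} p_X(x) \delta_x, i.e. \forall \evA \in \borelB(\R) : \probP{X \in \evA} = \sum_{x \in \evA \cap \evS} p_X(x). For instance, for X \sim \lawBer(p) one has p_X(0) = 1-p and p_X(1) = p.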
Example (Some discrete random variables): Bernoulli, Binomial, Poisson and Geometric random variables are discrete.

2.6 Density

Definition 2.17 (Density): Let XX be a random variable. A probability density function or pdf of XX is a Lebesgue-integrable function f:R[0,)f : \R \to [0, \infty) such that AB(R):P(XA)=Af(x)dx \forall \evA \in \borelB(\R) : \probP{X \in \evA} = \int_{\evA} f(x) \dd x

Intuitively, f(x)dxf(x) \dd x represents the probability that XX falls in the infinitesimal interval [x,x+dx][x, x + \dd x], i.e. f(x)dx=P(X[x,x+dx]) f(x) \dd x = \probP{X \in [x, x + \dd x]} Note that this is mathematically imprecise and serves only as an intuition.

Note:
  • XX admits a pdf iff its law μX\mu_X is absolutely continuous w.r.t. the Lebesgue measure. If that is the case, the pdf ff corresponds to the Radon-Nikodym derivative.
  • Not all random variables admit a pdf. For instance, any discrete random variable does not admit a pdf.
  • When a random variable admits a pdf, this pdf is unique up to null sets.
  • A pdf ff always satisfies +f(x)dx=1\int_{-\infty}^{+\infty} f(x) \dd x = 1.
  • A random variable admitting a pdf is said to be continuous because it implies that FXF_X is a continuous function.
  • If XX admits a pdf, then it satisfies xR:P ⁣(X=x)=0\forall x \in \R : \probP{X = x} = 0 since one can take A={x}\evA = \set{x}.
Example (Some continuous random variables): Uniform, Exponential and Normal random variables are continuous.
Proposition 2.18 (From pdf to cdf): Let XX be a continuous random variable with density fXf_X. The distribution function can be calculated as the integral FX(x)=xfX(y)dy F_X(x) = \int_{-\infty}^x f_X(y) \dd y
Proposition 2.19 (From cdf to pdf): Let FXF_X be piecewise-differentiable, then XX admits a pdf given by fX=FXf_X = F_X' well defined almost everywhere.
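Example (pdf and cdf of an exponential random variable): As an illustration of the two propositions above, let X be exponential with parameter \lambda > 0, i.e. with pdf f_X(x) = \lambda e^{-\lambda x} for x \geq 0 and f_X(x) = 0 for x < 0. Then F_X(x) = \int_{-\infty}^x f_X(y) \dd y = 1 - e^{-\lambda x} for x \geq 0 and F_X(x) = 0 for x < 0; conversely, differentiating F_X gives back f_X at every x \neq 0.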

2.7 Generated σ\sigma-Algebra

One may ask whether less information suffices to determine a random variable XX, i.e. whether there exists a smaller σ\sigma-algebra GF\setG \subset \sigmaF such that X:ΩRX: \Omega \to \R is G/B(R)\setG / \borelB(\R)-measurable.

Definition 2.20 (Measurable under G\setG): Let XX be a random variable. Let GF\setG \subset \sigmaF be a σ\sigma-algebra. We say that XX is G/B(R)\setG / \borelB(\R)-measurable if BB(R):{XB}G \forall \evB \in \borelB(\R) : \set{X \in \evB} \in \setG
Definition 2.21 (σ\sigma-algebra generated by a rv): Let XX be a random variable. The σ\sigma-algebra generated by XX is defined by σ(X)={{XA} | AB(R)} \sigma(X) = \set{\set{X \in \evA} \mid \evA \in \borelB(\R)}
Note:
  • The σ\sigma-algebra σ(X)\sigma(X) represents exactly the information carried by the values of XX.
  • The σ\sigma-algebra σ(X)\sigma(X) is such that XX is σ(X)\sigma(X)-measurable and it is the smallest σ\sigma-algebra with this property. Equivalently, XX is G/B(R)\setG / \borelB(\R)-measurable iff σ(X)G\sigma(X) \subset \setG.
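Example (Two generated σ\sigma-algebras): If X is constant, say X(\omega) = c for every \omega \in \Omega, then \sigma(X) = \set{\emptyset, \Omega}: a constant random variable carries no information. In the motivating example of Section 2.2, the random variable X(\omega) = \omega satisfies \sigma(X) = \set{\emptyset, \set{0}, \set{1}, \Omega} = \sigmaF.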
Proposition 2.22 (Half-line characterization of σ(X)\sigma(X)): Let XX be a random variable. We have σ(X)=σ({{Xa} | aR}) \sigma(X) = \sigma\pabig{\set{\set{X \leq a} \mid a \in \R }}
Definition 2.23 (σ\sigma-algebra generated by multiple rvs): Let I\setI be finite or countable. For every iIi \in \setI, let XiX_i be a random variable. The σ\sigma-algebra generated by this collection of random variables {Xi | iI}\set{X_i \mid i \in \setI} is defined as σ({Xi | iI})=σ(iIσ(Xi)) \sigma(\set{X_i \mid i \in \setI}) = \sigma\paBig{\bigcup_{i \in \setI} \sigma(X_i)}
Theorem 2.24 (Doob-Dynkin Factorization): Let n1n \geq 1, let Y,X1,,XnY, X_1, \ldots, X_n be random variables. Then YY is σ(X1,,Xn)\sigma(X_1, \ldots, X_n)-measurable iff Y=φ(X1,,Xn) Y = \varphi(X_1, \ldots, X_n) for some measurable φ:RnR\varphi: \R^n \to \R.
Note: The result generalizes to Y:ΩRY : \Omega \to \Rext or Y:ΩRdY : \Omega \to \R^d but not to arbitrary image spaces E\Epsilon.
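Example (Measurability via factorization): The random variable Y = X_1 + X_2^2 is \sigma(X_1, X_2)-measurable, since Y = \varphi(X_1, X_2) with the measurable map \varphi(x_1, x_2) = x_1 + x_2^2. Conversely, X_1 is in general not \sigma(X_1^2)-measurable: the sign of X_1 cannot be recovered as a measurable function of X_1^2.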

2.8 Independence

Definition 2.25 (Independence of rvs): Let I\setI be finite or countable. For every iIi \in \setI let XiX_i be a random variable. We say that the random variables {Xi | iI}\set{X_i \mid i \in \setI} are independent if the σ\sigma-algebras {σ(Xi) | iI}\set{\sigma(X_i) \mid i \in \setI} are independent.

The events in σ(Xi)\sigma(X_i) are exactly the events of the form {XiAi}\set{X_i \in A_i} for AiB(R)A_i \in \borelB(\R). Hence, by definition, X1,,XnX_1, \ldots, X_n are independent iff A1,,AnB(R):P(X1A1,,XnAn)=i=1nP(XiAi) \forall A_1, \ldots, A_n \in \borelB(\R): \probP{X_1 \in A_1, \ldots, X_n \in A_n} = \prod_{i=1}^n \probP{X_i \in A_i}

Proposition 2.26 (Characterization for finitely many rvs): Let nNn \in \N. The random variables X1,,XnX_1, \ldots, X_n are independent iff a1,,anR:P ⁣(X1a1,,Xnan)=i=1nP ⁣(Xiai) \forall a_1, \ldots, a_n \in \R : \probP{X_1 \leq a_1, \ldots, X_n \leq a_n} = \prod_{i=1}^n \probP{X_i \leq a_i}
Proposition 2.27 (Independence of discrete rvs): Let nNn \in \N. The discrete random variables X1,,XnX_1, \ldots, X_n with supports included in a finite or countable set S\evS are independent iff a1,,anS:P ⁣(X1=a1,,Xn=an)=i=1nP ⁣(Xi=ai) \forall a_1, \ldots, a_n \in \evS : \probP{X_1 = a_1, \ldots, X_n = a_n} = \prod_{i=1}^n \probP{X_i = a_i}
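Example (Two independent Bernoulli rvs): If X \sim \lawBer(p) and Y \sim \lawBer(q) are independent, then \probP{X = 1, Y = 1} = pq, \probP{X = 1, Y = 0} = p(1-q), \probP{X = 0, Y = 1} = (1-p)q and \probP{X = 0, Y = 0} = (1-p)(1-q).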
Note: A sequence of random variables (Xn)nN\sequence{X} is independent iff nN\forall n \in \N the random variables X1,,XnX_1, \ldots, X_n are independent.
Proposition 2.28 (Grouping): Let I\setI be a finite or countable index set and {Xi | iI}\set{X_i \mid i \in \setI} a collection of independent random variables. For any partition K={I1,I2,}\setK = \set{\setI_1, \setI_2, \ldots} of I\setI the σ\sigma-algebras {σ({Xi | iIk}) | IkK} \set{\sigma\pabig{\set{X_i \mid i \in \setI_k}} \mid \setI_k \in \setK} are independent.
Example (Grouping in even and odd): Let (Xn)nN\sequence{X} be a sequence of independent random variables, then the sequence (Yn)nN\sequence{Y} defined by Yn=X2n+1+X2n+2Y_n = X_{2n + 1} + X_{2n + 2} is independent.