2. Random Variables

In this and the following chapters, we consider a fixed, appropriately chosen probability space $(\Omega, \mathcal{F}, \mathbb{P})$.

2.1 Measure-Theoretic Definition

Definition 2.1 (Random variable): A random variable is an $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable map $X : \Omega \to \mathbb{R}$.
Note: $X$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable if $\forall B \in \mathcal{B}(\mathbb{R}) : X^{-1}(B) \in \mathcal{F}$.

The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ is difficult to describe explicitly, and we often use the following criterion to prove that a mapping is a random variable.

Proposition 2.2 (Criterion for measurability): Let $\mathcal{C} \subseteq \mathcal{P}(\mathbb{R})$ be a collection of sets such that $\sigma(\mathcal{C}) = \mathcal{B}(\mathbb{R})$. Then $X$ is a random variable if and only if $\forall B \in \mathcal{C} : X^{-1}(B) \in \mathcal{F}$.
Proof: TODO

In particular, $X$ is a random variable if and only if for every $a \in \mathbb{R}$ the set $\{\omega \in \Omega \mid X(\omega) \leq a\}$ is measurable. The notion of a random variable can be generalized to arbitrary measurable spaces.

Definition 2.3 (General random variable): Let $(E, \mathcal{E})$ be a measurable space. A random variable in $E$ is an $\mathcal{F}/\mathcal{E}$-measurable map $X : \Omega \to E$.
Note: A random variable (Definition 2.1) is the special case $E = \mathbb{R}$, $\mathcal{E} = \mathcal{B}(\mathbb{R})$.

A random variable in $\mathbb{R}^n$, i.e. on the measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, is called a random vector. We may also consider random variables in $\mathbb{R}^{+}$.

Note: Let $X$ be a general random variable in $E$. We use the following notational conventions:
  • We write $X \in B$ for $X^{-1}(B)$
  • We write $X \notin B$ for $X^{-1}(E \setminus B)$
  • We write $X = x$ for $X^{-1}(\{x\})$
  • We write $X \neq x$ for $X^{-1}(E \setminus \{x\})$
Furthermore, if $X$ is a random variable, i.e. $E = \mathbb{R}$, we use the following notational conventions:
  • We write $X \leq a$ for $X^{-1}((-\infty, a])$
  • We write $X < a$ for $X^{-1}((-\infty, a))$
  • We write $X \geq a$ for $X^{-1}([a, \infty))$
  • We write $X > a$ for $X^{-1}((a, \infty))$

2.2 Law and Cdf

Definition 2.4 (Law of a random variable): Let $X$ be a random variable. The law of $X$ is the probability measure $\mu_X : \mathcal{B}(\mathbb{R}) \to [0,1]$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by $\mu_X(B) = \mathbb{P}(X \in B)$.
Note: In measure-theoretic terms, $\mu_X = X_{\#}\mathbb{P}$ is the push-forward measure of $\mathbb{P}$ by $X$.
Recap (Dirac measure): The Dirac measure on a measurable space $(E, \mathcal{E})$ concentrated at $e \in E$ is the probability measure $\delta_e : \mathcal{E} \to [0,1]$ defined by $$\delta_e(B) = \begin{cases} 1 & \text{if } e \in B \\ 0 & \text{otherwise} \end{cases}$$
Example (Different rvs but same law): Consider the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with $\Omega = \{0,1\}$, $\mathcal{F} = \mathcal{P}(\Omega)$ and $\mathbb{P}(A) = \frac{\#A}{\#\Omega}$. The random variables $X$ and $Y$ defined by $X(\omega) = \omega$ and $Y(\omega) = 1 - \omega$ are different but follow the same law $\mu_X = \mu_Y = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$.
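This example can be checked mechanically. The following Python sketch (the names `law`, `Omega`, `P` are illustrative, not from the text) enumerates both laws on the finite space $\Omega = \{0,1\}$ with exact rational arithmetic:

```python
from fractions import Fraction

# Finite probability space: Omega = {0, 1} with the uniform measure.
Omega = [0, 1]
P = {omega: Fraction(1, 2) for omega in Omega}

# Two different random variables on Omega.
X = lambda omega: omega      # X(omega) = omega
Y = lambda omega: 1 - omega  # Y(omega) = 1 - omega

def law(Z):
    """Law of Z as a dict x -> mu_Z({x}) = P(Z = x)."""
    mu = {}
    for omega in Omega:
        mu[Z(omega)] = mu.get(Z(omega), Fraction(0)) + P[omega]
    return mu

# X and Y disagree pointwise ...
assert any(X(omega) != Y(omega) for omega in Omega)
# ... yet their laws coincide: mu_X = mu_Y = (1/2) delta_0 + (1/2) delta_1.
assert law(X) == law(Y) == {0: Fraction(1, 2), 1: Fraction(1, 2)}
```

The point of the check: equality of laws is equality of measures on $\mathcal{B}(\mathbb{R})$, which for finitely supported laws reduces to equality of the point masses.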
Note: We often define random variables through their law. This is done by writing $X \sim \square$, where "$\sim$" means "follows the distribution of" and $\square$ can be one of three possible objects:
  • A probability measure $\mu$, either defined directly, such as $\mu = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$, or via a predefined common law, such as $\mu = \mathrm{Ber}(p)$.
  • Another random variable $Y$, in which case we define $X$ to follow the same law as $Y$, i.e. $\mu_X = \mu_Y$.
  • A distribution function, e.g. the cumulative distribution function $F : \mathbb{R} \to [0,1]$. Distribution functions are in one-to-one correspondence with probability measures $\mu$.
Note that, in general, there exist multiple random variables following the same law given by $\square$, i.e. $X \sim \square$ defines an equivalence class of random variables. Hence, when writing "let $X \sim \square$", we mean any random variable in that equivalence class.

Two fundamental laws are presented.

Definition 2.5 (Bernoulli random variable): Let $p \in [0,1]$. We say $X$ is a Bernoulli random variable, $X \sim \mathrm{Ber}(p)$, if and only if $$\mu_X = \mathrm{Ber}(p) = (1-p)\,\delta_0 + p\,\delta_1$$
Note: If $X \sim \mathrm{Ber}(p)$, then $\mathbb{P}(X = 0) = 1 - p$ and $\mathbb{P}(X = 1) = p$.
Recap (Lebesgue measure): The Lebesgue measure on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is the unique measure $\lambda : \mathcal{B}(\mathbb{R}) \to [0,\infty]$ such that $\lambda([a,b]) = b - a$ for all $a \leq b$.
Note: We denote by $\lambda|_{E}$ the Lebesgue measure restricted to the measurable space $(E, \mathcal{B}(E))$.
Definition 2.6 (Uniform random variable): We say $X$ is a uniform random variable, $X \sim \mathrm{U}([0,1])$, if and only if $$\mu_X = \mathrm{U}([0,1]) = \lambda|_{[0,1]}$$
Note: If $X \sim \mathrm{U}([0,1])$, then $\mathbb{P}(X \in [a,b]) = b - a$ for all $0 \leq a \leq b \leq 1$.

The law can be defined analogously for general random variables: if $X$ is a general random variable in $(E, \mathcal{E})$, then the law of $X$ is the probability measure $\mu_X : \mathcal{E} \to [0,1]$ on $(E, \mathcal{E})$ defined by $\mu_X(B) = \mathbb{P}(X \in B)$.

Definition 2.7 (Cumulative distribution function): Let $X$ be a random variable. The cumulative distribution function or cdf of $X$ is the function $F_X : \mathbb{R} \to [0,1]$ defined by $F_X(a) = \mu_X((-\infty, a])$.
Note: For $a < b$ one has $\mathbb{P}(X \in (a,b]) = F_X(b) - F_X(a)$.
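As a quick worked example (not spelled out in the text): for $X \sim \mathrm{Ber}(p)$, the definition $F_X(a) = \mu_X((-\infty, a]) = (1-p)\,\delta_0((-\infty,a]) + p\,\delta_1((-\infty,a])$ yields the step function

```latex
F_X(a) =
\begin{cases}
0     & \text{if } a < 0 \\
1 - p & \text{if } 0 \leq a < 1 \\
1     & \text{if } a \geq 1
\end{cases}
```

with jumps of sizes $1-p$ at $0$ and $p$ at $1$, i.e. the jump of $F_X$ at $x$ equals the point mass $\mu_X(\{x\})$.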
Theorem 2.8 (Characterization of the cdf): A function $F : \mathbb{R} \to [0,1]$ is the cdf of some random variable $X$ if and only if
  • $F$ is non-decreasing, i.e. $F(a) \leq F(b)$ for all $a \leq b$
  • $F$ is right-continuous, i.e. $\lim_{h \downarrow 0} F(a+h) = F(a)$ for all $a \in \mathbb{R}$
  • $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$

Given a function $F$ satisfying the properties listed in the theorem, we can always construct a probability space and a random variable $X$ on it satisfying $F_X = F$. The construction of such a random variable relies on the generalized inverse of $F$.

Definition 2.9 (Generalized inverse): Let $F : \mathbb{R} \to [0,1]$ be non-decreasing, right-continuous, and satisfy $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$. The generalized inverse of $F$ is the mapping $F^{-1} : (0,1) \to \mathbb{R}$ defined by $F^{-1}(\alpha) = \inf\{x \in \mathbb{R} : F(x) \geq \alpha\}$.
Note: For every $x \in \mathbb{R}$ and $\alpha \in (0,1)$ we have $F^{-1}(\alpha) \leq x$ if and only if $\alpha \leq F(x)$.
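A short proof sketch of this equivalence, added here for completeness (it is the key step behind Theorem 2.10):

```latex
(\Leftarrow)\quad \alpha \leq F(x)
  \;\Rightarrow\; x \in \{y \in \mathbb{R} : F(y) \geq \alpha\}
  \;\Rightarrow\; F^{-1}(\alpha) = \inf\{y : F(y) \geq \alpha\} \leq x.
\\[4pt]
(\Rightarrow)\quad \text{By right-continuity of } F,\;
  F\big(F^{-1}(\alpha)\big) \geq \alpha.
  \text{ Hence } F^{-1}(\alpha) \leq x
  \;\Rightarrow\; F(x) \geq F\big(F^{-1}(\alpha)\big) \geq \alpha.
```

For the right-continuity step: choose $x_n \downarrow F^{-1}(\alpha)$ with $F(x_n) \geq \alpha$; then $F(F^{-1}(\alpha)) = \lim_n F(x_n) \geq \alpha$.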
Theorem 2.10 (Inverse transform sampling): Let $F : \mathbb{R} \to [0,1]$ be non-decreasing, right-continuous, and satisfy $\lim_{a \to -\infty} F(a) = 0$ and $\lim_{a \to \infty} F(a) = 1$. Let $U \sim \mathrm{U}([0,1])$. Then the random variable $X = F^{-1}(U)$ has distribution $F_X = F$.
Note: Formally, there is an issue in the definition of $X$ in the aforementioned theorem: $U : \Omega \to [0,1]$, but $F^{-1} : (0,1) \to \mathbb{R}$ with $0, 1 \notin (0,1)$. Nevertheless, we have $\mathbb{P}(U \in (0,1)) = 1$ and we can fix the issue by defining $$X(\omega) = \begin{cases} F^{-1}(U(\omega)) & \text{if } U(\omega) \in (0,1) \\ c & \text{otherwise} \end{cases}$$ where the value $c \in \mathbb{R}$ can be chosen arbitrarily.
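Inverse transform sampling is a practical simulation method. As an illustrative sketch (the exponential law is our choice of example, not from the text): for $F(x) = 1 - e^{-x}$ the generalized inverse has the closed form $F^{-1}(\alpha) = -\ln(1-\alpha)$, so applying it to uniform samples produces samples with cdf $F$:

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducibility

F = lambda x: 1 - math.exp(-x)      # target cdf (exponential with rate 1)
F_inv = lambda a: -math.log(1 - a)  # generalized inverse; here an ordinary inverse

# Inverse transform sampling: X = F^{-1}(U) with U ~ U([0,1]).
# rng.random() returns values in [0, 1), so log(0) never occurs.
n = 100_000
samples = [F_inv(rng.random()) for _ in range(n)]

# Empirical check: the fraction of samples <= 1 approximates F(1) = 1 - e^{-1}.
empirical = sum(x <= 1 for x in samples) / n
assert abs(empirical - F(1)) < 0.01
```

Note that $F$ here is continuous and strictly increasing on $[0,\infty)$, so $F^{-1}$ is a true inverse; for cdfs with jumps or flat pieces, the infimum in Definition 2.9 is what makes the construction work.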
Theorem 2.11 (Cdf characterizes the law): For two random variables $X$ and $Y$ we have $F_X = F_Y$ if and only if $\mu_X = \mu_Y$.
Note: Equivalently, $X \sim Y$ if and only if $F_X = F_Y$.

2.3 Transformations

Proposition 2.12 (Random vector as a collection of rvs): Let $X_1, \ldots, X_n$ be random variables. The function $X : \Omega \to \mathbb{R}^n$ with $X(\omega) = [X_1(\omega), \ldots, X_n(\omega)]^{\top}$ is $\mathcal{F}/\mathcal{B}(\mathbb{R}^n)$-measurable.
Note: Recall that $X$ is called a random vector.
Proposition 2.13 (Transformation of random variables): Let $X_1, \ldots, X_n$ be random variables and let $\phi : \mathbb{R}^n \to \mathbb{R}$ be measurable. Then $Y(\omega) = \phi(X(\omega))$ is a random variable.

Recall that if a function is continuous, then it is measurable.

Proposition 2.14 (Limits of rvs): Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of general random variables in $\overline{\mathbb{R}}$. The following are also general random variables in $\overline{\mathbb{R}}$:
  • $\sup_{n \in \mathbb{N}} X_n$
  • $\inf_{n \in \mathbb{N}} X_n$
  • $\limsup_{n \to \infty} X_n$
  • $\liminf_{n \to \infty} X_n$

2.4 Pmf and Pdf

Definition 2.15 (Discrete random variable): Let $S \subseteq \mathbb{R}$ be finite or countable. A random variable $X$ is said to be discrete with support $S$ if $\mu_X(S) = 1$ and $\forall x \in S : \mu_X(\{x\}) > 0$.
Definition 2.16 (Probability mass function): Let $X$ be a discrete random variable with support $S$. The probability mass function or pmf of $X$ is the function $p_X : S \to [0,1]$ defined by $p_X(x) = \mu_X(\{x\})$.
Note: For $p \in (0,1)$, Bernoulli random variables are discrete with support $S = \{0,1\}$ and pmf $p_X(1) = p$ and $p_X(0) = 1 - p$.
Recap (Absolute continuity): Let $\mu$ and $\nu$ be two measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. $\mu$ is said to be absolutely continuous w.r.t. $\nu$ if for every $B \in \mathcal{B}(\mathbb{R})$ with $\nu(B) = 0$ it follows that $\mu(B) = 0$.
Note: If $\mu$ is absolutely continuous w.r.t. $\nu$, we write $\mu \ll \nu$.
Definition 2.17 (Absolutely continuous random variable): Let $X$ be a random variable. $X$ is said to be absolutely continuous if $\mu_X \ll \lambda$.
Note:
  • If $X$ is absolutely continuous, then $\mu_X(\{x\}) = 0$ for all $x \in \mathbb{R}$.
  • Uniform random variables are absolutely continuous.
Definition 2.18 (Probability density function): Let $X$ be an absolutely continuous random variable. The probability density function or pdf of $X$ is the function $f_X : \mathbb{R} \to [0,\infty]$ defined by $$\forall B \in \mathcal{B}(\mathbb{R}) : \mu_X(B) = \int_{B} f_X(x) \, dx$$
Note:
  • The pdf is unique up to null sets of the Lebesgue measure.
  • A pdf $f_X$ satisfies $\int_{-\infty}^{+\infty} f_X(x) \, dx = 1$.
  • We have $F_X(x) = \int_{-\infty}^{x} f_X(\xi) \, d\xi$ and, for almost every $x$, $\frac{d}{dx} F_X(x) = f_X(x)$.
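These identities can be sanity-checked numerically. A minimal sketch with a hypothetical density of our own choosing (not from the text), $f(x) = 2x$ on $[0,1]$ with cdf $F(x) = x^2$, using a midpoint Riemann sum:

```python
# Hypothetical absolutely continuous law: pdf f(x) = 2x on [0, 1], 0 elsewhere.
# Its cdf is F(x) = x^2 for x in [0, 1].
f = lambda x: 2 * x if 0 <= x <= 1 else 0.0
F = lambda x: min(max(x, 0.0), 1.0) ** 2

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

# Total mass: the pdf integrates to 1 over the real line.
assert abs(integrate(f, 0, 1) - 1) < 1e-6
# F_X(x) = integral of f up to x: check at x = 0.5, where F(0.5) = 0.25.
assert abs(integrate(f, 0, 0.5) - F(0.5)) < 1e-6
```

The midpoint rule is exact for linear integrands, so the checks hold up to floating-point rounding.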

2.5 RV-Generated $\sigma$-Algebra

One may ask whether less information suffices to determine a random variable $X$, i.e. whether there exists a smaller $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$ such that $X : \Omega \to \mathbb{R}$ is $\mathcal{G}/\mathcal{B}(\mathbb{R})$-measurable.

Definition 2.19 (Rv-generated $\sigma$-algebra): Let $X$ be a random variable. The $\sigma$-algebra generated by $X$ is $\sigma(X) = \{X \in B \mid B \in \mathcal{B}(\mathbb{R})\}$.
Note:
  • $\sigma(X)$ is a sub-$\sigma$-algebra of $\mathcal{F}$.
  • $\sigma(X)$ is the smallest $\sigma$-algebra such that $X$ is $\sigma(X)/\mathcal{B}(\mathbb{R})$-measurable.
  • An alternative definition is $\sigma(X) = \sigma(\{X \leq a \mid a \in \mathbb{R}\})$.
Definition 2.20 (Rvs-generated $\sigma$-algebra): Let $(X_i)_{i \in I}$ be a finite or countable sequence of random variables. The $\sigma$-algebra generated by $(X_i)_{i \in I}$ is $\sigma((X_i)_{i \in I}) = \sigma\left(\bigcup_{i \in I} \sigma(X_i)\right)$.
Note: If the sequence is finite, i.e. $X_1, \ldots, X_n$, we write $\sigma(X_1, \ldots, X_n)$.
Theorem 2.21 (Doob-Dynkin Factorization): Let $Y$ and $X_1, \ldots, X_n$ be random variables. Then $Y$ is $\sigma(X_1, \ldots, X_n)$-measurable if and only if $Y = \varphi(X_1, \ldots, X_n)$ for some measurable $\varphi : \mathbb{R}^n \to \mathbb{R}$.
Note: The result generalizes to general random variables in $\overline{\mathbb{R}}$ or $\mathbb{R}^d$, but not to arbitrary spaces $E$.

2.6 Independence

Definition 2.22 (Independence of rvs): Let $(X_i)_{i \in I}$ be a finite or countable sequence of random variables. We say that the random variables $(X_i)_{i \in I}$ are independent if the family of $\sigma$-algebras $(\sigma(X_i))_{i \in I}$ is independent.
Note:
  • The events in $\sigma(X_i)$ are of the form $X_i \in B_i$ for $B_i \in \mathcal{B}(\mathbb{R})$. Hence, $X_1, \ldots, X_n$ are independent if and only if for each choice of $B_1, \ldots, B_n \in \mathcal{B}(\mathbb{R})$ we have $$\mathbb{P}(X_1 \in B_1, \ldots, X_n \in B_n) = \prod_{i=1}^{n} \mathbb{P}(X_i \in B_i)$$
  • Equivalently, $X_1, \ldots, X_n$ are independent if and only if for each choice of $a_1, \ldots, a_n \in \mathbb{R}$ we have $$\mathbb{P}(X_1 \leq a_1, \ldots, X_n \leq a_n) = \prod_{i=1}^{n} \mathbb{P}(X_i \leq a_i)$$
  • If $X_1, \ldots, X_n$ are discrete random variables with supports $S_1, \ldots, S_n \subseteq \mathbb{R}$, then $X_1, \ldots, X_n$ are independent if and only if for each choice of $a_i \in S_i$ we have $$\mathbb{P}(X_1 = a_1, \ldots, X_n = a_n) = \prod_{i=1}^{n} \mathbb{P}(X_i = a_i)$$
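The discrete factorization criterion can be verified exactly on a small product space. A sketch with our own toy model (two independent fair coin flips; all names illustrative), using exact fractions and full enumeration:

```python
from fractions import Fraction
from itertools import product

# Product space Omega = {0,1} x {0,1} with the uniform measure:
# a model of two independent fair coin flips.
Omega = list(product([0, 1], repeat=2))
P = {omega: Fraction(1, 4) for omega in Omega}

X = lambda omega: omega[0]  # first flip
Y = lambda omega: omega[1]  # second flip

def prob(event):
    """P(event) for an event given as a predicate on Omega."""
    return sum(P[omega] for omega in Omega if event(omega))

# Factorization criterion for discrete random variables:
# P(X = a, Y = b) = P(X = a) * P(Y = b) for all a, b in the supports.
for a, b in product([0, 1], repeat=2):
    joint = prob(lambda w: X(w) == a and Y(w) == b)
    assert joint == prob(lambda w: X(w) == a) * prob(lambda w: Y(w) == b)
```

Enumeration with `Fraction` makes the check exact rather than approximate, which is exactly what the criterion demands.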
Proposition 2.23 (Grouping): Let $(X_i)_{i \in I}$ be a finite or countable sequence of independent random variables. For any partition $K = \{I_1, I_2, \ldots\}$ of $I$, the family of $\sigma$-algebras $(\sigma((X_i)_{i \in J}))_{J \in K}$ is independent.
Example (Grouping pairs): Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of independent random variables. Then the sequence $(Y_n)_{n \in \mathbb{N}}$ defined by $Y_n = X_{2n+1} + X_{2n+2}$ is independent.