To keep things simple and the notation as light as possible, we only consider unconditional distributions of univariate responses from independent observations in this chapter.
In this text, we assume an underlying probability space $(\Omega, \mathcal{F}, P)$ exists and only consider real random variables, e.g. $X: \Omega \to \mathbb{R}$.
Note: Unless stated otherwise, we will write “random variable” for real random variables.
$\mu_X$ denotes the push-forward measure $P \circ X^{-1} = X_{\#}P$ of the random variable $X$.
Recap (Support of a random variable)
: The support of a random variable $X$, denoted $\operatorname{supp}(X)$, is the smallest closed set $S \subseteq \mathbb{R}$ such that $P(X \in S) = 1$.
Note: For discrete random variables, $\operatorname{supp}(X)$ is countable.
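For a finite sample space, the push-forward measure and the support can be computed by hand. The following is a minimal sketch, not part of the original text: the two-coin example and the helper names `push_forward` and `support` are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(prob, X):
    """Return mu_X as a dict {x: P(X = x)} for a finite sample space.

    prob maps each outcome omega to P({omega}); X maps outcomes to reals.
    """
    mu = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for omega, p in prob.items():
        mu[X(omega)] += p  # collect the mass of the preimage of X(omega)
    return dict(mu)


def support(mu):
    """Support of a measure with finitely many atoms: all points with positive mass."""
    return sorted(x for x, p in mu.items() if p > 0)


# Hypothetical example: two fair coin tosses, X counts the number of heads.
prob = {w: Fraction(1, 4) for w in ["HH", "HT", "TH", "TT"]}
mu_X = push_forward(prob, lambda w: w.count("H"))

print(mu_X)
print(support(mu_X))
```

Exact fractions are used so that the masses sum to one without rounding error. A finite set is closed, so for this discrete case the set of atoms is the smallest closed set with probability one.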
1.1 Distribution Functions
The response $Y$ we are interested in follows a distribution $\mu_Y$, and we write $Y \sim \mu_Y$. It is important to note that for non-finite sample spaces an observation is conceptually always an event, i.e. a set $A \in \mathcal{B}(\mathbb{R})$. We never observe the outcomes $y \in \mathbb{R}$ directly. For discrete sample spaces with finite cardinality, however, events $A = \{y\}$ might very well be observed.
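To illustrate the idea that an observation is an event, consider a measurement recorded to the nearest unit: the recorded value stands for an interval $(a, b]$, whose probability is $\mu_Y((a, b]) = F_Y(b) - F_Y(a)$ in terms of the cdf $F_Y(y) = \mu_Y((-\infty, y])$. The sketch below assumes a hypothetical normally distributed body height; the distribution, its parameters and the helper name `interval_probability` are illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist


def interval_probability(cdf, a, b):
    """Probability of the event (a, b], computed as F_Y(b) - F_Y(a)."""
    if not a < b:
        raise ValueError("need a < b")
    return cdf(b) - cdf(a)


# Hypothetical response: body height in cm, assumed to follow N(170, 10^2).
height = NormalDist(mu=170, sigma=10)

# A recorded value of "175 cm" is treated as the event (174.5, 175.5].
p = interval_probability(height.cdf, 174.5, 175.5)
print(p)
```

The probability of the single outcome 175 is zero under this continuous model, whereas the interval event has positive probability.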
Recap (Cumulative distribution function)
: The cumulative distribution function or cdf $F_Y: \mathbb{R} \to [0,1]$ is
$$F_Y(y) = \mu_Y((-\infty, y])$$
Note: The cdf is a monotonically, but not necessarily strictly, increasing function, i.e. $F_Y(y_1) \le F_Y(y_2)$ for all $y_1 < y_2$.
1.2 Supports and Measurement Scales
The choice of an appropriate support $S = \operatorname{supp}(Y)$ very much depends on the measurement scale of the response $Y$. Most situations can be classified into binary, ordered categorical, unordered categorical, count or absolutely continuous responses $Y$.
Definition 1.1 (Binary response)
: The support is $S = \{y_1, y_2\}$ with $y_1 \in \mathbb{R}$ and $y_2 \in \mathbb{R}$.
Note: We understand these two outcomes as being truly categorical and explicitly exclude dichotomisation, e.g. binary variables like “younger than 65 years” vs. “older than 65 years” should be modelled by a sample space appropriate for age as a numeric variable.
Example (Binary response)
: $y_1 = \text{failure}$ and $y_2 = \text{success}$.
Definition 1.2 (Ordered categorical response)
: The support is $S = \{y_1, \dots, y_K\}$ with $K < \infty$ and $y_i \in \mathbb{R}$ for all $i \in \{1, \dots, K\}$.
Example (Ordered categorical response)
: The happiness scores, e.g. $y_1 = \text{very unhappy}$, $y_2 = \text{not too happy}$, $y_3 = \text{somewhat happy}$ and $y_4 = \text{very happy}$ with $y_1 < y_2 < y_3 < y_4$.
Note: For unordered categorical data
$\mathcal{S} = \{s_1, \dots, s_K\}$ with $K < \infty$ one first needs to define an injection $f: \mathcal{S} \to \mathbb{R}$ that defines the support $S = f(\mathcal{S})$. The distribution $\mu_Y$ will thus depend on $f$.
Example (Unordered categorical response)
: The faculties at a university, e.g. $s_1 = \text{Medicine}$, $s_2 = \text{Natural Sciences}$, $s_3 = \text{Philosophy}$ and so on.
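The dependence of the distribution on the chosen injection can be made concrete with a small sketch. The category probabilities, the two codings `f1` and `f2`, and the helper name `cdf_under` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Hypothetical probabilities for three faculties (illustration only).
prob = {
    "Medicine": Fraction(1, 2),
    "Natural Sciences": Fraction(3, 10),
    "Philosophy": Fraction(1, 5),
}


def cdf_under(coding, prob, y):
    """F_Y(y) = P(f(category) <= y) for an injective coding f given as a dict."""
    if len(set(coding.values())) != len(coding):
        raise ValueError("coding must be injective")
    return sum(p for s, p in prob.items() if coding[s] <= y)


# Two different injections of the same unordered categories into the reals.
f1 = {"Medicine": 1, "Natural Sciences": 2, "Philosophy": 3}
f2 = {"Medicine": 3, "Natural Sciences": 1, "Philosophy": 2}

print(cdf_under(f1, prob, 1))  # mass of the category coded as 1 under f1
print(cdf_under(f2, prob, 1))  # mass of the category coded as 1 under f2
```

Both codings yield the same support $\{1, 2, 3\}$, yet the cdf at $y = 1$ differs, because a different category is mapped to the smallest value.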
Definition 1.3 (Count response)
: The support is $S = \mathbb{N}$.
Example (Count response)
: The number of wildlife-vehicle collisions counted per year on a specific road segment.
Definition 1.4 (Absolutely continuous response)
: The support $S$ is a contiguous subset of $\mathbb{R}$, i.e. an interval.
Example (Absolutely continuous response)
: The age $Y$ of a person has support $S = (0, \infty)$. The event “the person is 44 years old” is represented by the interval $[44, 45)$.
Example (Mixed distribution)
: The amount of precipitation $Y$ for a meteorological station on one day has support $S = [0, \infty)$. Note that $\mu_Y(\{0\}) > 0$ because the probability of no rain or snow at all is larger than zero. This is an example of a distribution with a discrete part at $0$ and a continuous part on $(0, \infty)$.
1.3 Other Distribution Characterizations
The cdf $F_Y$, the pdf and the pmf are only some of the functions that fully characterize the distribution $\mu_Y$. Here, we list further such functions.
Definition 1.5 (Survivor function)
: The survivor function $S_Y: \mathbb{R} \to [0,1]$ is
$$S_Y(y) = 1 - F_Y(y) = \mu_Y((y, \infty))$$
Definition 1.6 (Odds function)
: The odds function $O_Y: \mathbb{R} \to \mathbb{R}^+$ is
$$O_Y(y) = \frac{F_Y(y)}{S_Y(y)} = \frac{\mu_Y((-\infty, y])}{\mu_Y((y, \infty))}$$
Definition 1.7 (Quantile function)
: The quantile function $Q_Y: [0,1] \to \mathbb{R}$ is the generalized inverse of the cdf $F_Y$, i.e.
$$Q_Y(p) = \inf\{y \in \mathbb{R} : F_Y(y) \ge p\}$$
Note: We have $Q_Y(p) = F_Y^{-1}(p)$ if $F_Y$ is invertible.
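When the cdf is a step function it is not invertible, and the generalized inverse picks the smallest support point at which the cdf reaches $p$. The sketch below shows this for a discrete distribution with finitely many sorted support points. The probabilities and the helper name `make_quantile` are hypothetical. The helper only accepts $p \in (0, 1]$, since the infimum in the definition is $-\infty$ for $p = 0$.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def make_quantile(points, probs):
    """Return Q_Y for a discrete distribution.

    points: support points in increasing order; probs: their probabilities.
    """
    cum = list(accumulate(probs))  # F_Y evaluated at each support point

    def Q(p):
        if not 0 < p <= 1:
            raise ValueError("p must lie in (0, 1]")
        # First index i with cum[i] >= p, i.e. inf{y : F_Y(y) >= p}.
        return points[bisect_left(cum, p)]

    return Q


# Hypothetical distribution on four ordered categories coded 1 to 4.
Q_happy = make_quantile(
    [1, 2, 3, 4],
    [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(3, 10)],
)

print(Q_happy(Fraction(1, 2)))   # median category
print(Q_happy(Fraction(3, 10)))  # p exactly on a cdf step
```

Exact fractions keep the cumulative sums free of rounding error, so a value of $p$ that coincides with a step of the cdf is resolved as the definition prescribes.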
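Definition 1.8 (Hazard function)
: The hazard function $h_Y: \mathbb{R} \to \mathbb{R}^+$ is
$$h_Y(y) = \frac{f_Y(y)}{\mu_Y([y, \infty))}$$
Note: If $Y$ is absolutely continuous we have $h_Y(y) = \frac{f_Y(y)}{S_Y(y)}$ or equivalently $h_Y(y) = -\frac{d}{dy} \log S_Y(y)$.
Definition 1.9 (Cumulative hazard function)
: The cumulative hazard function $H_Y: \mathbb{R} \to \mathbb{R}^+$ is
$$H_Y(y) = \int_{-\infty}^{y} h_Y(t)\,dt = \int_{-\infty}^{y} \frac{f_Y(t)}{\mu_Y([t, \infty))}\,dt$$
Note: If $Y$ is absolutely continuous we have $H_Y(y) = -\log S_Y(y)$.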
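The relations between these functions can be checked numerically for an absolutely continuous example. The sketch below assumes an exponential distribution with a hypothetical rate of 0.5. This choice, the function names and the midpoint-rule integration are illustrative assumptions. For the exponential distribution the hazard is constant and equal to the rate, so the cumulative hazard should agree with the negative log of the survivor function.

```python
import math

RATE = 0.5  # hypothetical rate parameter of an exponential distribution


def cdf(y):
    return 1.0 - math.exp(-RATE * y) if y > 0 else 0.0


def pdf(y):
    return RATE * math.exp(-RATE * y) if y > 0 else 0.0


def survivor(y):
    return 1.0 - cdf(y)


def odds(y):
    return cdf(y) / survivor(y)


def hazard(y):
    # For an absolutely continuous Y the denominator equals S_Y(y).
    return pdf(y) / survivor(y)


def cumulative_hazard(y, steps=10_000):
    """Midpoint-rule approximation of the integral of the hazard up to y."""
    if y <= 0:
        return 0.0  # the hazard vanishes on (-inf, 0] for this distribution
    width = y / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width


y = 3.0
print(hazard(y))                                    # close to RATE
print(cumulative_hazard(y), -math.log(survivor(y)))  # the two should agree
print(odds(y))
```

The agreement follows from $f_Y = -S_Y'$, which gives $h_Y = -(\log S_Y)'$ and, after integration, $H_Y = -\log S_Y$.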