Likelihood and Regression 2025-09-24

1. Likelihood Function

To keep things simple and the notation as light as possible, we only consider unconditional distributions of univariate responses from independent observations in this chapter.

Note: We assume an underlying probability space $(\Omega, \sigmaF, \P)$ exists and write $X \in \Omega_X$ for $X : \Omega \to \Omega_X$ and $\mu_{Y}$ for the push-forward measure $\probP{Y^{-1}} = Y \# \P$.
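The push-forward construction can be made concrete on a small finite probability space. The following is a minimal sketch, not part of the original notes: the two-coin sample space, the variable `Y` (number of heads) and the helper `pushforward` are all hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical finite probability space: two fair coin tosses.
omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}  # P on singletons (assumed uniform)


def Y(w):
    """Random variable Y : Omega -> Omega_Y = {0, 1, 2}, the number of heads."""
    return w.count("H")


def pushforward(event):
    """mu_Y(A) = P(Y^{-1}(A)) for an event A, given as a set of Y-values."""
    preimage = [w for w in omega if Y(w) in event]
    return sum((prob[w] for w in preimage), Fraction(0))


mu_one_head = pushforward({1})        # P({HT, TH})
mu_everything = pushforward({0, 1, 2})  # P(Omega)
```

Here `pushforward` never looks at probabilities on $\Omega_Y$ directly; it only transports $\P$ through the preimage of $Y$, which is all the push-forward measure does.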

1.1 Probabilities and Distribution Functions

Note: The response $Y \in \Omega_Y$ we are interested in follows a distribution $\mu_{Y} : \sigmaF_Y \to [0,1]$ and we write $Y \sim \mu_{Y}$, understanding that the probability space $(\Omega_Y, \sigmaF_Y, \mu_{Y})$ exists and that $Y$ is a $\sigmaF/\sigmaF_Y$-measurable function.

Note: For non-finite sample spaces, an observation is conceptually always an event, i.e. a set $\evA \in \sigmaF_Y$. We never observe outcomes, i.e. elements $\omega \in \Omega_Y$, directly. For discrete sample spaces with finite cardinality, however, events $\evA = \set{\omega}$ might very well be observed.
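One way to see why observations are events is to look at a measurement recorded to finite precision. The sketch below assumes, purely for illustration, a standard normal response and a recording precision of one decimal place. The recorded value 1.7 then stands for an interval, and it is the interval, not the single point, that carries positive probability. The function names are hypothetical.

```python
import math


def std_normal_cdf(y):
    """Distribution function of the standard normal (assumed model)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def interval_prob(lower, upper):
    """Probability of the event (lower, upper] under the assumed model."""
    return std_normal_cdf(upper) - std_normal_cdf(lower)


recorded = 1.7     # value as written down, rounded to one decimal place
half_width = 0.05  # assumed rounding precision

# The observation is the event (1.65, 1.75], not the outcome 1.7 itself.
p_event = interval_prob(recorded - half_width, recorded + half_width)

# A degenerate interval has probability zero under a continuous model.
p_point = interval_prob(recorded, recorded)
```

Under these assumptions `p_point` is exactly zero while `p_event` is small but positive.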

Definition 1.1 (Cumulative distribution function): Let $\Omega_Y$ have an ordering. The cumulative distribution function $F_Y : \Omega_Y \to [0,1]$ is defined as $F_Y(y) = \mu_Y(\set{x \in \Omega_Y \mid x \leq y})$.

Note: The cdf is a monotonically, but not necessarily strictly, increasing function, i.e. $\forall y_1 < y_2 : F_Y(y_1) \leq F_Y(y_2)$.
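The definition and the monotonicity property can be checked on a small ordered sample space. The following sketch uses a hypothetical sample space $\set{0, 1, 2, 3}$ with assumed probabilities. The outcome 1 is deliberately given probability zero so that the cdf has a flat step, which shows why the increase need not be strict.

```python
from fractions import Fraction

# Hypothetical ordered sample space with assumed probabilities mu_Y({x}).
sample_space = [0, 1, 2, 3]
mu = {
    0: Fraction(1, 4),
    1: Fraction(0),  # zero mass: the cdf stays flat here
    2: Fraction(1, 2),
    3: Fraction(1, 4),
}


def cdf(y):
    """F_Y(y) = mu_Y({x in Omega_Y : x <= y})."""
    return sum((mu[x] for x in sample_space if x <= y), Fraction(0))


values = [cdf(y) for y in sample_space]

# Monotone: F_Y(y1) <= F_Y(y2) for all y1 < y2.
is_monotone = all(a <= b for a, b in zip(values, values[1:]))

# Not strictly increasing: at least one pair of neighbours is equal.
has_flat_step = any(a == b for a, b in zip(values, values[1:]))
```

With these numbers the cdf values are 1/4, 1/4, 3/4 and 1, so both `is_monotone` and `has_flat_step` are true.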

1.2 Sample Spaces and Measurement Scales

The choice of an appropriate sample space $\Omega_Y$ depends on the measurement scale of the response $Y$. Most situations can be classified into binary, ordered or unordered categorical, count, and bounded or unbounded absolutely continuous responses $Y$.

1.2.1 Binary Response

Definition 1.2 (Binary Response): The sample space is $\Omega_Y = \set{y_1, y_2}$, i.e. the response can only take two outcomes, $y_1$ and $y_2$.

Note: We understand these two outcomes as being truly categorical and explicitly exclude dichotomisation. For example, binary variables like “younger than 65 years” vs. “older than 65 years” should be modelled by a sample space appropriate for age as a numeric variable.