1. Likelihood Function

To keep things simple and the notation as light as possible, we only consider unconditional distributions of univariate responses from independent observations in this chapter.

Note: We assume that an underlying probability space $(\Omega, \mathcal{F}, \mathbb{P})$ exists. We write $X \in \Omega_X$ for a random variable $X : \Omega \to \Omega_X$, and $\mu_Y$ for the push-forward measure $\mu_Y = \mathbb{P} \circ Y^{-1} = Y_{\#}\mathbb{P}$, i.e. $\mu_Y(A) = \mathbb{P}(Y^{-1}(A))$ for every event $A$.
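To make the push-forward concrete, here is a minimal Python sketch on a finite probability space. The two-dice example and the helper names `pushforward` and `mu_Y_event` are illustrative assumptions, not part of the text. It shows that $\mu_Y$ assigns to each event $A$ the probability of its preimage $Y^{-1}(A)$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical underlying space: two fair dice, P uniform on Omega.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def Y(w):
    """Random variable Y : Omega -> Omega_Y, here the sum of both dice."""
    return w[0] + w[1]


def pushforward(P, Y):
    """Return mu_Y as a dict over outcomes: mu_Y({y}) = P(Y^{-1}({y}))."""
    mu = defaultdict(Fraction)  # Fraction() == 0
    for w, p in P.items():
        mu[Y(w)] += p
    return dict(mu)


def mu_Y_event(A):
    """mu_Y(A) = P(Y^{-1}(A)): sum P over all w whose image lies in A."""
    return sum((P[w] for w in omega if Y(w) in A), Fraction(0))


mu_Y = pushforward(P, Y)
print(mu_Y[7])               # 1/6
print(mu_Y_event({2, 3}))    # 1/12
```

Computing $\mu_Y(A)$ through the preimage and summing the point masses of $\mu_Y$ over $A$ give the same value, which is exactly the defining property of the push-forward.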

1.1 Probabilities and Distribution Functions

Note: The response $Y \in \Omega_Y$ we are interested in follows a distribution $\mu_Y : \mathcal{F}_Y \to [0,1]$, and we write $Y \sim \mu_Y$, understanding that the probability space $(\Omega_Y, \mathcal{F}_Y, \mu_Y)$ exists and that $Y$ is an $\mathcal{F}/\mathcal{F}_Y$-measurable function.
Note: For non-finite sample spaces, an observation is conceptually always an event, i.e. a set $A \in \mathcal{F}_Y$; we never observe outcomes, i.e. elements $\omega \in \Omega_Y$, directly. For example, a height recorded as 1.72 m to the nearest centimetre is really the event $[1.715, 1.725)$. For discrete sample spaces with finite cardinality, however, singleton events $A = \{\omega\}$ may very well be observed.
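The following Python sketch illustrates why, for a continuous response, an observation has to be treated as an event. It assumes, purely for illustration, a normal distribution for height with mean 1.70 m and standard deviation 0.10 m; the helpers `normal_cdf`, `prob_interval` and `prob_recorded` are hypothetical names. The probability of a recorded value is the probability of the interval it stands for, and it shrinks towards zero as the recording resolution becomes finer.

```python
import math


def normal_cdf(y, mean=0.0, sd=1.0):
    """CDF of a normal distribution, expressed via the error function."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sd * math.sqrt(2.0))))


def prob_interval(lower, upper, mean, sd):
    """mu_Y([lower, upper)) for a continuous Y; single points carry no mass."""
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)


def prob_recorded(value, resolution, mean, sd):
    """Probability of the event behind a rounded record:
    [value - resolution/2, value + resolution/2)."""
    half = resolution / 2.0
    return prob_interval(value - half, value + half, mean, sd)


# Assumed model: height Y ~ N(1.70, 0.10^2) in metres.
p_cm = prob_recorded(1.72, 0.01, mean=1.70, sd=0.10)   # recorded to the cm
p_um = prob_recorded(1.72, 1e-6, mean=1.70, sd=0.10)   # recorded to the micrometre
p_point = prob_interval(1.72, 1.72, mean=1.70, sd=0.10)  # the outcome itself
print(p_cm, p_um, p_point)
```

The single outcome $\{1.72\}$ has probability zero under any continuous distribution, so only the interval events carry information about $\mu_Y$.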
Definition 1.1 (Cumulative distribution function): Let $\Omega_Y$ be equipped with an ordering $\leq$. The cumulative distribution function (cdf) $F_Y : \Omega_Y \to [0,1]$ is defined as
$$F_Y(y) = \mu_Y(\{x \in \Omega_Y \mid x \leq y\}).$$
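Definition 1.1 only requires an ordering on $\Omega_Y$, not numeric values. The Python sketch below evaluates the cdf on a hypothetical ordered categorical sample space whose ordering is given by list position; the levels, their probabilities and the function name `cdf` are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical ordered categorical sample space: list order encodes x <= y.
levels = ["low", "medium", "high"]
pmf = {"low": Fraction(1, 5), "medium": Fraction(1, 2), "high": Fraction(3, 10)}


def cdf(y, levels, pmf):
    """F_Y(y) = mu_Y({x in Omega_Y : x <= y}) under the ordering of `levels`."""
    rank = levels.index(y)
    # All x with x <= y are exactly the levels up to and including y.
    return sum((pmf[x] for x in levels[: rank + 1]), Fraction(0))


for level in levels:
    print(level, cdf(level, levels, pmf))
# low 1/5, medium 7/10, high 1
```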
Note: The cdf is monotonically, but not necessarily strictly, increasing: $F_Y(y_1) \leq F_Y(y_2)$ for all $y_1 < y_2$.
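A discrete response shows why monotonicity need not be strict. The sketch below, assuming a binomial count with $n = 3$ trials and success probability $1/2$ (an illustrative choice, as are the function names), evaluates $F_Y$ at real arguments: the cdf jumps at the integers and is flat in between.

```python
from fractions import Fraction
from math import comb, floor


def binom_pmf(k, n, p):
    """P(Y = k) for a binomial count Y with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(y, n, p):
    """F_Y(y) for real y: sum of the pmf over all integers 0 <= k <= y."""
    upper = min(floor(y), n)  # negative y gives an empty range, hence F_Y(y) = 0
    return sum((binom_pmf(k, n, p) for k in range(0, upper + 1)), Fraction(0))


n, p = 3, Fraction(1, 2)
grid = [Fraction(i, 4) for i in range(-4, 17)]  # -1, -3/4, ..., 4
values = [binom_cdf(y, n, p) for y in grid]

non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))
print(non_decreasing, strictly_increasing)  # True False
```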

1.2 Sample Spaces and Measurement Scales

The choice of an appropriate sample space $\Omega_Y$ depends strongly on the measurement scale of the response $Y$. Most situations can be classified as binary, ordered or unordered categorical, count, and bounded or unbounded absolutely continuous responses.

1.2.1 Binary Response

Definition 1.2 (Binary Response): The sample space is $\Omega_Y = \{y_1, y_2\}$, i.e. the response can take only one of two outcomes, $y_1$ or $y_2$.
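Because $\Omega_Y$ has only two elements, a distribution on it is fully determined by a single number, say $\pi = \mu_Y(\{y_1\})$, with $\mu_Y(\{y_2\}) = 1 - \pi$. The Python sketch below encodes this; the class name `BinaryDistribution`, the parameter name `pi` and the example labels are assumptions for illustration. The outcomes are kept as arbitrary labels rather than numbers, in line with treating them as categories.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryDistribution:
    """Distribution on Omega_Y = {y1, y2}, determined by pi = mu_Y({y1})."""
    y1: str
    y2: str
    pi: float

    def __post_init__(self):
        if self.y1 == self.y2:
            raise ValueError("the two outcomes must be distinct")
        if not 0.0 <= self.pi <= 1.0:
            raise ValueError("pi must lie in [0, 1]")

    def prob(self, event):
        """mu_Y(A) for any event A, a subset of {y1, y2}."""
        if set(event) - {self.y1, self.y2}:
            raise ValueError("event is not a subset of the sample space")
        p = 0.0
        if self.y1 in event:
            p += self.pi
        if self.y2 in event:
            p += 1.0 - self.pi
        return p

    def sample(self, rng):
        """Draw one outcome using a random.Random instance."""
        return self.y1 if rng.random() < self.pi else self.y2


d = BinaryDistribution("alive", "dead", pi=0.3)
print(d.prob({"dead"}), d.sample(random.Random(1)))
```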
Note: We understand these two outcomes as truly categorical and explicitly exclude dichotomised variables. For example, a binary variable such as “younger than 65 years” vs. “65 years or older” should instead be modelled with a sample space appropriate for age as a numeric variable.