2. The Likelihood

For the moment, we assume that the response $Y$ is univariate and that its distribution is parametrised in terms of a vector of unknown parameters $\vtheta \in \Theta \subset \R^p$, i.e. $F_Y : \R \times \Theta \to \R$. Further, reasonable statistical inference requires multiple observations. We assume a sample of $n$ $\iid$ observations $Y_1, \ldots, Y_n$ with $Y_i \simiid F_Y(\cdotsep, \vthetastar)$, where $\vthetastar \in \Theta$ denotes the true parameter.

Note: We denote by $\P_{\vtheta}$, $\E_{\vtheta}$ and $\Var_{\vtheta}$ that the random variables follow the distribution $Y_i \sim F_Y(\cdotsep, \vtheta)$ for a fixed $\vtheta$.

2.1 Likelihood Function

Definition 2.1 (Likelihood contribution): Given a single observation $\evA$ of $Y_i$, the likelihood contribution $l_i : \Theta \times \borelB(\R) \to [0,1]$ is $$l_i(\vtheta, \evA) = \probPwrt{\vtheta}{Y_i \in \evA}$$

The most important special case is an observation $\evA = (y_1, y_2]$, for which we can compute the likelihood contribution as $$l_i(\vtheta, \evA) = F_Y(y_2, \vtheta) - F_Y(y_1, \vtheta)$$

Example (Wildlife-vehicle collisions): We observe $Y_i$ wildlife-vehicle collisions counted per year on a specific road segment. We have lost the records of December and observed $10$ wildlife-vehicle collisions between January and November. The observation is then $\evA = (9, \infty)$ and the likelihood contribution is $l_i(\vtheta, \evA) = 1 - F_Y(9, \vtheta)$.
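The censored contribution $1 - F_Y(9, \vtheta)$ can be evaluated directly from a model's distribution function. A minimal sketch, assuming for illustration a Poisson$(\vtheta)$ model for the yearly count (the example above does not yet fix a distribution):

```python
# Likelihood contribution for the censored event A = (9, inf),
# assuming a Poisson(theta) model (an illustrative assumption).
from scipy.stats import poisson

def censored_contribution(theta, lower=9):
    """l_i(theta, A) = 1 - F_Y(lower, theta) = P(Y > lower)."""
    return 1.0 - poisson.cdf(lower, theta)

# The contribution grows with theta: a larger collision rate makes
# observing more than 9 collisions more plausible.
print(censored_contribution(5.0))
print(censored_contribution(15.0))
```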
Note: We denote by $\evA_1, \ldots, \evA_n$ the respective observations of the random variables $Y_1, \ldots, Y_n$.
Definition 2.2 (Likelihood function): Given the observations $\evA_1, \ldots, \evA_n$ of $Y_1, \ldots, Y_n$, the likelihood function $L : \Theta \times \borelB(\R)^n \to [0,1]$ is $$L(\vtheta, \evA_1, \ldots, \evA_n) = \probPwrt{\vtheta}{\bigcap_{i=1}^n \set{Y_i \in \evA_i}} = \prod_{i=1}^n l_i(\vtheta, \evA_i)$$ where the factorisation into contributions follows from the independence of the $Y_i$.
Note: For notational convenience, we often suppress the dependency of the likelihood on the observations when they are clear from the context and write $l_i(\vtheta)$ and $L(\vtheta) = \prod_{i=1}^n l_i(\vtheta)$.
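Definition 2.2 can be sketched in a few lines: each interval observation $\evA_i = (a_i, b_i]$ contributes $F_Y(b_i, \vtheta) - F_Y(a_i, \vtheta)$, and the likelihood is their product. The $\lawN(\vtheta, 1)$ model and the intervals below are purely hypothetical:

```python
# Likelihood of n independent interval observations A_i = (a_i, b_i]
# under an assumed N(theta, 1) model (illustrative choice).
import numpy as np
from scipy.stats import norm

intervals = [(0.5, 1.5), (1.0, 2.0), (-0.5, 0.5)]  # hypothetical data

def likelihood(theta):
    # l_i(theta, A_i) = F_Y(b_i, theta) - F_Y(a_i, theta)
    contribs = [norm.cdf(b, loc=theta) - norm.cdf(a, loc=theta)
                for a, b in intervals]
    return float(np.prod(contribs))

print(likelihood(0.7))
```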

2.2 Log-Likelihood Function

Definition 2.3 (Log-likelihood contribution): The log-likelihood contribution $\ell_i : \Theta \times \borelB(\R) \to \R$ is the logarithm of the likelihood contribution $l_i$, i.e. $$\ell_i(\vtheta, \evA) = \log l_i(\vtheta, \evA)$$
Definition 2.4 (Log-likelihood function): The log-likelihood function $\ell : \Theta \times \borelB(\R)^n \to \R$ is the logarithm of the likelihood function, i.e. $$\ell(\vtheta, \evA_1, \ldots, \evA_n) = \log L(\vtheta, \evA_1, \ldots, \evA_n) = \sum_{i=1}^n \ell_i(\vtheta, \evA_i)$$

The log-likelihood is computationally convenient, because it is much easier to optimise sums than products.
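There is also a numerical reason to prefer the sum: a product of many probabilities underflows to zero in floating-point arithmetic, while the corresponding sum of logarithms remains representable. A small illustration with simulated standard normal data:

```python
# Why sums beat products numerically: with many observations the
# likelihood underflows to 0.0 in double precision, while the
# log-likelihood stays finite.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(size=5000)  # simulated data, standard normal

density_values = norm.pdf(y)
product = np.prod(density_values)       # underflows to exactly 0.0
log_sum = np.sum(norm.logpdf(y))        # finite log-likelihood

print(product, log_sum)
```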

2.3 Likelihood Approximation

Let $Y$ be an absolutely continuous response for which the probability density function $f_Y(y, \vtheta) = \pd{F_Y(y, \vtheta)}{y}$ exists. It is almost surely not possible to observe singleton events $\set{y}$, as $\probP{Y \in \set{y}} = 0$. Instead, an outcome always has some deviation $\epsilon$.

Proposition 2.5 (Likelihood approximation): If we observe the event $\evA_i = (y - \epsilon, y + \epsilon]$ for negligibly small $\epsilon > 0$, we have $$l_i(\vtheta, \evA_i) = F_Y(y + \epsilon, \vtheta) - F_Y(y - \epsilon, \vtheta) \approx 2\epsilon \, f_Y(y, \vtheta) \propto f_Y(y, \vtheta)$$

Hence it makes sense to evaluate the likelihood contribution for outcomes by the density $f_Y(y, \vtheta)$. The proportionality constant does not matter when we are only interested in the maximiser of the likelihood. We need to keep in mind, however, that this is an approximation which is only valid if $\epsilon$ is small, and that it fails if $f_Y(y, \vtheta) = \infty$ is feasible in the range of interest.
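Proposition 2.5 is easy to verify numerically: for small $\epsilon$, the interval probability divided by $2\epsilon\,f_Y(y, \vtheta)$ should be close to one. A sketch using the standard normal distribution as an illustrative choice:

```python
# Numerical check of the likelihood approximation
#   F(y + eps) - F(y - eps)  ~  2 * eps * f(y)
# for a standard normal response (illustrative choice).
from scipy.stats import norm

y, eps = 0.3, 1e-6
interval_prob = norm.cdf(y + eps) - norm.cdf(y - eps)
approx = 2 * eps * norm.pdf(y)

ratio = interval_prob / approx
print(ratio)  # close to 1 for small eps
```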

2.4 Log-Likelihood Density

In the context of the aforementioned approximation, and under the assumption that $Y$ has a probability density function $f_Y(y, \vtheta)$, we redefine the log-likelihood using the density.

Definition 2.6 (Log-likelihood density contribution): The log-likelihood density contribution $\ell_i : \Theta \times \R \to \R$ is the approximation of the log-likelihood contribution using the density, i.e. $$\ell_i(\vtheta, y) = \log f_Y(y, \vtheta)$$
Note: As the log-likelihood density contribution approximates the log-likelihood contribution, we use the same symbol $\ell_i$. Note also that, as the approximation only holds for $\evA \approx \set{y}$, the second argument is now a point rather than a set, i.e. $\ell_i : \Theta \times \R \to \R$.
Example (Gaussian response): Let $Y \sim \lawN(\mu, \sigma^2)$ be an absolutely continuous Gaussian response. The probability density function is $$f_Y(y, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\pa{-\frac{(y - \mu)^2}{2\sigma^2}}$$ and the log-likelihood contribution for some observation $(y - \epsilon, y + \epsilon]$ is $$\ell_i(\mu, \sigma, y) \eqsim -\log \sigma - \frac{(y - \mu)^2}{2\sigma^2}$$
Example (Poisson response): Let $Y \sim \lawPois(\lambda)$ be a Poisson count response. The probability mass function is $$p_Y(k, \lambda) = \frac{\exp\pa{-\lambda} \lambda^k}{k!}$$ and the log-likelihood contribution for some observation $(k - \epsilon, k + \epsilon]$ is $$\ell_i(\lambda, k) \eqsim -\lambda + k \log \lambda - \log k!$$ (For a discrete response and $\epsilon < 1$, the interval contains only the point $k$, so the contribution is in fact exactly $\log p_Y(k, \lambda)$.)
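The Poisson contribution above can be checked against a library implementation; $\log k!$ is available numerically as $\log \Gamma(k+1)$. A small sketch:

```python
# Poisson log-likelihood contribution
#   -lambda + k * log(lambda) - log(k!)
# compared against scipy's logpmf; log(k!) = lgamma(k + 1).
import math
from scipy.stats import poisson

def loglik_contrib(lam, k):
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

k, lam = 4, 2.5  # hypothetical observation and rate
print(loglik_contrib(lam, k), poisson.logpmf(k, lam))  # agree
```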
Definition 2.7 (Log-likelihood density function): The log-likelihood density function $\ell : \Theta \times \R^n \to \R$ is the approximation of the log-likelihood function using the density, i.e. $$\ell(\vtheta, \vy) = \sum_{i=1}^n \ell_i(\vtheta, y_i) = \sum_{i=1}^n \log f_Y(y_i, \vtheta)$$
Note: For notational convenience, we often suppress the dependency on $\vy$ if it is clear from context and write $\ell(\vtheta)$.
Example (Gaussian response): The log-likelihood function for the Gaussian response is $$\ell(\mu, \sigma, \vy) \eqsim -n \log \sigma - \sum_{i=1}^n \frac{(y_i - \mu)^2}{2\sigma^2}$$
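Maximising this Gaussian log-likelihood numerically recovers the known closed-form maximum likelihood estimates, $\hat\mu = \bar y$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (y_i - \bar y)^2$. A sketch on simulated data (the parametrisation via $\log \sigma$ is a convenience to keep $\sigma > 0$ during optimisation):

```python
# Maximise the Gaussian log-likelihood
#   l(mu, sigma) ~ -n log(sigma) - sum (y_i - mu)^2 / (2 sigma^2)
# and compare against the closed-form ML estimates.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=2.0, size=500)  # simulated data

def neg_loglik(params):
    mu, log_sigma = params            # optimise log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    return len(y) * np.log(sigma) + np.sum((y - mu) ** 2) / (2 * sigma**2)

res = minimize(neg_loglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, y.mean())    # numerically equal: sample mean
print(sigma_hat, y.std())  # numerically equal: biased sample sd
```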