2. The Likelihood

For the moment, we assume that the response $Y$ is univariate and that its distribution is parametrised in terms of a vector of unknown parameters $\vtheta \in \Theta \subset \R^p$, i.e. $F_Y : \R \times \Theta \to \R$. Further, reasonable statistical inference requires multiple observations. We assume a sample of $n$ $\iid$ observations $Y_1, \ldots, Y_n$ with $Y_i \simiid F_Y(\cdotsep, \vthetastar)$, where $\vthetastar \in \Theta$ denotes the true parameter.

Note: We denote by $\P_{\vtheta}$, $\E_{\vtheta}$ and $\Var_{\vtheta}$ that the random variables follow the distribution $Y_i \sim F_Y(\cdotsep, \vtheta)$ for a fixed $\vtheta$.

2.1 Likelihood Function

Definition 2.1 (Likelihood contribution): Given a single observation $\evA$ of $Y_i$, the likelihood contribution $l_i : \Theta \times \borelB(\R) \to [0,1]$ is $$l_i(\vtheta, \evA) = \probPwrt{\vtheta}{Y_i \in \evA}$$

The most important special case is an observation $\evA = (y_1, y_2]$, for which we can compute the likelihood contribution as $$l_i(\vtheta, \evA) = F_Y(y_2, \vtheta) - F_Y(y_1, \vtheta)$$

Example (Wildlife-vehicle collisions): We observe $Y_i$ wildlife-vehicle collisions counted per year on a specific road segment. We have lost the records of December and observed $10$ wildlife-vehicle collisions between January and November. The observation is then $\evA = (9, \infty)$ and the likelihood contribution is $l_i(\vtheta, \evA) = 1 - F_Y(9, \vtheta)$.
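The censored contribution $1 - F_Y(9, \vtheta)$ can be evaluated directly from a model's distribution function. A minimal sketch, assuming for illustration a Poisson$(\vtheta)$ model for the yearly count (the example above does not yet fix a distribution):

```python
# Likelihood contribution for the censored event A = (9, inf),
# assuming a Poisson(theta) model (an illustrative assumption).
from scipy.stats import poisson

def censored_contribution(theta, lower=9):
    """l_i(theta, A) = 1 - F_Y(lower, theta) = P(Y > lower)."""
    return 1.0 - poisson.cdf(lower, theta)

# The contribution grows with theta: a larger collision rate makes
# observing more than 9 collisions more plausible.
print(censored_contribution(5.0))
print(censored_contribution(15.0))
```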
Note: We denote by $\evA_1, \ldots, \evA_n$ the respective observations of the random variables $Y_1, \ldots, Y_n$.
Definition 2.2 (Likelihood function): Given the observations $\evA_1, \ldots, \evA_n$ of $Y_1, \ldots, Y_n$, the likelihood function $L : \Theta \times \borelB(\R)^n \to [0,1]$ is $$L(\vtheta, \evA_1, \ldots, \evA_n) = \probPwrt{\vtheta}{\bigcap_{i=1}^n \set{Y_i \in \evA_i}} = \prod_{i=1}^n l_i(\vtheta, \evA_i)$$ where the factorisation into contributions follows from the independence of the $Y_i$.
Note: For notational convenience, we often suppress the dependency of the likelihood on the observations when they are clear from the context and write $l_i(\vtheta)$ and $L(\vtheta) = \prod_{i=1}^n l_i(\vtheta)$.
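Definition 2.2 can be sketched in a few lines: each interval observation $\evA_i = (a_i, b_i]$ contributes $F_Y(b_i, \vtheta) - F_Y(a_i, \vtheta)$, and the likelihood is their product. The $\lawN(\vtheta, 1)$ model and the intervals below are purely hypothetical:

```python
# Likelihood of n independent interval observations A_i = (a_i, b_i]
# under an assumed N(theta, 1) model (illustrative choice).
import numpy as np
from scipy.stats import norm

intervals = [(0.5, 1.5), (1.0, 2.0), (-0.5, 0.5)]  # hypothetical data

def likelihood(theta):
    # l_i(theta, A_i) = F_Y(b_i, theta) - F_Y(a_i, theta)
    contribs = [norm.cdf(b, loc=theta) - norm.cdf(a, loc=theta)
                for a, b in intervals]
    return float(np.prod(contribs))

print(likelihood(0.7))
```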

2.2 Log-Likelihood Function

Definition 2.3 (Log-likelihood contribution): The log-likelihood contribution $\ell_i : \Theta \times \borelB(\R) \to \R$ is the logarithm of the likelihood contribution $l_i$, i.e. $$\ell_i(\vtheta, \evA) = \log l_i(\vtheta, \evA)$$
Definition 2.4 (Log-likelihood function): The log-likelihood function $\ell : \Theta \times \borelB(\R)^n \to \R$ is the logarithm of the likelihood function, i.e. $$\ell(\vtheta, \evA_1, \ldots, \evA_n) = \log L(\vtheta, \evA_1, \ldots, \evA_n) = \sum_{i=1}^n \ell_i(\vtheta, \evA_i)$$

The log-likelihood is computationally convenient, because it is much easier to optimise sums than products.
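There is also a numerical reason to prefer the sum: a product of many probabilities underflows to zero in floating-point arithmetic, while the corresponding sum of logarithms remains representable. A small illustration with simulated standard normal data:

```python
# Why sums beat products numerically: with many observations the
# likelihood underflows to 0.0 in double precision, while the
# log-likelihood stays finite.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(size=5000)  # simulated data, standard normal

density_values = norm.pdf(y)
product = np.prod(density_values)       # underflows to exactly 0.0
log_sum = np.sum(norm.logpdf(y))        # finite log-likelihood

print(product, log_sum)
```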

2.3 Likelihood Approximation

Let $Y$ be an absolutely continuous response for which the probability density function $f_Y(y, \vtheta) = \pd{F_Y(y, \vtheta)}{y}$ exists. It is almost surely not possible to observe singleton events $\set{y}$, as $\probP{Y \in \set{y}} = 0$. Instead, an outcome always has some deviation $\epsilon$.

Proposition 2.5 (Likelihood approximation): If we observe the event $\evA_i = (y - \epsilon, y + \epsilon]$ for negligibly small $\epsilon > 0$, we have $$l_i(\vtheta, \evA_i) = F_Y(y + \epsilon, \vtheta) - F_Y(y - \epsilon, \vtheta) \approx 2\epsilon \, f_Y(y, \vtheta) \propto f_Y(y, \vtheta)$$

Hence it makes sense to evaluate the likelihood contribution for outcomes by the density $f_Y(y, \vtheta)$. The proportionality constant does not matter when we are only interested in the maximiser of the likelihood. We need to keep in mind, however, that this is an approximation which is only valid if $\epsilon$ is small, and that it fails if $f_Y(y, \vtheta) = \infty$ is feasible in the range of interest.
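Proposition 2.5 is easy to verify numerically: for small $\epsilon$, the interval probability divided by $2\epsilon\,f_Y(y, \vtheta)$ should be close to one. A sketch using the standard normal distribution as an illustrative choice:

```python
# Numerical check of the likelihood approximation
#   F(y + eps) - F(y - eps)  ~  2 * eps * f(y)
# for a standard normal response (illustrative choice).
from scipy.stats import norm

y, eps = 0.3, 1e-6
interval_prob = norm.cdf(y + eps) - norm.cdf(y - eps)
approx = 2 * eps * norm.pdf(y)

ratio = interval_prob / approx
print(ratio)  # close to 1 for small eps
```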

2.4 Log-Likelihood Density

In the context of the aforementioned approximation, and under the assumption that $Y$ has a probability density function $f_Y(y, \vtheta)$, we redefine the log-likelihood using the density.

Definition 2.6 (Log-likelihood density contribution): The log-likelihood density contribution $\ell_i : \Theta \times \R \to \R$ is the approximation of the log-likelihood contribution using the density, i.e. $$\ell_i(\vtheta, y) = \log f_Y(y, \vtheta)$$
Note: As the log-likelihood density contribution approximates the log-likelihood contribution, we use the same symbol $\ell_i$. Note also that, as the approximation only holds for $\evA \approx \set{y}$, the second argument is now a point rather than a set, i.e. $\ell_i : \Theta \times \R \to \R$.
Example (Gaussian response): Let $Y \sim \lawN(\mu, \sigma^2)$ be an absolutely continuous Gaussian response. The probability density function is $$f_Y(y, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\pa{-\frac{(y - \mu)^2}{2\sigma^2}}$$ and the log-likelihood contribution for some observation $(y - \epsilon, y + \epsilon]$ is $$\ell_i(\mu, \sigma, y) \eqsim -\log \sigma - \frac{(y - \mu)^2}{2\sigma^2}$$
Example (Poisson response): Let $Y \sim \lawPois(\lambda)$ be a Poisson count response. The probability mass function is $$p_Y(k, \lambda) = \frac{\exp\pa{-\lambda} \lambda^k}{k!}$$ and the log-likelihood contribution for some observation $(k - \epsilon, k + \epsilon]$ is $$\ell_i(\lambda, k) \eqsim -\lambda + k \log \lambda - \log k!$$ (For a discrete response and $\epsilon < 1$, the interval contains only the point $k$, so the contribution is in fact exactly $\log p_Y(k, \lambda)$.)
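The Poisson contribution above can be checked against a library implementation; $\log k!$ is available numerically as $\log \Gamma(k+1)$. A small sketch:

```python
# Poisson log-likelihood contribution
#   -lambda + k * log(lambda) - log(k!)
# compared against scipy's logpmf; log(k!) = lgamma(k + 1).
import math
from scipy.stats import poisson

def loglik_contrib(lam, k):
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

k, lam = 4, 2.5  # hypothetical observation and rate
print(loglik_contrib(lam, k), poisson.logpmf(k, lam))  # agree
```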
Definition 2.7 (Log-likelihood density function): The log-likelihood density function $\ell : \Theta \times \R^n \to \R$ is the approximation of the log-likelihood function using the density, i.e. $$\ell(\vtheta, \vy) = \sum_{i=1}^n \ell_i(\vtheta, y_i) = \sum_{i=1}^n \log f_Y(y_i, \vtheta)$$
Note: For notational convenience, we often suppress the dependency on $\vy$ if it is clear from context and write $\ell(\vtheta)$.
Example (Gaussian response): The log-likelihood function for the Gaussian response is $$\ell(\mu, \sigma, \vy) \eqsim -n \log \sigma - \sum_{i=1}^n \frac{(y_i - \mu)^2}{2\sigma^2}$$
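Maximising this Gaussian log-likelihood numerically recovers the known closed-form maximum likelihood estimates, $\hat\mu = \bar y$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (y_i - \bar y)^2$. A sketch on simulated data (the parametrisation via $\log \sigma$ is a convenience to keep $\sigma > 0$ during optimisation):

```python
# Maximise the Gaussian log-likelihood
#   l(mu, sigma) ~ -n log(sigma) - sum (y_i - mu)^2 / (2 sigma^2)
# and compare against the closed-form ML estimates.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=2.0, size=500)  # simulated data

def neg_loglik(params):
    mu, log_sigma = params            # optimise log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    return len(y) * np.log(sigma) + np.sum((y - mu) ** 2) / (2 * sigma**2)

res = minimize(neg_loglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print(mu_hat, y.mean())    # numerically equal: sample mean
print(sigma_hat, y.std())  # numerically equal: biased sample sd
```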