For the moment, we assume that the response Y is univariate and that its distribution is parametrised in terms of a vector of unknown parameters θ∈Θ⊆R^p, i.e. FY:R×Θ→[0,1]. Further, reasonable statistical inference requires multiple observations. We therefore assume a sample of n i.i.d. observations Y1,…,Yn with Yi ∼iid FY(⋅,θ⋆), where θ⋆∈Θ denotes the true parameter.
Note: We write Pθ, Eθ and Varθ for probabilities, expectations and variances computed under the distribution Yi∼FY(⋅,θ) for a fixed θ.
2.1 Likelihood Function
Definition 2.1 (Likelihood contribution): Given a single observation A of Yi, the likelihood contribution li:Θ×B(R)→[0,1] is
li(θ,A)=Pθ(Yi∈A)
The most important special case is an observation A=(y1,y2] for which we can compute the likelihood contribution as
li(θ,A)=FY(y2,θ)−FY(y1,θ)
Example (Wildlife-vehicle collisions): We observe Yi, the number of wildlife-vehicle collisions counted per year on a specific road segment. The records for December are lost, and we observed 10 wildlife-vehicle collisions between January and November, so the annual count is at least 10. The observation is then A=(9,∞) and the likelihood contribution is li(θ,A)=1−FY(9,θ).
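As a small sketch of this censored contribution, suppose (purely as a modelling assumption, not stated in the example) that the annual counts follow a Poisson(λ) distribution; the helper names below are ours:

```python
import math

def poisson_cdf(k, lam):
    """F_Y(k, lambda) = P(Y <= k) for an assumed Poisson(lambda) count."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

def censored_contribution(lam):
    """Likelihood contribution l_i(lambda, (9, inf)) = 1 - F_Y(9, lambda)."""
    return 1.0 - poisson_cdf(9, lam)
```

As expected, the contribution grows with λ: a larger collision rate makes an annual count of at least 10 more likely.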
Note: We denote with A1,…,An the respective observations of the random variables Y1,…,Yn.
Definition 2.2 (Likelihood function): Given the observations A1,…,An of Y1,…,Yn, the likelihood function L:Θ×B(R)^n→[0,1] is
L(θ,A1,…,An)=Pθ(⋂_{i=1}^n {Yi∈Ai})=∏_{i=1}^n li(θ,Ai),
where the second equality follows from the independence of Y1,…,Yn.
Note: For notational convenience, we often suppress the dependency of the likelihood on the observations when they are clear from the context and write li(θ) and L(θ)=∏_{i=1}^n li(θ).
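To make the product in Definition 2.2 concrete, here is a minimal sketch with interval observations Ai=(y1,y2], assuming (purely for illustration) an exponential response with rate θ; the function names are ours, not from the text:

```python
import math

def exp_cdf(y, theta):
    """F_Y(y, theta) for an illustrative exponential response with rate theta."""
    return 1.0 - math.exp(-theta * y) if y > 0 else 0.0

def contribution(theta, a):
    """l_i(theta, (y1, y2]) = F_Y(y2, theta) - F_Y(y1, theta)."""
    y1, y2 = a
    return exp_cdf(y2, theta) - exp_cdf(y1, theta)

def likelihood(theta, observations):
    """L(theta) = product over i of l_i(theta, A_i)."""
    L = 1.0
    for a in observations:
        L *= contribution(theta, a)
    return L

obs = [(0.5, 1.5), (2.0, 3.0), (0.0, 0.5)]  # made-up interval observations
```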
2.2 Log-Likelihood Function
Definition 2.3 (Log-likelihood contribution): The log-likelihood contribution ℓi:Θ×B(R)→R is the logarithm of the likelihood contribution li, i.e.
ℓi(θ,A)=logli(θ,A)
Definition 2.4 (Log-likelihood function): The log-likelihood function ℓ:Θ×B(R)^n→R is the logarithm of the likelihood function, i.e.
ℓ(θ,A1,…,An)=log L(θ,A1,…,An)=∑_{i=1}^n ℓi(θ,Ai)
The log-likelihood is computationally convenient because it is much easier to optimise sums than products.
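The computational point can be seen directly: a product of many contributions in [0,1] underflows in floating point, while the sum of their logarithms stays finite. A minimal sketch with made-up contribution values:

```python
import math

# A product of many likelihood contributions in [0, 1] underflows to 0.0
# in double precision, while the log-likelihood sums and stays finite.
contributions = [1e-4] * 1000

product = 1.0
for li in contributions:
    product *= li          # underflows to exactly 0.0

log_sum = sum(math.log(li) for li in contributions)  # finite: 1000 * log(1e-4)
```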
2.3 Likelihood Approximation
Let Y be an absolutely continuous response for which the probability density function fY(y,θ)=∂FY(y,θ)/∂y exists. A singleton outcome y is almost surely not observed exactly, as P(Y=y)=0. Instead, an observed outcome always carries some deviation ε.
Proposition 2.5 (Likelihood approximation): If we observe the event Ai=(y−ε,y+ε] for negligibly small ε>0, we have
li(θ,Ai)=FY(y+ε,θ)−FY(y−ε,θ)≈2ε·fY(y,θ)∝fY(y,θ)
Hence it makes sense to evaluate the likelihood contribution for such outcomes by the density fY(y,θ). The proportionality constant does not matter when we are interested in the maximiser of the likelihood. We need to keep in mind, however, that this is an approximation which is only valid for small ε and which fails if fY(y,θ)=∞ is feasible in the range of interest.
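A quick numerical check of Proposition 2.5, using a standard Gaussian as an illustrative choice of FY (the helper functions are ours):

```python
import math

def norm_cdf(y, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def norm_pdf(y, mu, sigma):
    """Gaussian density f_Y(y, mu, sigma)."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

mu, sigma, y, eps = 0.0, 1.0, 0.7, 1e-4
interval = norm_cdf(y + eps, mu, sigma) - norm_cdf(y - eps, mu, sigma)
approx = 2 * eps * norm_pdf(y, mu, sigma)
# For small eps, interval / approx is very close to 1.
```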
2.4 Log-Likelihood Density
In the context of the aforementioned approximation and under the assumption that Y has a probability density function fY(y,θ), we redefine the log-likelihood using the density.
Definition 2.6 (Log-likelihood density contribution): The log-likelihood density contribution ℓi:Θ×R→R is the approximation of the log-likelihood contribution using the density, i.e.
ℓi(θ,y)=logfY(y,θ)
Note: As the log-likelihood density contribution approximates the log-likelihood contribution, we use the same symbol ℓi. Note also that, since the approximation only holds for A≈{y}, the contribution is now a function ℓi:Θ×R→R.
Example (Gaussian response): Let Y∼N(μ,σ2) be an absolutely continuous Gaussian response. The probability density function is
fY(y,μ,σ)=1/(√(2π)σ)·exp(−(y−μ)^2/(2σ^2))
and the log-likelihood contribution for some observation (y−ε,y+ε] is
ℓi(μ,σ,y)≂−log σ−(y−μ)^2/(2σ^2)
Example (Poisson response): Let Y∼Pois(λ) be a count Poisson response. The probability mass function is
pY(k,λ)=exp(−λ)λ^k/k!
and the log-likelihood contribution for some observation (k−ε,k+ε], which for a count response contains exactly the outcome k, is
ℓi(λ,k)≂−λ+k·log λ−log k!
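This contribution can be checked numerically against the logarithm of the probability mass function; math.lgamma(k+1) computes log k! stably (an implementation choice, not part of the text):

```python
import math

def poisson_loglik_contrib(lam, k):
    """l_i(lambda, k) = -lambda + k*log(lambda) - log(k!)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)
```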
Definition 2.7 (Log-likelihood density function): The log-likelihood density function ℓ:Θ×R^n→R is the approximation of the log-likelihood function using the density, i.e.
ℓ(θ,y)=∑_{i=1}^n ℓi(θ,yi)=∑_{i=1}^n log fY(yi,θ)
Note: For notational convenience, we often suppress the dependency on y if it is clear from context and write ℓ(θ).
Example (Gaussian response): The log-likelihood function for the Gaussian response is
ℓ(μ,σ,y)≂−n·log σ−∑_{i=1}^n (yi−μ)^2/(2σ^2)
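As a small sketch, this log-likelihood (up to the additive constant) is easy to evaluate, and for fixed σ it is maximised in μ at the sample mean; the data below are made up for illustration:

```python
import math

def gaussian_loglik(mu, sigma, ys):
    """l(mu, sigma, y) up to an additive constant:
    -n*log(sigma) - sum_i (y_i - mu)^2 / (2*sigma^2)."""
    n = len(ys)
    return -n * math.log(sigma) - sum((y - mu) ** 2 for y in ys) / (2 * sigma**2)

ys = [1.2, 0.8, 1.5, 0.9, 1.1]   # made-up sample
ybar = sum(ys) / len(ys)         # for fixed sigma, the maximising mu
```

Completing the square shows that, for fixed σ, ℓ is a downward parabola in μ with its vertex at the sample mean.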