3. Categorical Likelihoods

Although the likelihood applies to all measurement scales, we need to discuss aspects of categorical likelihoods in a little more detail.

3.1 Likelihood for Categorical Responses

Let $Y$ be ordered categorical with support $S = \{y_1, \ldots, y_K\}$. Then $Y$ follows the categorical distribution $Y \sim \mathrm{Categ}(\boldsymbol{\pi})$, where $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)$ with $\sum_{k=1}^K \pi_k = 1$ are the probabilities for the $K$ categories.

Assuming that we have $n$ independent observations consisting of single outcomes, i.e. $A_i = \{y_{k_i}\}$, the likelihood contributions are

$$l_i(\boldsymbol{\pi}) = P_{\boldsymbol{\pi}}(Y_i = y_{k_i}) = \pi_{k_i}$$

and the joint likelihood function is

$$L(\boldsymbol{\pi}) = \prod_{i=1}^n l_i(\boldsymbol{\pi}) = \prod_{i=1}^n \pi_{k_i} = \prod_{k=1}^K \pi_k^{n_k}$$

with $n_k = \sum_{i=1}^n \mathbf{1}\{y_{k_i} = y_k\}$ the number of observations falling in category $k$. Up to the multinomial coefficient, which does not depend on $\boldsymbol{\pi}$, this is the likelihood of a single observation $(n_1, \ldots, n_K)$ from the multinomial distribution $\mathrm{Mult}(\boldsymbol{\pi}, n)$.
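The identity between the product over observations and the product over categories can be checked numerically. The following is a minimal sketch; the data `obs` and the probabilities `pi` are made-up values for illustration, and categories are indexed from 0 rather than 1.

```python
import math
from collections import Counter

# Hypothetical data: 0-based category indices k_i for n = 8 observations, K = 3.
obs = [0, 2, 1, 0, 0, 2, 1, 0]
pi = [0.5, 0.2, 0.3]  # assumed category probabilities, summing to 1

# Joint likelihood as a product over observations: prod_i pi_{k_i}.
lik_by_obs = math.prod(pi[k] for k in obs)

# The same likelihood as a product over categories: prod_k pi_k^{n_k}.
counts = Counter(obs)
lik_by_counts = math.prod(pi[k] ** counts[k] for k in range(len(pi)))

# Multinomial coefficient n! / (n_1! ... n_K!); it does not involve pi.
coef = math.factorial(len(obs))
for k in range(len(pi)):
    coef //= math.factorial(counts[k])

# Multinomial probability of the count vector (n_1, ..., n_K).
multinomial_pmf = coef * lik_by_counts
```

The two likelihoods agree up to floating-point error, and `multinomial_pmf` differs from them only by the constant factor `coef`, so all three are maximised by the same probabilities.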

Proposition 3.1 (Maximum likelihood for multinomial): The likelihood $l(\boldsymbol{\pi}) \propto \prod_{k=1}^K \pi_k^{n_k}$ of a random vector distributed as $\mathrm{Mult}(\boldsymbol{\pi}, n)$ is maximised by

$$\hat{\boldsymbol{\pi}} = \left(\frac{n_1}{n}, \ldots, \frac{n_K}{n}\right).$$
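As an informal illustration of the proposition (not a proof), the sketch below computes the relative frequencies and compares their log-likelihood with that of a few other probability vectors. The function names, the counts, and the candidate vectors are hypothetical choices made for this example.

```python
import math

def multinomial_loglik(pi, counts):
    """Log-likelihood sum_k n_k * log(pi_k), dropping the constant coefficient.

    Categories with n_k = 0 are skipped; assumes pi_k > 0 wherever n_k > 0.
    """
    return sum(n_k * math.log(p_k) for p_k, n_k in zip(pi, counts) if n_k > 0)

def multinomial_mle(counts):
    """Relative frequencies n_k / n, the maximiser stated in the proposition."""
    n = sum(counts)
    return [n_k / n for n_k in counts]

counts = [4, 2, 2]                 # hypothetical counts, n = 8
pi_hat = multinomial_mle(counts)   # [0.5, 0.25, 0.25]

# A few arbitrary alternatives on the probability simplex.
candidates = [[1 / 3, 1 / 3, 1 / 3], [0.6, 0.2, 0.2], [0.4, 0.3, 0.3]]
best = multinomial_loglik(pi_hat, counts)
gaps = [best - multinomial_loglik(c, counts) for c in candidates]
```

Every entry of `gaps` is non-negative, which is what the proposition predicts for any alternative probability vector.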

One may also observe events of the form $A_i = \{y_{p_i}, y_{q_i}\}$ with $p_i \neq q_i$, for which the likelihood contribution becomes

$$l_i(\boldsymbol{\pi}) = P_{\boldsymbol{\pi}}(Y_i \in \{y_{p_i}, y_{q_i}\}) = \pi_{p_i} + \pi_{q_i}.$$
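A log-likelihood that mixes single outcomes with such two-category events can be evaluated by summing the probabilities inside each observed event. This is a minimal sketch, assuming each event is represented as a set of 0-based category indices; the function name `coarsened_loglik` and the example values are hypothetical.

```python
import math

def coarsened_loglik(pi, events):
    """Sum over observations of log P(Y_i in A_i) = log(sum_{k in A_i} pi_k).

    Each event is a set of 0-based category indices, so a single outcome is a
    one-element set and repeated categories within an event cannot occur.
    """
    return sum(math.log(sum(pi[k] for k in event)) for event in events)

pi = [0.5, 0.2, 0.3]                    # assumed category probabilities
events = [{0}, {1, 2}, {0, 2}, {1}]     # hypothetical observed events A_i

# Contributions: pi_0, pi_1 + pi_2, pi_0 + pi_2, pi_1.
loglik = coarsened_loglik(pi, events)
```

With events of more than one category, the likelihood no longer reduces to a product of powers of the $\pi_k$, so the relative-frequency formula for the maximiser does not apply directly and a numerical method is generally needed.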

3.2 The Nonparametric Likelihood

TODO