
Dirichlet distribution

The multinomial distribution is a discrete distribution that gives the probability of obtaining a given collection of $ m$ items drawn with replacement from a set of $ n$ item types, when the probabilities of the individual choices are $ p_1, \ldots, p_n$. These probabilities are the parameters of the multinomial distribution [16].

The Dirichlet distribution is the conjugate prior of the parameters of the multinomial distribution. The probability density of the Dirichlet distribution for variables $ \mathbf{p} = (p_1, \ldots, p_n)$ with parameters $ \mathbf{u} = (u_1, \ldots, u_n)$ is defined by

$\displaystyle p(\mathbf{p}) = \ensuremath{\text{Dirichlet}}(\mathbf{p};\; \mathbf{u}) = \frac{1}{Z(\mathbf{u})} \prod_{i=1}^n p_i^{u_i-1}$ (A.7)

when $ p_1, \ldots, p_n \ge 0$, $ \sum_{i=1}^n p_i = 1$ and $ u_1, \ldots, u_n > 0$. The parameters $ u_i$ can be interpreted as ``prior observation counts'' for events governed by $ p_i$. The normalisation constant $ Z(\mathbf{u})$ is

$\displaystyle Z(\mathbf{u}) = \frac{\prod_{i=1}^n \Gamma(u_i)}{\Gamma( \sum_{i=1}^n u_i )}.$ (A.8)
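As a concrete numerical illustration (not part of the original text), the logarithm of the normalisation constant (A.8) and the log-density (A.7) can be evaluated with SciPy's log-gamma function; the helper names log_Z and dirichlet_logpdf below are illustrative assumptions.

\begin{verbatim}
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

def log_Z(u):
    """log Z(u) = sum_i log Gamma(u_i) - log Gamma(sum_i u_i), cf. (A.8)."""
    u = np.asarray(u, dtype=float)
    return np.sum(gammaln(u)) - gammaln(np.sum(u))

def dirichlet_logpdf(p, u):
    """log Dirichlet(p; u) = -log Z(u) + sum_i (u_i - 1) log p_i, cf. (A.7)."""
    p, u = np.asarray(p, dtype=float), np.asarray(u, dtype=float)
    return -log_Z(u) + np.sum((u - 1.0) * np.log(p))

u = np.array([2.0, 3.0, 5.0])
p = np.array([0.2, 0.3, 0.5])
print(dirichlet_logpdf(p, u))   # manual evaluation of (A.7)
print(dirichlet.logpdf(p, u))   # SciPy agrees
\end{verbatim}

Working in log-space with gammaln avoids the overflow that direct evaluation of the gamma functions in (A.8) would cause for large parameter values.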

Let $ u_0 = \sum_{i=1}^n u_i$. The mean and variance of the distribution are [16]

$\displaystyle \operatorname{E}[ p_i ] = \frac{u_i}{ u_0 }$ (A.9)

and

$\displaystyle \operatorname{Var}[ p_i ] = \frac{ u_i ( u_0 - u_i) }{u_0^2 (u_0 + 1) }.$ (A.10)
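The moment formulas (A.9) and (A.10) translate directly into code. A minimal sketch, again assuming NumPy and SciPy, cross-checked against scipy.stats.dirichlet:

\begin{verbatim}
import numpy as np
from scipy.stats import dirichlet

u = np.array([2.0, 3.0, 5.0])
u0 = u.sum()

mean = u / u0                               # (A.9)
var = u * (u0 - u) / (u0**2 * (u0 + 1.0))   # (A.10)

assert np.allclose(mean, dirichlet(u).mean())
assert np.allclose(var, dirichlet(u).var())
\end{verbatim}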

When all $ u_i \rightarrow 0$, the distribution becomes noninformative. The means of all the $ p_i$ stay the same if all the $ u_i$ are scaled by the same multiplicative constant, but the variances get smaller as the parameters $ u_i$ grow; the short sketch below illustrates this. The pdfs of the Dirichlet distribution with certain parameter values are shown in Figure A.2.
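The scaling behaviour follows directly from (A.9) and (A.10); this sketch (illustrative, assuming NumPy) multiplies all $ u_i$ by a common constant $ c$ and prints the resulting moments.

\begin{verbatim}
import numpy as np

def mean_var(u):
    u0 = u.sum()
    return u / u0, u * (u0 - u) / (u0**2 * (u0 + 1.0))

u = np.array([2.0, 3.0, 5.0])
for c in (1.0, 10.0, 100.0):
    m, v = mean_var(c * u)
    print(c, m, v)   # means stay fixed, variances shrink as c grows
\end{verbatim}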

Figure A.2: Plots of one component of a two-dimensional Dirichlet distribution. The parameters are chosen such that $ u_1 = u_2 = u$, with the value of $ u$ shown above each image. Because both parameters of the distribution are equal, the distribution of the other component is exactly the same.
\includegraphics[width=.9\textwidth]{pics/dirichlets}

In addition to the standard statistics given above, applying ensemble learning to Dirichlet-distributed parameters requires evaluating the expectation $ \operatorname{E}[ \log p_i ]$ and the negative differential entropy $ \operatorname{E}[ \log p(\mathbf{p}) ]$.

The first expectation can be reduced to an expectation over a two-dimensional Dirichlet distribution, because the marginal distribution of a single component $ p_i$ satisfies

$\displaystyle (p, 1-p) \sim \ensuremath{\text{Dirichlet}}( u_i, u_0-u_i )$ (A.11)

The expectation is then given by the integral

$\displaystyle \operatorname{E}[ \log p_i ] = \int\limits_0^1 \frac{\Gamma(u_0)}{\Gamma(u_i) \Gamma(u_0-u_i)} p^{u_i-1} (1-p)^{u_0-u_i-1} \log p \, dp.$ (A.12)

This can be evaluated analytically to yield

$\displaystyle \operatorname{E}[ \log p_i ] = \Psi(u_i) - \Psi(u_0)$ (A.13)

where $ \Psi(x) = \frac{d}{dx} \ln \Gamma(x)$ is the digamma function.
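Equation (A.13) is directly available through scipy.special.digamma. The following sketch (an illustration, not from the original text) also verifies the result by Monte Carlo using NumPy's Dirichlet sampler.

\begin{verbatim}
import numpy as np
from scipy.special import digamma

u = np.array([2.0, 3.0, 5.0])
u0 = u.sum()

analytic = digamma(u) - digamma(u0)        # E[log p_i] from (A.13)

rng = np.random.default_rng(0)
samples = rng.dirichlet(u, size=200_000)
print(analytic)
print(np.log(samples).mean(axis=0))        # close to the analytic values
\end{verbatim}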

Using this result, the negative differential entropy can be evaluated as

\begin{displaymath}\begin{split}\operatorname{E}[ \log p(\mathbf{p}) ] &= \operatorname{E}\Bigg[ -\log Z(\mathbf{u}) + \sum_{i=1}^n (u_i - 1) \log p_i \Bigg] \\ &= -\log Z(\mathbf{u}) + \sum_{i=1}^n (u_i - 1) [\Psi(u_i) - \Psi(u_0)]. \end{split}\end{displaymath} (A.14)
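Combining (A.8) and (A.13) gives a direct implementation of (A.14). As a sanity check, scipy.stats.dirichlet provides the differential entropy, whose negation should match; the helper name neg_entropy is an illustrative assumption.

\begin{verbatim}
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import dirichlet

def neg_entropy(u):
    """E[log p(p)] for p ~ Dirichlet(u), cf. (A.14)."""
    u = np.asarray(u, dtype=float)
    u0 = u.sum()
    log_Z = np.sum(gammaln(u)) - gammaln(u0)    # (A.8)
    return -log_Z + np.sum((u - 1.0) * (digamma(u) - digamma(u0)))

u = np.array([2.0, 3.0, 5.0])
assert np.isclose(neg_entropy(u), -dirichlet(u).entropy())
\end{verbatim}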

