
Dirichlet distribution

The multinomial distribution is a discrete distribution that gives the probability of obtaining a given collection of $ m$ items drawn with replacement from a set of $ n$ item types, when the probabilities of the individual choices are $ p_1, \ldots, p_n$. These probabilities are the parameters of the multinomial distribution [16].

The Dirichlet distribution is the conjugate prior of the parameters of the multinomial distribution. The probability density of the Dirichlet distribution for variables $ \mathbf{p} = (p_1, \ldots, p_n)$ with parameters $ \mathbf{u} = (u_1, \ldots, u_n)$ is defined by

$\displaystyle p(\mathbf{p}) = \ensuremath{\text{Dirichlet}}(\mathbf{p};\; \mathbf{u}) = \frac{1}{Z(\mathbf{u})} \prod_{i=1}^n p_i^{u_i-1}$ (A.7)

when $ p_1, \ldots, p_n \ge 0$, $ \sum_{i=1}^n p_i = 1$ and $ u_1, \ldots, u_n > 0$. The parameters $ u_i$ can be interpreted as ``prior observation counts'' for events governed by $ p_i$. The normalisation constant $ Z(\mathbf{u})$ is

$\displaystyle Z(\mathbf{u}) = \frac{\prod_{i=1}^n \Gamma(u_i)}{\Gamma( \sum_{i=1}^n u_i )}.$ (A.8)
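As a concrete numerical illustration (not part of the original text), the logarithm of the normalisation constant (A.8) and the log-density (A.7) can be evaluated with SciPy's log-gamma function; the helper names log_Z and dirichlet_logpdf below are illustrative assumptions.

\begin{verbatim}
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

def log_Z(u):
    """log Z(u) = sum_i log Gamma(u_i) - log Gamma(sum_i u_i), cf. (A.8)."""
    u = np.asarray(u, dtype=float)
    return np.sum(gammaln(u)) - gammaln(np.sum(u))

def dirichlet_logpdf(p, u):
    """log Dirichlet(p; u) = -log Z(u) + sum_i (u_i - 1) log p_i, cf. (A.7)."""
    p, u = np.asarray(p, dtype=float), np.asarray(u, dtype=float)
    return -log_Z(u) + np.sum((u - 1.0) * np.log(p))

u = np.array([2.0, 3.0, 5.0])
p = np.array([0.2, 0.3, 0.5])
print(dirichlet_logpdf(p, u))   # manual evaluation of (A.7)
print(dirichlet.logpdf(p, u))   # SciPy agrees
\end{verbatim}

Working in log-space with gammaln avoids the overflow that direct evaluation of the gamma functions in (A.8) would cause for large parameter values.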

Let $ u_0 = \sum_{i=1}^n u_i$. The mean and variance of the distribution are [16]

$\displaystyle \operatorname{E}[ p_i ] = \frac{u_i}{ u_0 }$ (A.9)

and

$\displaystyle \operatorname{Var}[ p_i ] = \frac{ u_i ( u_0 - u_i) }{u_0^2 (u_0 + 1) }.$ (A.10)
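The moment formulas (A.9) and (A.10) translate directly into code. A minimal sketch, again assuming NumPy and SciPy, cross-checked against scipy.stats.dirichlet:

\begin{verbatim}
import numpy as np
from scipy.stats import dirichlet

u = np.array([2.0, 3.0, 5.0])
u0 = u.sum()

mean = u / u0                               # (A.9)
var = u * (u0 - u) / (u0**2 * (u0 + 1.0))   # (A.10)

assert np.allclose(mean, dirichlet(u).mean())
assert np.allclose(var, dirichlet(u).var())
\end{verbatim}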

When all $ u_i \rightarrow 0$, the distribution becomes noninformative. The means of all the $ p_i$ stay the same if all the $ u_i$ are scaled by the same multiplicative constant, but the variances get smaller as the parameters $ u_i$ grow; the short sketch below illustrates this. The pdfs of the Dirichlet distribution with certain parameter values are shown in Figure A.2.
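The scaling behaviour follows directly from (A.9) and (A.10); this sketch (illustrative, assuming NumPy) multiplies all $ u_i$ by a common constant $ c$ and prints the resulting moments.

\begin{verbatim}
import numpy as np

def mean_var(u):
    u0 = u.sum()
    return u / u0, u * (u0 - u) / (u0**2 * (u0 + 1.0))

u = np.array([2.0, 3.0, 5.0])
for c in (1.0, 10.0, 100.0):
    m, v = mean_var(c * u)
    print(c, m, v)   # means stay fixed, variances shrink as c grows
\end{verbatim}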

Figure A.2: Plots of one component of a two-dimensional Dirichlet distribution. The parameters are chosen such that $ u_1 = u_2 = u$, with the value of $ u$ shown above each image. Because both parameters of the distribution are equal, the distribution of the other component is exactly the same.
\includegraphics[width=.9\textwidth]{pics/dirichlets}

In addition to the standard statistics given above, applying ensemble learning to Dirichlet-distributed parameters requires evaluating the expectation $ \operatorname{E}[ \log p_i ]$ and the negative differential entropy $ \operatorname{E}[ \log p(\mathbf{p}) ]$.

The first expectation can be reduced to an expectation over a two-dimensional Dirichlet distribution, because the marginal distribution of a single component $ p_i$ satisfies

$\displaystyle (p, 1-p) \sim \ensuremath{\text{Dirichlet}}( u_i, u_0-u_i )$ (A.11)

The expectation is then given by the integral

$\displaystyle \operatorname{E}[ \log p_i ] = \int\limits_0^1 \frac{\Gamma(u_0)}{\Gamma(u_i) \Gamma(u_0-u_i)} p^{u_i-1} (1-p)^{u_0-u_i-1} \log p \, dp.$ (A.12)

This can be evaluated analytically to yield

$\displaystyle \operatorname{E}[ \log p_i ] = \Psi(u_i) - \Psi(u_0)$ (A.13)

where $ \Psi(x) = \frac{d}{dx} \ln \Gamma(x)$ is the digamma function.
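Equation (A.13) is directly available through scipy.special.digamma. The following sketch (an illustration, not from the original text) also verifies the result by Monte Carlo using NumPy's Dirichlet sampler.

\begin{verbatim}
import numpy as np
from scipy.special import digamma

u = np.array([2.0, 3.0, 5.0])
u0 = u.sum()

analytic = digamma(u) - digamma(u0)        # E[log p_i] from (A.13)

rng = np.random.default_rng(0)
samples = rng.dirichlet(u, size=200_000)
print(analytic)
print(np.log(samples).mean(axis=0))        # close to the analytic values
\end{verbatim}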

Using this result, the negative differential entropy can be evaluated as

\begin{displaymath}\begin{split}\operatorname{E}[ \log p(\mathbf{p}) ] &= \operatorname{E}\Bigg[ -\log Z(\mathbf{u}) + \sum_{i=1}^n (u_i - 1) \log p_i \Bigg] \\ &= -\log Z(\mathbf{u}) + \sum_{i=1}^n (u_i - 1) [\Psi(u_i) - \Psi(u_0)]. \end{split}\end{displaymath} (A.14)
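Combining (A.8) and (A.13) gives a direct implementation of (A.14). As a sanity check, scipy.stats.dirichlet provides the differential entropy, whose negation should match; the helper name neg_entropy is an illustrative assumption.

\begin{verbatim}
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import dirichlet

def neg_entropy(u):
    """E[log p(p)] for p ~ Dirichlet(u), cf. (A.14)."""
    u = np.asarray(u, dtype=float)
    u0 = u.sum()
    log_Z = np.sum(gammaln(u)) - gammaln(u0)    # (A.8)
    return -log_Z + np.sum((u - 1.0) * (digamma(u) - digamma(u0)))

u = np.array([2.0, 3.0, 5.0])
assert np.isclose(neg_entropy(u), -dirichlet(u).entropy())
\end{verbatim}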

