A Simple Tool for Choosing Dirichlet Priors

The Dirichlet distribution, also known as the multivariate beta distribution, is a probability distribution often used to model the probabilities of a set of correlated categorical outcomes or events. This is natural because the Dirichlet is a multivariate distribution with support on the probability simplex, and each of its marginal distributions is a univariate beta. In practical terms, every vector[1] sampled from a Dirichlet has all of its values in the $(0,1)$ interval, and its elements sum to one. This is why the Dirichlet is often considered a “distribution over distributions.” It is particularly appealing in Bayesian inference, where its conjugacy to the categorical and multinomial distributions can lower computational demands by reducing the need for resource-intensive numerical integration.
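The simplex constraint is easy to check by simulation. The following is a minimal sketch using NumPy; the concentration parameters in `alpha` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the sketch is reproducible
alpha = np.array([2.0, 3.0, 5.0])   # hypothetical concentration parameters, K = 3

# Each row is one sampled probability vector of length K.
samples = rng.dirichlet(alpha, size=1000)

# Every element lies strictly between 0 and 1 ...
all_in_unit_interval = bool(((samples > 0) & (samples < 1)).all())
# ... and every sampled vector sums to one (up to floating-point error).
sums_to_one = bool(np.allclose(samples.sum(axis=1), 1.0))

print(samples[:3])
print(all_in_unit_interval, sums_to_one)
```

Each printed row can be read as one candidate set of event probabilities, which is the sense in which a Dirichlet draw is itself a distribution.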

For those new to the Dirichlet distribution, it is not always intuitive how its parameters affect the prior probability assigned to each categorical event (i.e., each dimension). This can be an obstacle to assigning sensible Dirichlet priors in Bayesian modeling. Rules of thumb based on known properties of the Dirichlet, together with knowledge of Beta distributions, can produce reasonable starting points for prior specification, but simulation and visualization can greatly aid prior selection.
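One such rule of thumb follows from the marginals: the $k$-th element of a $Dirichlet(\alpha)$ vector is distributed $Beta(\alpha_k, \alpha_0 - \alpha_k)$ with $\alpha_0 = \sum_j \alpha_j$, so its prior mean and spread have closed forms. The sketch below computes them; the function name `dirichlet_marginal_summary` and the example parameter values are hypothetical, chosen for illustration.

```python
import math

def dirichlet_marginal_summary(alpha):
    """Return a (mean, sd) pair for each marginal Beta(alpha_k, alpha_0 - alpha_k)."""
    alpha_0 = sum(alpha)
    summary = []
    for a in alpha:
        mean = a / alpha_0
        # Variance of Beta(a, alpha_0 - a)
        var = a * (alpha_0 - a) / (alpha_0 ** 2 * (alpha_0 + 1))
        summary.append((mean, math.sqrt(var)))
    return summary

# Same prior means, different concentration.
weak = dirichlet_marginal_summary([1.0, 1.0, 1.0])
strong = dirichlet_marginal_summary([10.0, 10.0, 10.0])

for label, rows in (("weak", weak), ("strong", strong)):
    print(label, [(round(m, 3), round(s, 3)) for m, s in rows])
```

Scaling every parameter by the same factor leaves the prior means unchanged while shrinking the standard deviations, which is one way to reason about how informative a candidate prior is.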

Two types of visualization are often useful in specifying Dirichlet priors. The first is a multi-dimensional visualization of the probability density across the event space. This is the most common approach to visually communicating information about the joint distribution of the event probabilities. For example, because each vector is constrained to a simplex, most tools let users see how the probabilities of each event vary across different parameters of a $Dirichlet_{K=3}(\alpha)$ using a triangular heat map. This approach gives a rough sense of how prior probability shifts between events under different Dirichlet distributions. However, the triangle can no longer summarize the densities once the Dirichlet has more than three dimensions, which makes visualization and interpretation difficult or impossible for high-dimensional distributions.
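The values behind such a triangular heat map are just the Dirichlet density evaluated at points on the simplex. The following sketch computes them with the standard library only; `dirichlet_logpdf` and `simplex_grid` are hypothetical helper names, and the grid resolution and parameter values are arbitrary choices for illustration.

```python
import math

def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at an interior point x of the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))

def simplex_grid(steps):
    """Interior lattice points (p1, p2, p3) of the three-event simplex."""
    points = []
    for i in range(1, steps):
        for j in range(1, steps - i):
            k = steps - i - j
            points.append((i / steps, j / steps, k / steps))
    return points

grid = simplex_grid(10)
flat = [dirichlet_logpdf(p, (1.0, 1.0, 1.0)) for p in grid]    # uniform prior
peaked = [dirichlet_logpdf(p, (5.0, 5.0, 5.0)) for p in grid]  # mass near the center

best = grid[peaked.index(max(peaked))]
print(len(grid), best)
```

Mapping each grid point to triangular coordinates and coloring it by its density would give the heat map. The same function works for any $K$, but the picture itself does not extend beyond three events.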

The second approach is to visualize the marginal prior probability density for each event across different Dirichlet distributions. For each $Dirichlet_{K}(\alpha)$, this means plotting the $K$ distributions of sampled values, one per event. Each marginal is itself a Beta distribution: the $k$-th element follows $Beta(\alpha_k, \alpha_0 - \alpha_k)$, where $\alpha_0 = \sum_{j=1}^{K} \alpha_j$. The appeal of this approach is that it shows clearly the prior each Dirichlet implies for the marginal probability of each event. Furthermore, it remains straightforward to read even for high-dimensional Dirichlet distributions.
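A minimal sketch of this marginal view is below. It is not the code behind the tool described here, only an illustration under stated assumptions: the function name `marginal_histograms`, the number of draws, the bin count, and the example `alpha` are all hypothetical.

```python
import numpy as np

def marginal_histograms(alpha, n_draws=10_000, bins=20, seed=0):
    """Sample a Dirichlet and histogram each of its K marginals on [0, 1]."""
    rng = np.random.default_rng(seed)
    draws = rng.dirichlet(alpha, size=n_draws)   # shape (n_draws, K)
    edges = np.linspace(0.0, 1.0, bins + 1)
    # density=True scales each histogram so that it integrates to one.
    densities = np.stack([
        np.histogram(draws[:, k], bins=edges, density=True)[0]
        for k in range(draws.shape[1])
    ])
    return edges, densities, draws

alpha = np.array([1.0, 2.0, 7.0])   # hypothetical prior for K = 3 events
edges, densities, draws = marginal_histograms(alpha)

sample_means = draws.mean(axis=0)
expected_means = alpha / alpha.sum()  # mean of each marginal
print(np.round(sample_means, 3), expected_means)
```

Each row of `densities` can be passed to any plotting library as one panel per event. Because the number of panels grows linearly with $K$, this view stays readable where the simplex heat map does not.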

As I did not find a simple tool for this latter type of visualization, I put one together. You can find the tool here.

  1. Since the Dirichlet is a multivariate distribution, each sample is a vector.
Brett J. Gall
Data Scientist