Mixture modelling

Mixture models identify clusters of samples by modelling marker expression as a mixture, or sum, of Gaussian (normal) distributions. The Expectation-Maximisation (EM) algorithm [Dempster et al. 1977 J. Roy. Stat. Soc. B Met. 39:1] is used to fit nine different models, containing from one to nine clusters. The best-performing model is selected using the Bayesian Information Criterion (BIC) [Schwarz 1978 Ann. Statist. 6:461]. The selected model is plotted as a line, together with the score density (a line estimated by kernel density estimation, or KDE) and a histogram.
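The fit-then-select procedure described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code behind this page: the function names, the quantile-based initialisation, the iteration limit, the convergence tolerance and the variance floor are all assumptions made for the example. BIC is computed in its common form, `p * ln(n) - 2 * log-likelihood`, where lower is better and `p = 3k - 1` counts the free parameters of a k-cluster model.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_likelihood(scores, params):
    """Log-likelihood of the scores under a list of (weight, mean, sd) components."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in params) or 1e-300)
        for x in scores
    )


def fit_mixture(scores, k, n_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM; return (params, log-likelihood)."""
    data = sorted(scores)
    n = len(data)
    # Assumed initialisation: means at evenly spaced quantiles, equal weights.
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n) or 1.0
    sd_floor = 1e-3 * overall_sd  # stops a component collapsing onto a single score
    sds = [overall_sd / k] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp, ll = [], 0.0
        for x in data:
            parts = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(parts) or 1e-300
            ll += math.log(total)
            resp.append([p / total for p in parts])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), sd_floor)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    params = list(zip(weights, means, sds))
    return params, log_likelihood(data, params)


def select_by_bic(scores, max_k=9):
    """Fit models with 1..max_k clusters; return (best k, {k: (BIC, params)})."""
    n = len(scores)
    results = {}
    for k in range(1, max_k + 1):
        params, ll = fit_mixture(scores, k)
        n_params = 3 * k - 1  # k means, k standard deviations, k - 1 free weights
        results[k] = (n_params * math.log(n) - 2.0 * ll, params)
    best_k = min(results, key=lambda k: results[k][0])
    return best_k, results
```

Calling `select_by_bic(scores)` on a list of continuous scores returns the number of clusters with the lowest BIC, together with the fitted weight, mean and standard deviation of every candidate model. EM only finds a local optimum, so a production implementation would typically try several initialisations.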

Available for: continuous scoring

Viewing the results

To view mixture modelling results, click on the marker's name in the tabs near the top of the page. When a marker has been selected, you will see the mixture model plot on the left, and descriptive statistics on the right.

A Gaussian mixture model is a way of modelling marker scores using a sum of Gaussian distributions (also called normal distributions). The number of Gaussians used is called the modality of the model. The methodology is summarised at the top of this page.

The mixture model plot shows a density plot and a histogram of the marker's scores, overlaid with the Gaussian mixture model fitted to those same scores. The centre of each cluster is shown as a dashed vertical line. Each centre corresponds to the average expression value of that cluster (the mean, mode and median coincide for a Gaussian distribution).

The histogram and the density plot both represent the protein expression. Both are approximations of the underlying distribution: the histogram depends on the estimated number of bars and where their boundaries lie, and the density line depends on kernel density estimation (KDE) with adaptive bandwidth estimation. The bandwidth is a parameter that controls how smooth or irregular the estimated density is. As a result, the mixture model does not always match the KDE line exactly. This may be due to a lack of precision in the KDE approximation, the mixture being a poor fit to the data, or both.
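To make the role of the bandwidth concrete, here is a minimal Gaussian KDE in Python. It is a simplified sketch, not the method used on this page: it applies one fixed bandwidth to every score, chosen by Silverman's rule of thumb, whereas the plot uses adaptive bandwidth estimation. The function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(scores):
    """Fixed bandwidth from Silverman's rule of thumb (a stand-in for adaptive estimation)."""
    n = len(scores)
    sd = statistics.stdev(scores)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    # Use the smaller of the two spread estimates; fall back to sd if the IQR is zero.
    spread = min(sd, (q3 - q1) / 1.34) or sd
    return 0.9 * spread * n ** (-0.2)


def kde(scores, x, bandwidth):
    """Gaussian kernel density estimate at x: the average of one Gaussian bump per score."""
    norm = len(scores) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores) / norm
```

A small bandwidth follows every irregularity in the scores, while a large one smooths them away and can merge neighbouring peaks. This is one reason the KDE line and the mixture model line can disagree even when the mixture fits the data well.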

The dark blue dashed line running from left to right shows the mixture model, which is computed by summing the Gaussian distributions that contribute to the model. Statistics for each of these distributions are given on the right-hand side. Together, these provide the mathematical formula for the mixture model.
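As an illustration of that summation, the sketch below evaluates a mixture density at a given score. It assumes each component is described by a weight (its share of the samples), a mean and a standard deviation, and that the weights sum to one. The tuple layout and the example values are made up for the illustration and are not taken from this page's statistics box.

```python
from statistics import NormalDist


def mixture_density(x, components):
    """Sum of weighted Gaussian densities; components is a list of (weight, mean, sd)."""
    return sum(w * NormalDist(mean, sd).pdf(x) for w, mean, sd in components)


# Made-up example: 60% of samples in a low-expression cluster, 40% in a high one.
components = [(0.6, 2.0, 0.5), (0.4, 6.0, 1.0)]
curve = [(x / 10.0, mixture_density(x / 10.0, components)) for x in range(0, 101)]
```

Plotting the `curve` points gives a line with one peak near each cluster centre, and the area under it is one because the weights sum to one.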

The buttons on the right-hand side are:

Statistics are provided in the box on the right-hand side as follows:
