Gaussian Mixture Model: ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

I’m struggling to sample from a Gaussian Mixture Model. I have a very simple example with only one component (so, not actually a mixture), fitted on standard-normal data. However, the weight of that single component ends up being greater than 1, which causes an error:

import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1)
mixture.fit(dataset)
mixture.sample(10)
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

It’s evident to me that this is caused by the weight of the first (and only) component being greater than 1:

> print(mixture.weights_[0])
1.0000000000000002

This kind of seems like a bug. But maybe I’m doing something wrong here?

Answer

Although technically this does seem to be a bug, the truth is that, as already explained in the other answer, the real issue stems from the fact that asking for a Gaussian mixture with n_components=1 does not make much sense from a modelling perspective; one could argue that an exception (or at least a warning) should be raised earlier, i.e. whenever a GaussianMixture(n_components=1) is requested. It may well be a design choice not to do so, but in any case this is arguably something to be discussed in the scikit-learn GitHub repo as a possible issue, not here.
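To see where the error actually comes from: sample() first draws how many points to take from each component using a multinomial over weights_, and it is NumPy’s validation of those probabilities that fails. A minimal sketch of that check, assuming a reasonably recent NumPy (the exact wording of the message can vary between versions):

import numpy as np

# the weight sklearn fitted for the single component: one ulp above 1.0
weights = np.array([1.0000000000000002])

# GaussianMixture.sample() draws per-component sample counts roughly like this;
# NumPy rejects any probability even marginally above 1
rng = np.random.RandomState(0)
rng.multinomial(10, weights)
# ValueError: pvals < 0, pvals > 1 or pvals contains NaNs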

That said, a workaround here is pretty straightforward: in the special case when n_components=1, force mixture.weights_[0] to be equal to 1.0:

import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1)
mixture.fit(dataset)

mixture.weights_[0]
# 1.0000000000000002

mixture.sample(10)
# ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

# force weight to 1.0:
mixture.weights_[0] = 1.

mixture.sample(10)
# result:
(array([[ 0.51371178],
        [ 0.1530927 ],
        [-0.56327362],
        [-1.22308348],
        [ 1.26889771],
        [ 1.11849849],
        [-1.47091749],
        [-0.41259178],
        [ 1.93872769],
        [ 0.26282224]]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

There should not be any theoretical concern here, since by definition the weight of a single component in a Gaussian mixture is 1.0; it is just that, as demonstrated in the other answer, with only a small number of samples available, the GMM fitting algorithm fails to return a weight of exactly 1.0 within machine precision.
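For completeness, the same fix can be written in a form that also covers n_components > 1: renormalize the fitted weights so that they sum to exactly 1.0. This is just a sketch of the same workaround applied generically, not something scikit-learn does for you:

import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1).fit(dataset)

# renormalize the weights; with a single component this is just
# weights_[0] / weights_[0], which is exactly 1.0 in floating point,
# and it is effectively a no-op whenever the weights already sum to 1
mixture.weights_ /= mixture.weights_.sum()

mixture.sample(10)  # no longer raises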
