I’m struggling to sample from a Gaussian Mixture Model. I have a very simple example where there’s actually only one component (so, not really a mixture), which I fit on standard normal data. However, the weight of that single component ends up being greater than 1, causing an error:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1)
mixture.fit(dataset)
mixture.sample(10)
```

```
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
```

It’s evident to me that this is caused by the weight of the first (and only) component being greater than 1:

```python
>>> print(mixture.weights_[0])
1.0000000000000002
```

This kind of seems like a bug. But maybe I’m doing something wrong here?

## Answer

Although *technically* this does seem to be a bug, the real issue, as already explained in the other answer, stems from the fact that asking for a Gaussian mixture with `n_components=1` does not make sense from a *modelling* perspective; one could argue that an exception (or at least a warning) should be raised earlier, i.e. whenever a `GaussianMixture(n_components=1)` is requested. I guess it may be a design choice not to do so, but in any case this is arguably something to be discussed as a possible issue in the scikit-learn GitHub repo, and not here.

That said, a workaround here is pretty straightforward: in the special case when `n_components=1`, *force* `mixture.weights_[0]` to be equal to 1.0:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1)
mixture.fit(dataset)

mixture.weights_[0]  # 1.0000000000000002
mixture.sample(10)   # ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

# force the weight to 1.0:
mixture.weights_[0] = 1.
mixture.sample(10)
# result:
# (array([[ 0.51371178],
#         [ 0.1530927 ],
#         [-0.56327362],
#         [-1.22308348],
#         [ 1.26889771],
#         [ 1.11849849],
#         [-1.47091749],
#         [-0.41259178],
#         [ 1.93872769],
#         [ 0.26282224]]),
#  array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
```
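A slightly more general variant of the same trick is to renormalize the whole weight vector instead of pinning a single entry. This is just a sketch (not an official scikit-learn API): it divides `weights_` by its own sum, which is a no-op up to rounding error but guarantees the weights total exactly 1.0, and it works for any `n_components`, not only the degenerate single-component case:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1, random_state=42)
mixture.fit(dataset)

# Renormalize the fitted weights in place so they sum to exactly 1.0;
# for a single component this maps 1.0000000000000002 back to exactly 1.0.
mixture.weights_ /= mixture.weights_.sum()

samples, labels = mixture.sample(10)
print(samples.shape, labels.shape)  # (10, 1) (10,)
```

Note that mutating fitted attributes like `weights_` is generally discouraged, but here the change is purely a numerical correction.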

Apparently, there should not be any theoretical concern here, since *by definition* the weight of a single component in a Gaussian mixture is 1.0; it is just that, as demonstrated in the other answer, with a small number of available samples the GMM algorithm can fail to produce a weight of exactly 1.0 within machine precision.
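As a side note, the offending value `1.0000000000000002` is exactly `1.0` plus double-precision machine epsilon, i.e. the smallest float64 strictly greater than 1.0, which is why the overshoot is as tiny as it could possibly be. A quick check:

```python
import numpy as np

eps = np.finfo(np.float64).eps  # ~2.220446049250313e-16
w = 1.0 + eps                   # smallest float64 strictly above 1.0

print(w)        # 1.0000000000000002
print(w > 1.0)  # True -- enough to trip a "pvals > 1" check
```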