I’m struggling to sample from a Gaussian Mixture Model. I have a very simple example with only one component (so, not actually a mixture), which I fit to standard normal data. However, the weight of that single component ends up being greater than 1, causing an error:
import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1)
mixture.fit(dataset)
mixture.sample(10)
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
It’s evident to me that this is caused by the weight of the single component being greater than 1:
> print(mixture.weights_)
1.0000000000000002
This kind of seems like a bug. But maybe I’m doing something wrong here?
Although this does technically look like a bug, the real issue, as already explained in the other answer, is that asking for a Gaussian mixture with n_components=1 does not make sense from a modelling perspective. One could argue that an exception (or at least a warning) should be raised earlier, i.e. whenever GaussianMixture(n_components=1) is requested. It may be a deliberate design choice not to do so, but in any case this is arguably something to be discussed in the scikit-learn GitHub repo as a possible issue, and not here.
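As a side note (my own sketch, not part of either answer): with a single component there is no mixture to disentangle, so the EM machinery is unnecessary anyway. The maximum-likelihood fit is just the sample mean and standard deviation, and sampling reduces to drawing from one normal distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = rng.standard_normal(10)

# For a single Gaussian component, the ML estimates are simply the
# sample mean and the (biased) sample standard deviation -- no EM needed.
mu = dataset.mean()
sigma = dataset.std()

# Sampling from the "mixture" is then sampling from one normal:
samples = rng.normal(mu, sigma, size=10)
print(samples.shape)  # (10,)
```

This sidesteps the weight-normalization problem entirely, since there are no mixture weights at all.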
That said, the workaround here is pretty straightforward: in the special case n_components=1, manually set mixture.weights_ to be exactly 1.0 after fitting:
import numpy as np
from sklearn.mixture import GaussianMixture

dataset = np.random.standard_normal(10).reshape(-1, 1)
mixture = GaussianMixture(n_components=1)
mixture.fit(dataset)

mixture.weights_
# 1.0000000000000002

mixture.sample(10)
# ValueError: pvals < 0, pvals > 1 or pvals contains NaNs

# force weight to 1.0:
mixture.weights_ = 1.
mixture.sample(10)
# result:
# (array([[ 0.51371178],
#         [ 0.1530927 ],
#         [-0.56327362],
#         [-1.22308348],
#         [ 1.26889771],
#         [ 1.11849849],
#         [-1.47091749],
#         [-0.41259178],
#         [ 1.93872769],
#         [ 0.26282224]]),
#  array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
There should not be any theoretical concern here: by definition, the weight of the single component in a one-component Gaussian mixture is exactly 1.0. It is just that, as demonstrated in the other answer, when only a small number of samples is available, the fitting algorithm fails to return a weight of exactly 1.0 within the available machine precision.
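A slightly more general variant of the same fix (my own suggestion, not anything built into scikit-learn) is to renormalize the fitted weights by their own sum, which also covers multi-component mixtures whose weights drift off 1.0 by a few units in the last place. The idea in isolation, using only NumPy:

```python
import numpy as np

# Weights as the fitting may leave them: the sum overshoots 1.0
# by one unit in the last place, which is what trips the pvals check.
weights = np.array([1.0000000000000002])
print(weights.sum() > 1.0)  # True

# Dividing the array by its own sum yields weights that sum to
# exactly 1.0 in floating point (x / x == 1.0 for finite nonzero x).
weights = weights / weights.sum()
print(weights.sum() == 1.0)  # True
```

Applied to the fitted model, that would be `mixture.weights_ = mixture.weights_ / mixture.weights_.sum()` before calling `mixture.sample()`.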