I’m attempting to generate a density function, but the sum of the components of a generated histogram does not appear to be close to 1.

What causes this, and how can I make the sum of the density function close (even if not exactly equal) to 1?

Minimal example:

    import numpy as np

    x = np.random.normal(0, 0.5, 1000)  # mu, sigma, num
    bins = np.linspace(min(x), max(x), num=50)  # lower and upper bounds
    hist, hist_bins = np.histogram(x, bins=bins, density=True)
    print(np.sum(hist))
    # >>> 10.4614

If I don't specify the bin edges, the output is smaller but still greater than 1:

    import numpy as np

    x = np.random.normal(0, 0.5, 1000)  # mu, sigma, num
    hist, hist_bins = np.histogram(x, density=True)
    print(np.sum(hist))
    # >>> 3.1332

## Answer

The reason for this behavior is stated in the docs:

> density: bool, optional
>
> If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.
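To make the normalization concrete, here is a minimal sketch that reproduces `density=True` by hand: each count is divided by the total number of samples times the bin width. The seeded generator and the variable names are my own choices for illustration and are not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
x = rng.normal(0, 0.5, 1000)    # mu, sigma, num

counts, edges = np.histogram(x, bins=50)           # raw counts per bin
widths = np.diff(edges)                            # width of each bin
density_manual = counts / (counts.sum() * widths)  # count / (N * width)

density_numpy, _ = np.histogram(x, bins=50, density=True)

print(np.allclose(density_manual, density_numpy))  # True
print(np.sum(density_manual))                      # roughly 1 / bin width, not 1
print(np.sum(density_manual * widths))             # approximately 1.0
```

With equal-width bins, the plain sum of the density values is about one over the bin width, which is why narrower bins give a larger sum.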

In addition, the docs include an example showing that the sum of the histogram values is not equal to 1.0, while the integral over the bins is:

    import numpy as np

    a = np.arange(5)
    hist, bin_edges = np.histogram(a, density=True)
    print(hist)
    # hist --> [0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5]
    print(hist.sum())
    # --> 2.4999999999999996
    print(np.sum(hist * np.diff(bin_edges)))
    # --> 1.0

So we can apply this to your code snippet:

    import numpy as np

    x = np.random.normal(0, 0.5, 1000)  # mu, sigma, num
    bins = np.linspace(min(x), max(x), num=50)  # lower and upper bounds
    hist, hist_bins = np.histogram(x, bins=bins, density=True)
    print(hist)
    print(np.sum(hist))  # sum of density values, not 1 in general
    print(np.sum(hist * np.diff(hist_bins)))  # integral over the bins
    # --> 1.0
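If what you actually want is a set of per-bin values that sum to 1 (a probability mass per bin rather than a density), one option is to normalize the raw counts yourself. This is a sketch under that assumption; the seed and the names `prob_mass` and `prob_from_density` are mine.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
x = rng.normal(0, 0.5, 1000)
bins = np.linspace(x.min(), x.max(), num=50)

# Fraction of samples that fall in each bin.
counts, edges = np.histogram(x, bins=bins)
prob_mass = counts / counts.sum()

# Equivalent route: density value times bin width.
density, _ = np.histogram(x, bins=bins, density=True)
prob_from_density = density * np.diff(edges)

print(prob_mass.sum())                            # approximately 1.0
print(np.allclose(prob_mass, prob_from_density))  # True
```

Both routes give the same numbers; the first avoids computing the density at all.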

In addition, you should think about how you have chosen your bins and make sure that `np.linspace()` between the sample minimum and maximum is a reasonable way to define them.
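As one possible alternative to hand-picked `linspace` edges, NumPy can choose the edges from the data with a named rule via `np.histogram_bin_edges`. The sketch below compares a few rules; which one suits your data is a judgment call, and the seed is assumed only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
x = rng.normal(0, 0.5, 1000)

for rule in ("auto", "fd", "sturges"):
    edges = np.histogram_bin_edges(x, bins=rule)  # data-driven bin edges
    hist, _ = np.histogram(x, bins=edges, density=True)
    area = np.sum(hist * np.diff(edges))          # integral over the bins
    print(rule, len(edges) - 1, round(area, 6))
```

Whatever rule is used, the integral stays at 1 while the plain sum of `hist` changes with the bin width.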