I have a numpy matrix A where the data is organised column-vector-wise, i.e. A[:,0] is the first data vector, A[:,1] is the second, and so on. I wanted to know whether there is a more elegant way to zero out the mean from this data. I am currently doing it via a for loop:
mean = A.mean(axis=1)
for k in range(A.shape[1]):
    A[:, k] = A[:, k] - mean
So does numpy provide a function to do this? Or can it be done more efficiently another way?
As is typical, you can do this a number of ways. Each of the approaches below works by adding a dimension to the mean vector, making it a 4 x 1 array, and then NumPy’s broadcasting takes care of the rest. Each approach creates a view of mean, rather than a deep copy. The first approach (i.e., using newaxis) is likely preferred by most, but the other methods are included for the record.
In addition to the approaches below, see also ovgolovin’s answer, which uses a NumPy matrix to avoid the need to reshape mean altogether.
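That answer is not reproduced here, so the following is only a guess at the idea it describes: np.matrix objects are always two-dimensional, so the mean along axis=1 already comes back as a 4 x 1 column and can be subtracted as is. The names M and row_mean are illustrative and not taken from that answer.

```python
import numpy as np

# np.matrix is always 2-D, so reductions keep the reduced axis with length 1.
M = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

row_mean = M.mean(axis=1)   # a 4 x 1 matrix, no reshape needed
centered = M - row_mean     # broadcasts across the three columns

print(row_mean.shape)
print(centered)
```

Note that the NumPy documentation no longer recommends np.matrix for new code, so the array-based approaches are probably the safer choice.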
For the methods below, we start with the following code and example array:
import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)
Using np.newaxis:
>>> A - mean[:, np.newaxis]
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])
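To see why the extra axis is needed, it can help to compare the shapes involved. The sketch below rebuilds the same example array; the names col, centered and broadcast_failed are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)        # shape (4,): one mean per row

# Shapes (4, 3) and (4,) do not broadcast, because broadcasting aligns
# trailing dimensions and 3 does not match 4.
try:
    A - mean
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

col = mean[:, np.newaxis]    # shape (4, 1): stretches across the 3 columns
centered = A - col

print(mean.shape, col.shape, centered.shape)
print(broadcast_failed)
```

With the added axis, the length-1 dimension is stretched to match the three columns of A, which is what the loop in the question does by hand.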
The documentation states that None can be used instead of newaxis. This is because:
>>> np.newaxis is None
True
Therefore, the following accomplishes the task.
>>> A - mean[:, None]
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])
That said, newaxis is clearer and should be preferred. Also, a case can be made that newaxis is more future-proof. See also: Numpy: Should I use newaxis or None?
Using ndarray.reshape:
>>> A - mean.reshape(mean.shape[0], 1)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])
You can alternatively change the shape of mean directly:
>>> mean.shape = (mean.shape[0], 1)
>>> A - mean
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])
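The claim that these approaches produce views rather than copies can be checked with np.shares_memory. This is only a quick sanity check on the example array, and the dictionary names candidates and results are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)

# Three ways of turning the (4,) vector into a (4, 1) column.
candidates = {
    "newaxis": mean[:, np.newaxis],
    "None": mean[:, None],
    "reshape": mean.reshape(mean.shape[0], 1),
}

results = {}
for name, col in candidates.items():
    # shares_memory is True when the two arrays overlap in memory,
    # i.e. col is a view of mean and not an independent copy.
    results[name] = (col.shape, np.shares_memory(col, mean))
    print(name, results[name])
```

Assigning to mean.shape differs from the other three in that it modifies mean itself instead of returning a new object; the NumPy documentation discourages that form and suggests reshape instead.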