I am trying to count the number of events that pass each of a range of thresholds. I currently loop over the thresholds with a for loop, but there are so many events that this takes too long. I would like to vectorize the loop to reduce the compute time. Can I get some help?
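    # array_ML is the existing 2-D event array; bin_number is the number of thresholds
    array_ = np.zeros(bin_number, dtype=int)  # np.array(bin_number) would create a 0-d array
    for i in range(bin_number):
        # keep events whose first column exceeds the threshold i
        mask_1 = array_ML[:, 0] > i
        masked_array = array_ML[mask_1]
        # of those, keep events whose third column is 0
        mask_2 = masked_array[:, 2] == 0
        masked_array = masked_array[mask_2]
        # store the row count (shape[0]), not the whole shape tuple
        array_[i] = masked_array.shape[0]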
There may be a dedicated function in NumPy that does this for you, but otherwise, the following simplifications are likely to speed up your code significantly:
    import numpy as np

    # Create example data
    array_ML = np.random.randint(0, 1000, (10000, 200))
    array_ML[:, 2] = np.where(array_ML[:, 2] > 500, 0, 1)
    bin_number = 100

    array_ = np.zeros(bin_number, dtype=int)

    # filter what we can, before the loop
    mask = array_ML[:, 2] == 0
    temp = array_ML[mask, 0]

    # Just count, by summing the condition
    for i in range(bin_number):
        array_[i] = np.sum(temp > i)
With the above example data, my timings (using %%time in Jupyter notebook cells) reduce from 439 ms (original code) to 3.86 ms (code above).
Of course, the size of the speedup depends heavily on the shape of your input data, the distribution of its values, and bin_number; my timings only serve as an indication.
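If you want to drop the Python loop entirely, one option is to sort the filtered values once and look up every threshold with np.searchsorted. The following is only a sketch under a few assumptions: the thresholds are the integers 0 to bin_number - 1 as in the question, the example data mimics the data above (here with a seeded generator for reproducibility), and the helper name count_above_thresholds is made up for illustration. I have not timed it against the loop above.

```python
import numpy as np

# Example data in the same shape as above (seeded, so results are reproducible)
rng = np.random.default_rng(0)
array_ML = rng.integers(0, 1000, size=(10000, 200))
array_ML[:, 2] = np.where(array_ML[:, 2] > 500, 0, 1)
bin_number = 100


def count_above_thresholds(data, n_thresholds):
    """Hypothetical helper: for each threshold i in range(n_thresholds), count
    the rows with data[:, 2] == 0 and data[:, 0] > i, without a Python loop."""
    # Filter once, then sort the surviving first-column values
    values = np.sort(data[data[:, 2] == 0, 0])
    thresholds = np.arange(n_thresholds)
    # side="right" returns, per threshold, how many values are <= threshold
    n_not_above = np.searchsorted(values, thresholds, side="right")
    # everything else is strictly greater than the threshold
    return values.size - n_not_above


counts = count_above_thresholds(array_ML, bin_number)
```

A simpler alternative is broadcasting, (temp[:, None] > np.arange(bin_number)).sum(axis=0), but that builds an intermediate boolean array of shape (number of events, bin_number), which may be too large when there are many events.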