Python – Speeding up for loops – large data sets

I’m new to Python, and I need to speed up this simple code.

I created this code in Matlab, where it runs “instantly”. I tried to ‘convert’ it to Python, but it’s very slow…

In my final code, this operation has to be looped thousands of times, so in the end this particular part of the code needs to be as efficient as possible…

import numpy as np
import time

# a and b define the size of the data
a = 9000
b = 4000
c = np.ones((a,))         # not ones in my real code
d = np.random.rand(a, b)  # not random in my real code

res = np.zeros((b, 1))  # pre-allocate the result

# here is the loop to be sped up!
tic = time.time()
for x in range(b):
    res[x] = sum(c * d[:, x])  # Python's built-in sum, one column at a time
toc = time.time()
print(toc - tic)

Just to make it harder: in theory, “a” can be in the millions, and “b” in the hundreds of thousands… Not fun…

Any suggestions?

Thanks a lot for your help!

Binabik

Answer

In general, explicit Python loops should be avoided when working with NumPy arrays, since NumPy’s vectorized operations run in compiled code and are much more efficient. This code gives the same result:

import numpy as np

a = 9000
b = 4000
c = np.ones((a,))
d = np.random.rand(a, b)

# replaces the loop: broadcast c across the rows of d.T (the columns of d),
# then sum each row to get one value per column of d
res = (c * d.T).sum(axis=1).reshape(-1, 1)
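
Even better: the loop is really computing the dot product of c with each column of d, i.e. a matrix-vector product. Expressing it with the @ operator (or np.dot) hands the whole computation to NumPy’s optimized dot-product routines, so it should be faster still. A minimal sketch, using the same toy sizes as above:

import numpy as np

a = 9000
b = 4000
c = np.ones((a,))
d = np.random.rand(a, b)

# same result as the loop: one dot product per column of d, in a single call
res = (c @ d).reshape(-1, 1)   # equivalently: np.dot(c, d).reshape(-1, 1)

# sanity check against the broadcasting version
assert np.allclose(res, (c * d.T).sum(axis=1).reshape(-1, 1))

The @ version also never materializes the full (b, a) temporary array that c * d.T creates, which matters at the sizes mentioned in the question.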