MemoryError when trying to normalize an array of images

I have a folder containing 110k images, each with shape (256, 256, 3). I read them one by one, convert each to a NumPy array, and store it in a list. After that, I convert the list to a single NumPy array of shape (110000, 256, 256, 3). Then, when I try to normalize the images with images = images / float(255), I get this error:

 File "loading_images.py", line 25, in <module>
    images = images / float(255)
MemoryError: Unable to allocate 161. GiB for an array with shape (110000, 256, 256, 3) and data type float64

Is there any other way to do that?

My current code is this:

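import glob

import cv2
import numpy as np

# dir is the path to the image folder (with a trailing separator), defined elsewhere
files = glob.glob(dir + "*.png")
images = []
for f in files:
    im = cv2.imread(f)        # already a uint8 NumPy array
    img_arr = np.asarray(im)
    images.append(img_arr)

images = np.asarray(images)   # shape (110000, 256, 256, 3), dtype uint8
images = images / float(255)  # promotes to float64; this line raises the MemoryError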
 

Answer

I think your issue is that cv2 gives you uint8 arrays (correct me if I'm wrong), and dividing by float(255) casts every value to float64. Compare the size of a single element:

import numpy as np
print(np.float64(255).itemsize)  # 8 bytes per value
print(np.uint8(255).itemsize)    # 1 byte per value

This means that after the cast you need 8 times as many bytes. You have 110000 × 256 × 256 × 3 ≈ 21.6 GB (about 20 GiB) of image data to begin with, which is probably just within your RAM limit. After the conversion to float64 you get 8 × 21.6 ≈ 173 GB, which is the 161 GiB from the error message and above the RAM limit of anyone I know, haha.
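To double-check the arithmetic, here is a small sketch that computes the footprint of the array for a few dtypes. The helper name array_gib and the ITEMSIZE table are made up for this illustration; the byte sizes are the standard ones for these NumPy dtypes.

```python
from math import prod

# Bytes per element for the dtypes discussed here (hypothetical lookup table).
ITEMSIZE = {"uint8": 1, "float32": 4, "float64": 8}


def array_gib(shape, dtype):
    """Return the size in GiB of a dense array with this shape and dtype."""
    return prod(shape) * ITEMSIZE[dtype] / 2**30


shape = (110000, 256, 256, 3)
for dtype in ITEMSIZE:
    print(f"{dtype}: {array_gib(shape, dtype):.1f} GiB")
# uint8: 20.1 GiB
# float32: 80.6 GiB
# float64: 161.1 GiB
```

The float64 figure matches the 161 GiB in the traceback. Even float32 would need about 81 GiB, so a smaller dtype alone probably doesn't make the full array fit.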

This isn't a solution by itself, though. Do you really need to load all the images at the same time?
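If you don't, one option is to normalize in batches, so that only a small float array exists at any time. The following is just a sketch under a few assumptions: the function name normalized_batches, the load parameter, and the default batch size are all made up for this example, and it uses float32 instead of float64 to halve the per-batch cost.

```python
import numpy as np


def normalized_batches(paths, load, batch_size=256):
    """Yield float32 batches scaled to [0, 1], one batch at a time.

    `load` is any callable that maps a path to a uint8 image array
    (a hypothetical parameter; cv2.imread would be one candidate).
    """
    for start in range(0, len(paths), batch_size):
        # Only this batch is decoded and held in memory as uint8.
        batch = np.stack([load(p) for p in paths[start:start + batch_size]])
        # float32 uses 4 bytes per value instead of 8 for float64.
        yield batch.astype(np.float32) / np.float32(255)
```

With the code from the question you would call it as normalized_batches(files, cv2.imread) and consume one batch per training or processing step. A batch of 256 images of shape (256, 256, 3) takes about 192 MiB as float32. Note that cv2.imread returns None for files it cannot read, so unreadable files would need to be filtered out first.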
