How to map the prediction output of a Keras model built from a data generator (flow_from_directory)

I have two CSV files, "train.csv" and "test.csv", which look something like this:

Image_ID Target
ID_7xd1 0
ID_8xk1 1

This is an example of train.csv. In test.csv I have only the Image_ID column, and the goal is to predict each image's target from the images provided. The image folder is laid out as follows:

Images
├── test
│   ├── ID_12ls.tif
│   ├── ID_1sfk.tif
│   └── ...
└── train
    ├── 0
    │   ├── ID_7xd1.tif
    │   ├── ID_9xd0.tif
    │   └── ...
    └── 1
        ├── ID_0xkd0.tif
        ├── ID_8xdk1.tif
        └── ...

Each Image_ID in train.csv and test.csv represents one image and matches that image's file name. Since I have a lot of images, I decided to use Keras ImageDataGenerator.flow_from_directory:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# data generators
datagen_train = ImageDataGenerator(rescale=1./255, validation_split=0.2)
datagen_test = ImageDataGenerator(rescale=1./255)

# load and iterate training dataset
train_it = datagen_train.flow_from_directory('train/', target_size=(224, 224), class_mode='binary', batch_size=64, seed=0, subset='training')

# load and iterate validation dataset
val_it = datagen_train.flow_from_directory('train/', target_size=(224, 224), class_mode='binary', batch_size=64, seed=0, subset='validation')

# load and iterate test dataset
test_it = datagen_test.flow_from_directory('test/', target_size=(224, 224), class_mode=None, batch_size=1, seed=0)

Model

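import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

model2 = Sequential()
model2.add(Conv2D(32, 3, padding="valid", activation="relu", input_shape=(224, 224, 3)))
model2.add(MaxPool2D())
model2.add(Dropout(0.4))

model2.add(Flatten())
model2.add(Dense(128, activation="relu"))
model2.add(Dense(1, activation="sigmoid"))

# learning_rate replaces the older lr argument
opt = tf.keras.optimizers.Adam(learning_rate=0.000001)
model2.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])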

# callbacks
mc_loss = ModelCheckpoint('model2svd.h5', monitor='val_loss', mode='min', verbose=1, save_best_only=True)

history2 = model2.fit_generator(generator=train_it, steps_per_epoch=step_size_t, validation_data= val_it, validation_steps=step_size_v,
                               epochs=100, shuffle=True, callbacks=[mc_loss])

Problem

After training the model with model.fit_generator(), I made predictions on the test dataset with model.predict_generator(). It gave me an array of m predictions, where m is the total number of test examples.

The problem is: how do I map this output to the Image_ID values in my test.csv? Or is the output already in the same order as test.csv's Image_ID column?

Please let me know if you need more details.

Answer

In your test generator, set shuffle=False. flow_from_directory shuffles by default, so without this the predictions will not line up with the generator's file list. Also, model.predict_generator is deprecated, so just use model.predict. With shuffle=False in the test generator, you can get the image files in the order they are processed with:

test_files=test_it.filenames

To ensure you go through the test set samples exactly once, choose the test batch size and test steps so that test_batch_size × test_steps = number of test samples. The code below picks the largest batch size that divides the sample count evenly and is at most 80:

length = len(test_files)
# largest divisor of length that is at most 80
test_batch_size = max(n for n in range(1, length + 1) if length % n == 0 and n <= 80)
test_steps = length // test_batch_size
print('test batch size:', test_batch_size, ' test steps:', test_steps)
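As a quick check of what that calculation produces, here is a minimal standalone sketch of the same idea. The function name pick_batch_size and the sample counts 1000 and 97 are illustrative assumptions, not part of the original answer.

```python
def pick_batch_size(num_samples, max_batch_size=80):
    """Return the largest divisor of num_samples that is <= max_batch_size."""
    if num_samples < 1 or max_batch_size < 1:
        raise ValueError("num_samples and max_batch_size must be positive")
    # Walk downwards from the cap; 1 always divides, so the loop always returns.
    for candidate in range(min(max_batch_size, num_samples), 0, -1):
        if num_samples % candidate == 0:
            return candidate


# Hypothetical sample counts, chosen to show a typical case and a worst case.
for n in (1000, 97):
    batch = pick_batch_size(n)
    print(n, "samples -> batch size", batch, "steps", n // batch)
```

Note the worst case: when the sample count is prime (97 here), the only divisor under the cap is 1, so prediction runs one image per step.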

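Then recreate the test generator with that batch size and predict. The batch size belongs to the generator, so do not pass batch_size to predict when the input is a generator (model2 is the model from the question):

test_it = datagen_test.flow_from_directory('test/', target_size=(224, 224), class_mode=None, batch_size=test_batch_size, shuffle=False)
preds = model2.predict(test_it, steps=test_steps)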
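If an exact divisor is inconvenient, a common alternative is to keep whatever batch size the generator uses and set steps to ceil(number of samples / batch size). The Keras directory iterator yields a smaller final batch, so each sample is still seen once. The sketch below only simulates the index ranges such batching would cover; batch_slices is a hypothetical helper, not a Keras function.

```python
import math


def batch_slices(num_samples, batch_size):
    """Return (start, stop) index ranges covering every sample exactly once."""
    steps = math.ceil(num_samples / batch_size)
    return [
        (step * batch_size, min((step + 1) * batch_size, num_samples))
        for step in range(steps)
    ]


# 10 samples with a batch size of 4: two full batches, then a final batch of 2.
slices = batch_slices(10, 4)
print(slices)
```

The number of slices is the value to pass as steps, and the final slice is simply shorter than the others.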

Then iterate through preds, threshold each predicted probability at 0.5, and pair the labels with the file names in a DataFrame:

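import pandas as pd

labels = []
for p in preds:
    # p is a one-element array holding the sigmoid output (probability of class 1)
    if p[0] > .5:
        label = 1
    else:
        label = 0
    labels.append(label)
# test_files and labels are in the same order because shuffle=False
Fseries = pd.Series(test_files, name='Image Id')
Lseries = pd.Series(labels, name='Target')
predictions_df = pd.concat([Fseries, Lseries], axis=1)

print(predictions_df.head())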
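One remaining step is matching these rows to test.csv. The generator's file names are relative paths that include the extension, and the generator lists files in its own sorted order, which need not be the row order of test.csv. The following is a minimal, dependency-free sketch of that alignment. The helper names and the probabilities are assumptions for illustration; only the IDs come from the question.

```python
from pathlib import PurePosixPath


def filename_to_image_id(filename):
    """Drop any directory prefix and the file extension from a generator file name."""
    # Normalise Windows separators so the same code works on any OS.
    return PurePosixPath(filename.replace("\\", "/")).stem


def align_predictions(test_ids, filenames, probabilities, threshold=0.5):
    """Return one (Image_ID, Target) pair per test.csv row, in test.csv order."""
    if len(filenames) != len(probabilities):
        raise ValueError("expected exactly one prediction per file")
    by_id = {
        filename_to_image_id(name): int(prob > threshold)
        for name, prob in zip(filenames, probabilities)
    }
    # A KeyError here means test.csv lists an image the generator never saw.
    return [(image_id, by_id[image_id]) for image_id in test_ids]


# Illustrative values only: the probabilities are made up.
test_ids = ["ID_1sfk", "ID_12ls"]                     # row order of test.csv
filenames = ["test/ID_12ls.tif", "test/ID_1sfk.tif"]  # stand-in for test_it.filenames
probabilities = [0.91, 0.08]                          # stand-in for preds.ravel()

rows = align_predictions(test_ids, filenames, probabilities)
print(rows)
```

With the DataFrame built above, the same alignment could be done by deriving an Image_ID column from the file names and merging it with test.csv on that column.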