How to map the prediction output of a Keras model built from a data generator (flow_from_directory)

I have two CSV files, “train.csv” and “test.csv”, which look something like this:

Image_ID Target
ID_7xd1 0
ID_8xk1 1

This is an example of train.csv. test.csv has only the Image_ID column, and the goal is to predict the Target of each of its images. The image folders are laid out as follows:

├── test
│   ├── ID_12ls.tif
│   ├── ID_1sfk.tif
│   └── ...
└── train
    ├── 0
    │   ├── ID_7xd1.tif
    │   ├── ID_9xd0.tif
    │   └── ...
    └── 1
        ├── ID_0xkd0.tif
        ├── ID_8xdk1.tif
        └── ...

Each Image_ID in train.csv and test.csv represents one image and matches that image's file name. Since I had lots of images, I decided to use Keras ImageDataGenerator.flow_from_directory:

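import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

# data generators
datagen_train = ImageDataGenerator(rescale=1./255, validation_split=0.2)
datagen_test = ImageDataGenerator(rescale=1./255)

# load and iterate training dataset
train_it = datagen_train.flow_from_directory('train/', target_size=(224, 224), class_mode='binary', batch_size=64, seed=0, subset='training')

# load and iterate validation dataset
val_it = datagen_train.flow_from_directory('train/', target_size=(224, 224), class_mode='binary', batch_size=64, seed=0, subset='validation')

# load and iterate test dataset
# note: flow_from_directory only reads images inside subfolders of 'test/',
# so the test images must sit in one subfolder (any name, e.g. test/images/)
test_it = datagen_test.flow_from_directory('test/', target_size=(224, 224), class_mode=None, batch_size=1, seed=0)

# steps per epoch: one full pass over each subset
step_size_t = len(train_it)
step_size_v = len(val_it)

model2 = Sequential()
model2.add(Conv2D(32, 3, padding="valid", activation="relu", input_shape=(224, 224, 3)))
model2.add(Flatten())  # flatten the feature maps before the single-unit output layer
model2.add(Dense(1, activation="sigmoid"))

opt = tf.keras.optimizers.Adam(learning_rate=0.000001)
model2.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])

# callbacks
mc_loss = ModelCheckpoint('model2svd.h5', monitor='val_loss', mode='min', verbose=1, save_best_only=True)

# fit accepts generators directly (fit_generator is deprecated); the iterator does its own shuffling
history2 = model2.fit(train_it, steps_per_epoch=step_size_t, validation_data=val_it, validation_steps=step_size_v,
                      epochs=100, callbacks=[mc_loss])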
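flow_from_directory turns every subfolder of train/ into a class and numbers the classes in alphanumeric order of the folder names, so the folders 0 and 1 become class indices 0 and 1 (the generator exposes this mapping as train_it.class_indices). That is why a sigmoid output above 0.5 means folder 1. A minimal stdlib sketch that mimics the rule on a throwaway copy of the layout; infer_class_indices is a hypothetical helper, not a Keras function:

```python
import os
import tempfile


def infer_class_indices(directory):
    """Mimic flow_from_directory's default class inference: every
    subdirectory of `directory` is a class, numbered in sorted
    (alphanumeric) order of the folder names."""
    subdirs = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(subdirs)}


# Build a throwaway copy of the train/ layout from the question
# (one empty .tif placeholder per class folder).
with tempfile.TemporaryDirectory() as root:
    for label, image_id in [("0", "ID_7xd1"), ("1", "ID_8xdk1")]:
        os.makedirs(os.path.join(root, label), exist_ok=True)
        open(os.path.join(root, label, image_id + ".tif"), "wb").close()
    class_indices = infer_class_indices(root)

print(class_indices)  # {'0': 0, '1': 1}
```

Loose files at the top level are not classes, which is also why the test images need a subfolder of their own.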


Now, after training the model, I made predictions on the test dataset with model.predict_generator(). It gave me an array of m predictions, where m is the total number of test examples.

The problem is: how do I map this output to the Image_ID values in test.csv? Or is the output already in the same order as test.csv’s Image_ID column?

Please let me know if you need more details.


In your test generator, set shuffle=False. Also, model.predict_generator is deprecated, so just use model.predict. With shuffle=False in the test generator, you can get the sequence of predicted image files, in the order they were processed, as

test_files = test_it.filenames
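The names in the generator's filenames list are paths relative to the test directory, so they still carry the subfolder and the .tif extension (for example images/ID_12ls.tif if the subfolder were called images) and need trimming before they match the Image_ID values in test.csv. A minimal sketch using only os.path; filename_to_image_id and the images folder name are hypothetical:

```python
import os


def filename_to_image_id(filename):
    """Reduce a generator filename such as 'images/ID_12ls.tif' to 'ID_12ls'."""
    base = os.path.basename(filename)         # drop the subfolder part
    image_id, _ext = os.path.splitext(base)   # drop the .tif extension
    return image_id


# Hypothetical filenames in the order a shuffle=False test generator lists them
# (the 'images' subfolder name is an assumption).
test_files = [os.path.join("images", "ID_12ls.tif"), os.path.join("images", "ID_1sfk.tif")]
image_ids = [filename_to_image_id(f) for f in test_files]
print(image_ids)  # ['ID_12ls', 'ID_1sfk']
```

os.path is used instead of splitting on "/" because the generator builds these paths with the separator of the operating system it runs on.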


To make sure you go through the test set samples EXACTLY once, choose the test batch size and the number of test steps such that test_batch_size * test_steps = number of test samples, using the code below:

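length = test_it.samples  # number of test images found by the generator
test_batch_size = sorted([int(length/n) for n in range(1, length+1) if length % n == 0 and length/n <= 80], reverse=True)[0]
test_steps = int(length/test_batch_size)
print('test batch size: ', test_batch_size, '  test steps: ', test_steps)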
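The one-liner picks the largest divisor of the number of test samples that does not exceed 80, so every batch is full and batch size times steps covers each sample exactly once. The same rule written as a small function, as a sketch; exact_batch_params and max_batch are hypothetical names, with 80 kept as the cap used above:

```python
def exact_batch_params(length, max_batch=80):
    """Return (batch_size, steps) with batch_size * steps == length.

    batch_size is the largest divisor of `length` that is <= max_batch,
    the same value the one-liner above selects.
    """
    if length < 1:
        raise ValueError("length must be a positive number of samples")
    batch_size = max(d for d in range(1, min(length, max_batch) + 1) if length % d == 0)
    return batch_size, length // batch_size


for length in (100, 97, 240):
    batch_size, steps = exact_batch_params(length)
    print(length, batch_size, steps)
# 100 50 2
# 97 1 97
# 240 80 3
```

Note the prime case: 97 samples can only be split into full batches of size 1, so the cap trades batch size for exact coverage.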

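Then, with the test generator created using batch_size=test_batch_size and shuffle=False (the Keras docs say not to pass batch_size to predict when the input is a generator, since the generator already produces the batches), do

preds = model.predict(test_it, steps=test_steps)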

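Then threshold each prediction at 0.5 to get a 0/1 label, and put the file names and labels side by side in a DataFrame:

import pandas as pd

labels = []
for p in preds:
    if p > .5:
        labels.append(1)
    else:
        labels.append(0)
Fseries = pd.Series(test_files, name='Image_ID')
Lseries = pd.Series(labels, name='Target')
predictions_df = pd.concat([Fseries, Lseries], axis=1)

print(predictions_df.head())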
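The generator's order is not test.csv's order: with shuffle=False the generator walks the image files in sorted file-name order, which need not match the row order of test.csv. One way to line the predictions up with test.csv, sketched with pandas under the assumption that test_files, preds and the test.csv IDs look like the hypothetical stand-ins below, is to strip each file name down to its Image_ID and merge on that column:

```python
import os

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the objects used in the answer:
# generator filenames in shuffle=False (sorted) order, model output of shape (m, 1),
# and the Image_ID column of test.csv in the CSV's own row order.
test_files = [os.path.join("images", "ID_12ls.tif"), os.path.join("images", "ID_1sfk.tif")]
preds = np.array([[0.9], [0.2]])
test_csv = pd.DataFrame({"Image_ID": ["ID_1sfk", "ID_12ls"]})

predictions_df = pd.DataFrame({
    # strip the subfolder and the .tif extension so the IDs match test.csv
    "Image_ID": [os.path.splitext(os.path.basename(f))[0] for f in test_files],
    # threshold the sigmoid outputs at 0.5: 1 means class folder '1', 0 means '0'
    "Target": (preds.ravel() > 0.5).astype(int),
})

# A left merge keeps test.csv's row order; validate raises if an ID appears twice.
submission = test_csv.merge(predictions_df, on="Image_ID", how="left", validate="one_to_one")
print(submission)
```

Merging on the ID rather than relying on row position keeps the result correct even if test.csv is sorted differently from the image files.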