TensorFlow model fit: how to format data correctly (TypeError: Cannot convert a symbolic Keras input/output to a numpy array)

For an NLP task, my input dataset is transformed into a list of lists of integers, shown below. The features and the labels are the same dataset.

>>>training_data = [[    0     4    79  3179    11    44     8     1 11245   173   152    10
      1  1138  1079]
 [    0     0     4    79  3179    11    44     8 11566   173   152     8
      1  1138  1079]
 [    0     0     0     0     0     0     0     9    15   333    44     3
     61    63   533]
 [    0     0     0     0     0     0     3    19   253    28    44     3
     61    63   533]
 [    0     0     0     0     0     0     0     0     0     0     0     2
      3    49  4395]
 [    0     0     0     0     0     0     0     0     0     0     0     0
     75    65  4395]
 [    3     1  7128  3388   289    10   446   200   675     8  3320    14
     32    82   234]
 [    7    74   268   577    23    49    31     5  1032    98    10  4270
   5026    12  6570]
 [    0     0     0     0     0     0     0     2     3    39     7    27
    155    29  4534]
 [    0     0     0     0     0     2     3    19    39     7    27   155
     29    34  4534]]

The validation dataset is an excerpt of the main dataset, same format.

I then call the fit() method – my model is vae

n_steps = (800000 / 2) / batch_size   
for counter in range(nb_epoch):
    print('-------epoch: ',counter,'--------')
    vae.fit(x=np.array(training_data),y=np.array(training_data), steps_per_epoch=n_steps,
        epochs=1, callbacks=[checkpointer], validation_data=(data_1_val, data_1_val))

which gives this error

   TypeError: Cannot convert a symbolic Keras input/output to a numpy array. 
   This error may indicate that you're trying to pass a symbolic value to a NumPy call, 
   which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to 
   a TF API that does not register dispatching, preventing Keras from automatically 
      converting the API call to a lambda layer in the Functional Model.

I tried

vae.fit(x=training_data,y=training_data, steps_per_epoch=n_steps,
            epochs=1, callbacks=[checkpointer], validation_data=(data_1_val, data_1_val))

as well with the same error.

Any nice solution or hint on how to format data towards training is welcome, using lists, np.arrays or generators.

EDIT: some code

training_data = pad_sequences(sequences, maxlen = MAX_SEQUENCE_LENGTH)
len_val = int(np.floor ( len(texts) * 0.2 )) # num samples for validation
data_1_val = data_1[-len_val:] #select len_val sentences as validation data

Building and training the model

x = Input(batch_shape=(None, max_len))
x_embed = Embedding(NB_WORDS, emb_dim, weights=[glove_embedding_matrix],
                        input_length=max_len, trainable=False)(x)

[…]

loss_layer = CustomVariationalLayer()([x, x_decoded_mean])
vae = Model(x, [loss_layer])
opt = Adam(lr=0.01) #SGD(lr=1e-2, decay=1e-6, momentum=0.9, nesterov=True)
vae.compile(optimizer='adam', loss=[zero_loss])

nb_epoch = 100
n_steps = (800000 / 2) / batch_size   

for counter in range(nb_epoch):
    print('-------epoch: ',counter,'--------')
    vae.fit(training_data,training_data, steps_per_epoch=n_steps,
        epochs=1, callbacks=[checkpointer], validation_data=(data_1_val, data_1_val))

The original GitHub code passed a generator to fit_generator, a method that is now deprecated in Keras:

for counter in range(nb_epoch):
    print('-------epoch: ',counter,'--------')
    vae.fit_generator(sent_generator(TRAIN_DATA_FILE, batch_size/2),
                      steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
                      validation_data=(data_1_val, data_1_val))

Since fit() also supports a generator argument I first tried

for counter in range(nb_epoch):
    print('-------epoch: ',counter,'--------')
    vae.fit(sent_generator(TRAIN_DATA_FILE, batch_size/2),
                      steps_per_epoch=n_steps, epochs=1, callbacks=[checkpointer],
                      validation_data=(data_1_val, data_1_val))

which crashes with the same error as above.

Answer

Issue:

TypeError: Cannot convert a symbolic Keras input/output to a numpy array.
This error may indicate that you’re trying to pass a symbolic value to a NumPy call, which is not supported. Or, you may be trying to pass Keras symbolic inputs/outputs to a TF API that does not register dispatching, preventing Keras from automatically converting the API call to a lambda layer in the Functional Model.

Investigation

Because of the nature of this TypeError, I suggested checking whether the error still occurs with eager execution disabled:

from tensorflow.python.framework.ops import disable_eager_execution 
disable_eager_execution()

With eager execution disabled you got no error, only this warning: WARNING:tensorflow:When passing input data as arrays, do not specify steps_per_epoch/steps argument. Please use batch_size instead.
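
The warning concerns mixing array inputs with steps_per_epoch. As a rough sketch of the arithmetic involved (batch_size = 64 is an assumed value, since the question does not state it, and steps_for is a hypothetical helper, not a Keras function), note that (800000 / 2) / batch_size is a float in Python 3, whereas a step count should be an integer:

```python
import math


def steps_for(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (hypothetical helper)."""
    return math.ceil(num_samples / batch_size)


num_samples = 800000 // 2  # taken from the question's n_steps expression
batch_size = 64            # assumed value, not given in the question

n_steps = steps_for(num_samples, batch_size)
print(n_steps, type(n_steps))

# With NumPy arrays as input, the warning suggests letting Keras do the batching:
# vae.fit(training_data, training_data, batch_size=batch_size, epochs=nb_epoch, ...)
```

When x and y are arrays, passing batch_size alone is enough: Keras derives the number of steps per epoch from the array length.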

Understand the issue

I’ll first explain the reason for this suggestion. The behavior of models created with the Functional API can seem rather unpredictable with eager execution enabled, but we can understand why this happens and how to fix it.

Here you’ll find the TypeError coming from the KerasTensor class: https://github.com/keras-team/keras/blob/4a978914d2298db2c79baa4012af5ceff4a4e203/keras/engine/keras_tensor.py#L244

Why disabling eager execution seems to solve the problem:

Let’s first read this quotation from https://www.tensorflow.org/guide/eager

Enabling eager execution changes how TensorFlow operations behave—now they immediately evaluate and return their values to Python. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Since there isn’t a computational graph to build and run later in a session, it’s easy to inspect results using print() or a debugger. Evaluating, printing, and checking tensor values does not break the flow for computing gradients.

Eager execution works nicely with NumPy. NumPy operations accept tf.Tensor arguments. The TensorFlow tf.math operations convert Python objects and NumPy arrays to tf.Tensor objects. The tf.Tensor.numpy method returns the object’s value as a NumPy ndarray.

But if eager execution works nicely with NumPy, why does the error appear when working with a NumPy array?

This error is not thrown by TensorFlow’s eager execution. It is thrown by Keras, more specifically by KerasTensor.

During the Functional API construction of your model, KerasTensors are created to represent the symbolic inputs and outputs of each Keras layer. Your input is an np.ndarray. Keras takes your array and puts it in a tf.keras.Input layer, producing a KerasTensor. The error is thrown because your model then tries to convert this symbolic input/output into an np.ndarray.

But why this behavior?

Remember that during eager execution, tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph. Eager execution will therefore try to get a concrete value from your KerasTensor, which throws TypeError: Cannot convert a symbolic Keras input/output to a numpy array.

With eager execution disabled, nothing ever tries to get a concrete value from your KerasTensor, so this error is never thrown.

Please read these two quotations from the KerasTensor class if you’d like to better understand what’s happening inside your Functional model:

Passing a KerasTensor to a tf.keras.Layer __call__ lets the layer know that you are building a Functional model. The layer __call__ will infer the output signature and return KerasTensors with tf.TypeSpecs corresponding to the symbolic outputs of that layer call. These output KerasTensors will have all of the internal KerasHistory metadata attached to them that Keras needs to construct a Functional Model. Currently, layers infer the output signature by:
* creating a scratch FuncGraph
* making placeholders in the scratch graph that match the input typespecs
* Calling layer.call on these placeholders
* extracting the signatures of the outputs before clearing the scratch graph

If you are passing a KerasTensor to a TF API that supports dispatching, Keras will automatically turn that API call into a lambda layer in the Functional model, and return KerasTensors representing the symbolic outputs.

Suggested solution:

Disabling eager execution is not a satisfactory solution.

I suggest you try converting training_data to a Dataset with the tf.data.Dataset class, or to a tensor (a tf.Tensor, for example via tf.convert_to_tensor), prior to model.fit.
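
A minimal sketch of the Dataset conversion, assuming a small random array in place of your padded sequences and an assumed batch_size of 4. I cannot confirm from the code posted so far that this alone removes the error, so treat it as something to try rather than a verified fix:

```python
import numpy as np
import tensorflow as tf

# Stand-in for the padded sequences from the question (assumed shape: 10 x 15).
training_data = np.random.randint(0, 1000, size=(10, 15)).astype("int32")
batch_size = 4  # assumed value, not given in the question

# Features and labels are the same array, so each element is an (x, x) pair.
train_ds = (
    tf.data.Dataset.from_tensor_slices((training_data, training_data))
    .shuffle(buffer_size=len(training_data))
    .batch(batch_size)
)

# Pull one batch to check what the model would receive.
x_batch, y_batch = next(iter(train_ds))
print(x_batch.shape, y_batch.shape)

# The dataset is already batched, so batch_size is not passed to fit():
# vae.fit(train_ds, epochs=nb_epoch, callbacks=[checkpointer], validation_data=val_ds)
```

Here val_ds is a hypothetical validation dataset built the same way from data_1_val. Because the dataset yields (inputs, targets) tuples, the y argument of fit() is left out.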

Also, if the issue is still not resolved, it would help if you could provide some code to reproduce the error.