I am trying to implement zReLU, presented in "On Complex Valued Convolutional Neural Networks" by Nitzan Guberman (2016).
This activation function passes the input through unchanged if both the real and imaginary parts are positive, and outputs zero otherwise. There are several ways I can imagine implementing it, but they all rely on `tf.keras.backend.switch`, which is essentially an element-wise if/else. Here is one example:
```python
import tensorflow as tf
from math import pi
from tensorflow import Tensor


def zrelu(z: Tensor) -> Tensor:
    # Keep z only when its phase lies in [0, pi/2], i.e. both the
    # real and imaginary parts are non-negative; otherwise output 0.
    angle = tf.math.angle(z)
    return tf.keras.backend.switch(
        0 <= angle,
        tf.keras.backend.switch(
            angle <= pi / 2,
            z,
            tf.cast(0., dtype=z.dtype),
        ),
        tf.cast(0., dtype=z.dtype),
    )
```
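As a sanity check, the intended behavior can be mirrored with a plain NumPy reference (my illustrative sketch, not part of the model; note it uses strict positivity, so points exactly on the axes may behave differently than in the angle-based version):

```python
import numpy as np

def zrelu_ref(z):
    # Reference zReLU: keep z only where both the real and the
    # imaginary parts are strictly positive, otherwise output 0.
    return np.where((z.real > 0) & (z.imag > 0), z, 0)

z = np.array([1 + 2j, -1 + 2j, 1 - 2j, -1 - 2j])
print(zrelu_ref(z))  # only 1+2j survives; the rest become 0
```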
This gives me the desired output, and the activation function works correctly when I test it on data directly. However, I get an error when using it in a model like this:
```python
model = tf.keras.Sequential([
    cvnn.layers.ComplexInput((4)),
    cvnn.layers.ComplexDense(1, activation=tf.keras.layers.Activation(zrelu)),
    cvnn.layers.ComplexDense(1, activation='linear'),
])
```
This fails with `TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'` on the initializer line `return tf.math.sqrt(6. / (fan_in + fan_out))`. I believe that, because of the switch, tf loses track of the activation's output shape and reports it as `None`, which then conflicts with the next layer. This is strange because the output shape should actually be forced by `tf.keras.layers.Activation`, whose `compute_output_shape` method, to my understanding, tells tf what shape the output will have.
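The error itself is easy to reproduce in isolation: if shape inference hands the initializer a `None` dimension, the Glorot-style fan computation ends up doing `None + int`. This is a plain-Python illustration of the failure mode, not the actual Keras code:

```python
# fan_in ends up None when the previous layer's output shape is unknown.
fan_in, fan_out = None, 1

try:
    limit = (6. / (fan_in + fan_out)) ** 0.5  # mirrors sqrt(6 / (fan_in + fan_out))
except TypeError as err:
    print(err)  # unsupported operand type(s) for +: 'NoneType' and 'int'
```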
My problem can be solved by either of these two options:
- Understanding why `compute_output_shape` is being ignored here, and how to tell tf not to worry about the output shape.
- An alternative way of implementing the activation function whose output shape tensorflow can infer.
I found an option that solved the issue:
```python
import tensorflow as tf
from tensorflow import Tensor


def zrelu(z: Tensor, epsilon=1e-7) -> Tensor:
    # ReLU each component; their product is non-zero only when both
    # the real and imaginary parts are positive.
    imag_relu = tf.nn.relu(tf.math.imag(z))
    real_relu = tf.nn.relu(tf.math.real(z))
    # Dividing the product back by each component (plus epsilon to
    # avoid division by zero) approximately recovers the other one.
    ret_real = imag_relu * real_relu / (imag_relu + epsilon)
    ret_imag = imag_relu * real_relu / (real_relu + epsilon)
    return tf.complex(ret_real, ret_imag)
```
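As a quick numerical check, the same formula can be evaluated in plain NumPy (my mirror of the TF code above, under the hypothetical name `zrelu_eps`) to confirm it reproduces zReLU up to an error on the order of epsilon:

```python
import numpy as np

def zrelu_eps(z, epsilon=1e-7):
    # NumPy mirror of the epsilon-based implementation above.
    imag_relu = np.maximum(z.imag, 0.0)
    real_relu = np.maximum(z.real, 0.0)
    ret_real = imag_relu * real_relu / (imag_relu + epsilon)
    ret_imag = imag_relu * real_relu / (real_relu + epsilon)
    return ret_real + 1j * ret_imag

z = np.array([3 + 4j, -3 + 4j, 3 - 4j, -3 - 4j])
print(zrelu_eps(z))  # ~[3+4j, 0, 0, 0]: only the first-quadrant value survives
```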
This works, but it has to use an epsilon value, which I don't like because it slightly changes the results. I am still open to better options (and if one is better I will mark it as the new solution).