Implementing zReLU

I am trying to implement zReLU, presented in “On Complex Valued Convolutional Neural Networks” by Nitzan Guberman (2016).

This activation function passes the input through unchanged when both the real and imaginary parts are positive, and returns zero otherwise. There are several ways I can imagine implementing it, but they all rely on tf.keras.backend.switch, which is essentially a way of writing if/else statements. Here is one example.

import tensorflow as tf
from tensorflow import Tensor
from math import pi


def zrelu(z: Tensor) -> Tensor:
    # keep z where its phase lies in [0, pi/2] (both parts positive), zero elsewhere
    angle = tf.math.angle(z)
    return tf.keras.backend.switch(0 <= angle,
                                   tf.keras.backend.switch(angle <= pi / 2,
                                                           z,
                                                           tf.cast(0., dtype=z.dtype)),
                                   tf.cast(0., dtype=z.dtype))
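
For example, a quick sanity check on a few sample points (the values are just illustrative) gives what I expect:

z = tf.constant([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], dtype=tf.complex64)
print(zrelu(z).numpy())
# roughly [1.+1.j, 0.+0.j, 0.+0.j, 0.+0.j]: only the first-quadrant value is kept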

This gives me the desired output, and when I test the activation function on data it works correctly. However, I run into a problem when using it in a model like this:

import cvnn.layers

model = tf.keras.Sequential([
    cvnn.layers.ComplexInput((4,)),
    cvnn.layers.ComplexDense(1, activation=tf.keras.layers.Activation(zrelu)),
    cvnn.layers.ComplexDense(1, activation='linear')
])

Building this model raises TypeError: unsupported operand type(s) for +: 'NoneType' and 'int' on the initializer line return tf.math.sqrt(6. / (fan_in + fan_out)). I believe that because of the switch, TensorFlow cannot infer the size of the activation function's output and reports its shape as None, which then conflicts with the next layer. This is strange because the output shape should actually be fixed by tf.keras.layers.Activation, whose compute_output_shape method, to my understanding, tells TensorFlow that the output will have that shape.
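
To illustrate what I mean, a custom wrapper that declares the shape explicitly would look roughly like this (my own sketch; as far as I can tell it is essentially what tf.keras.layers.Activation already does):

class ZReLUActivation(tf.keras.layers.Layer):
    # hypothetical wrapper: applies zrelu and declares its output shape explicitly
    def call(self, inputs):
        return zrelu(inputs)

    def compute_output_shape(self, input_shape):
        # an elementwise activation keeps the input shape unchanged
        return input_shape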

My problem can be solved by either of these two options:

  1. Understand why compute_output_shape is not enough here, and how to tell TensorFlow not to worry about the output shape.
  2. An alternative way of implementing the activation function whose output shape TensorFlow can infer.

Answer

I found an option that solved the issue:

import tensorflow as tf
from tensorflow import Tensor


def zrelu(z: Tensor, epsilon=1e-7) -> Tensor:
    imag_relu = tf.nn.relu(tf.math.imag(z))
    real_relu = tf.nn.relu(tf.math.real(z))
    # the product is non-zero only when both parts are positive;
    # dividing by the other part (plus epsilon) recovers roughly the original part
    ret_real = imag_relu * real_relu / (imag_relu + epsilon)
    ret_imag = imag_relu * real_relu / (real_relu + epsilon)
    return tf.complex(ret_real, ret_imag)
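
A quick check on a couple of sample values (the numbers are just illustrative):

z = tf.constant([2 + 3j, -1 + 1j], dtype=tf.complex64)
print(zrelu(z).numpy())
# roughly [2.+3.j, 0.+0.j]: only the value with both parts positive survives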

This works, but it requires an epsilon value, which I don't love because it slightly changes the results. I am still open to better options (and if a better one appears I will mark it as the new solution).