Understanding LSTM through example

I want to code up one time step of an LSTM. My focus is on understanding how the forget gate layer, the input gate layer, the candidate values, and the present and future cell states work.

Let's assume that my hidden state at t-1 and xt are the following. For simplicity, let's assume that the weight matrices are identity matrices and all biases are zero.

htminus1 = np.array( [0, 0.5, 0.1, 0.2, 0.6] )
xt = np.array( [-0.1, 0.3, 0.1, -0.25, 0.1] )

I understand that the forget gate is the sigmoid of htminus1 and xt.

So, is it the following?

ft = 1 / ( 1 + np.exp( -( htminus1 + xt ) ) )

>> ft = array([0.47502081, 0.68997448, 0.549834  , 0.4875026 , 0.66818777])

I am referring to this link to implement one iteration of a one-block LSTM. The link says that ft should be 0 or 1. Am I missing something here?

How do I get the forget gate layer as per the schema in the picture below? An example would be illustrative for me.

enter image description here

Along the same lines, how do I get the input gate layer, it, and the vector of new candidate values, \tilde{C}_t, as per the following picture?

enter image description here

Finally, how do I get the new hidden state ht as per the scheme given in the following picture?

A simple example would help me understand. Thanks in advance.

enter image description here

Answer

This is not obvious from the figures, but here is how it works:

  1. If you see two lines joining to form a single line, it’s a concatenation operation. You have interpreted it as an addition.

  2. Wherever you see sigmoid or tanh blocks, a multiplication with a trainable weight matrix is implied.

  3. If two lines are joined by an explicit x or +, you are doing element-wise multiplication and addition respectively.

So instead of sigmoid(htminus1 + xt), which is what you have, the correct operation is sigmoid(Wf @ np.concatenate([htminus1, xt]) + bf). Wf is the matrix of trainable parameters and bf is the corresponding bias vector. The bias is added before the sigmoid is applied, and the product with Wf is a matrix-vector product, not an element-wise one.

Note that I have just written the equations on the right side of the images in numpy, not much else. Interpret [a, b] as the concatenation of a and b.
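As a minimal sketch of the concatenation-based forget gate, using the htminus1 and xt values from the question: the `sigmoid` helper, the random seed, and the randomly initialised `Wf` are assumptions made only for illustration, since a real `Wf` would be learned. The last part shows that stacking two identity matrices side by side reproduces the sum htminus1 + xt from the question, so that calculation is one special case of the general form.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])

hidden_size = htminus1.shape[0]
input_size = xt.shape[0]

# Join the two vectors end to end: shape (hidden_size + input_size,) = (10,)
concat = np.concatenate([htminus1, xt])

# Assumption: small random weights and zero bias stand in for trained values.
rng = np.random.default_rng(0)
Wf = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
bf = np.zeros(hidden_size)

# Matrix-vector product, bias added inside the sigmoid: shape (5,)
ft = sigmoid(Wf @ concat + bf)

# Special case: Wf = [I, I] gives Wf @ concat == htminus1 + xt.
Wf_identity = np.hstack([np.eye(hidden_size), np.eye(input_size)])
ft_identity = sigmoid(Wf_identity @ concat)

print(ft.shape)
print(ft_identity)
```

Every entry of ft lies strictly between 0 and 1, because that is the range of the sigmoid. The gate therefore scales each component of the previous cell state rather than switching it fully off or on.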

You can define the other operations similarly.

# sigmoid(z) = 1 / (1 + np.exp(-z)); Ctminus1 is the previous cell state C_{t-1}
ft = sigmoid(Wf @ np.concatenate([htminus1, xt]) + bf)
it = sigmoid(Wi @ np.concatenate([htminus1, xt]) + bi)
Ctt = np.tanh(Wc @ np.concatenate([htminus1, xt]) + bc)
Ot = sigmoid(Wo @ np.concatenate([htminus1, xt]) + bo)

Ct = (Ctminus1 * ft) + (Ctt * it)
ht = Ot * np.tanh(Ct)

Note: I have represented \tilde{C}_t as Ctt.
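Putting the pieces together, one full time step could be sketched as below. This is an illustration under stated assumptions, not a reference implementation: the `lstm_step` function, the `params` dictionary, the random weights, the zero biases, and the zero initial cell state `Ctminus1` (standing for C_{t-1}) are all hypothetical choices made so that the example runs on the vectors from the question.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(xt, htminus1, Ctminus1, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    concat = np.concatenate([htminus1, xt])               # [h_{t-1}, x_t]
    ft = sigmoid(params["Wf"] @ concat + params["bf"])    # forget gate
    it = sigmoid(params["Wi"] @ concat + params["bi"])    # input gate
    Ctt = np.tanh(params["Wc"] @ concat + params["bc"])   # candidate values
    Ot = sigmoid(params["Wo"] @ concat + params["bo"])    # output gate
    Ct = Ctminus1 * ft + Ctt * it                         # new cell state
    ht = Ot * np.tanh(Ct)                                 # new hidden state
    return ht, Ct

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])
Ctminus1 = np.zeros(5)  # assumption: the question gives no previous cell state

# Assumption: random weights and zero biases stand in for trained parameters.
rng = np.random.default_rng(0)
params = {}
for gate in "fico":
    params["W" + gate] = rng.normal(scale=0.1, size=(5, 10))
    params["b" + gate] = np.zeros(5)

ht, Ct = lstm_step(xt, htminus1, Ctminus1, params)
print("ht:", ht)
print("Ct:", Ct)
```

To process a sequence, the returned ht and Ct would be fed back in as htminus1 and Ctminus1 for the next input.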