I want to code up one time step of an LSTM. My focus is on understanding the functioning of the **forget gate layer**, **input gate layer**, **candidate values**, and the **present** and **future** **cell states**.

Let's assume that my hidden state at t-1 and my input xt are the following. For simplicity, let's assume that the weight matrices are identity matrices and all biases are zero.

```
htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])
```

I understand that the forget gate is the sigmoid of `htminus1` and `xt`. So, is it the following?

```
ft = 1 / (1 + np.exp(-(htminus1 + xt)))
>> ft = array([0.47502081, 0.68997448, 0.549834, 0.4875026, 0.66818777])
```

I am referring to this link to implement one iteration of a one-block LSTM. The link says that `ft` should be 0 or 1. Am I missing something here?

**How do I get the forget gate layer as per the schema given in the picture below? An example would be illustrative for me.**

Along the same lines, **how do I get the input gate layer `it` and the vector of new candidate values `tilde{C}_t` as per the following picture?**

Finally, **how do I get the new hidden state ht as per the scheme given in the following picture?**

A simple example will be helpful for me in understanding. Thanks in advance.

## Answer

So this is not obvious from the figures, but here is how it works:

If you see two lines joining to form a single line, it’s a concatenation operation. You have interpreted it as an addition.

Wherever you see `sigmoid` or `tanh` blocks, a multiplication with a trainable weight matrix is implied. If two lines are joined by an explicit `x` or `+`, you are doing element-wise multiplication or addition, respectively.
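For instance, with the vectors from the question, concatenation and addition give very different objects:

```python
import numpy as np

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])

concat = np.concatenate([htminus1, xt])  # joins into one length-10 vector
summed = htminus1 + xt                   # element-wise sum, still length 5

print(concat.shape)  # (10,)
print(summed.shape)  # (5,)
```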

So instead of `sigmoid(htminus1 + xt)`, which is what you have, the correct operation would be `sigmoid(Wf @ np.concatenate([htminus1, xt]) + bf)`. `Wf` is the matrix of trainable parameters and `bf` is the corresponding bias vector; note that the bias is added before the sigmoid is applied.

Note that I have just written the equations on the right side of the images in numpy, not much else. Interpret `[a, b]` as the concatenation of `a` and `b`.

You can define the other operations similarly.

```
ft = sigmoid(Wf @ np.concatenate([htminus1, xt]) + bf)
it = sigmoid(Wi @ np.concatenate([htminus1, xt]) + bi)
Ctt = np.tanh(Wc @ np.concatenate([htminus1, xt]) + bc)
Ot = sigmoid(Wo @ np.concatenate([htminus1, xt]) + bo)
Ct = (Ctminus1 * ft) + (Ctt * it)
ht = Ot * np.tanh(Ct)
```
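Putting these equations together with the numbers from the question, here is a minimal runnable sketch. The 5×10 block matrices `[I, I]` and the zero previous cell state `Ctminus1` are my assumptions, chosen so that each `W @ [htminus1, xt]` reduces to `htminus1 + xt`, matching the "identity weights, zero biases" simplification in the question:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])

# Assumed weights: each 5x10 block matrix [I, I], so W @ concat == htminus1 + xt
I = np.eye(5)
Wf = Wi = Wc = Wo = np.hstack([I, I])
bf = bi = bc = bo = np.zeros(5)   # zero biases, as in the question
Ctminus1 = np.zeros(5)            # assumed previous cell state

hx = np.concatenate([htminus1, xt])  # length-10 concatenation

ft = sigmoid(Wf @ hx + bf)           # forget gate
it = sigmoid(Wi @ hx + bi)           # input gate
Ctt = np.tanh(Wc @ hx + bc)          # candidate values tilde{C}_t
Ot = sigmoid(Wo @ hx + bo)           # output gate
Ct = Ctminus1 * ft + Ctt * it        # new cell state
ht = Ot * np.tanh(Ct)                # new hidden state

print(ft)  # same vector as sigmoid(htminus1 + xt) in the question
```

With these particular weights, `ft` reproduces the `[0.475..., 0.690..., 0.550..., 0.488..., 0.668...]` vector computed in the question, which is why the addition looked plausible; with arbitrary trained `Wf`, the concatenation and the sum would no longer coincide.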

Note: I have represented `tilde{C}_t` as `Ctt`.