# Understanding LSTM through example

I want to code up one time step in an LSTM. My focus is on understanding how the forget gate layer, the input gate layer, the candidate values, and the present and future cell states work.

Let's assume that my hidden state at t-1 and `xt` are the following. For simplicity, let's assume that the weight matrices are identity matrices and all biases are zero.

```
import numpy as np
htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])   # hidden state at t-1
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])    # input at t
```

I understand that the forget gate is the sigmoid of `htminus1` and `xt`.

So is it the following?

```
ft = 1 / (1 + np.exp(-(htminus1 + xt)))

# ft = array([0.47502081, 0.68997448, 0.549834  , 0.4875026 , 0.66818777])
```

I am referring to this link to implement one iteration of a single LSTM block. The link says that `ft` should be 0 or 1. Am I missing something here?

How do I get the forget gate layer as per the schema in the picture below? An example would be illustrative for me.

Along the same lines, how do I get the input gate layer `it` and the vector of new candidate values `\tilde{C}_t`, as per the following picture?

Finally, how do I get the new hidden state `ht` as per the scheme given in the following picture?

A simple example would help me understand. Thanks in advance.

This is not obvious from the figures, but here is how it works:

1. If you see two lines joining to form a single line, it’s a concatenation operation. You have interpreted it as an addition.

2. Wherever you see `sigmoid` or `tanh` blocks, a multiplication with a trainable weight matrix is implied.

3. If two lines are joined by an explicit `x` or `+`, you are doing element-wise multiplication and addition respectively.

So instead of `sigmoid(htminus1 + xt)`, which is what you have, the correct operation would be `sigmoid(Wf @ np.concatenate([htminus1, xt]) + bf)`. `Wf` is the matrix of trainable parameters and `bf` is the corresponding bias vector; note that the bias is added inside the sigmoid.
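As a minimal sketch of the forget gate with concatenation, here is the computation on the values from the question. The choice `Wf = [I | I]` (two 5x5 identity blocks side by side) and `bf = 0` is an assumption made purely for illustration: the concatenated vector has length 10, so `Wf` cannot be a square identity matrix, but with this particular block layout `Wf @ z` happens to equal `htminus1 + xt`.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])

# Concatenation yields a length-10 vector, so Wf must have shape (5, 10).
z = np.concatenate([htminus1, xt])

# Illustrative assumption: Wf = [I | I] and bf = 0, so Wf @ z == htminus1 + xt.
Wf = np.hstack([np.eye(5), np.eye(5)])
bf = np.zeros(5)

ft = sigmoid(Wf @ z + bf)   # forget gate activations
print(ft)
```

With these particular weights the result matches the values computed in the question, which shows that plain addition is just one special case of the general form. With trained weights `Wf` would be an arbitrary 5x10 matrix. In either case each entry of `ft` lies strictly between 0 and 1, since that is the range of the sigmoid.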

Note that I have just written the equations on the right side of the images in NumPy, not much else. Interpret `[a, b]` as the concatenation of `a` and `b`.
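For reference, the standard LSTM update equations, which are presumably what the figures show, can be written as follows. Here $\sigma$ is the sigmoid, $\odot$ is element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenation described above.

$$
\begin{aligned}
f_t &= \sigma\left(W_f \, [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \, [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \, [h_{t-1}, x_t] + b_C\right) \\
o_t &= \sigma\left(W_o \, [h_{t-1}, x_t] + b_o\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$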

You can define the other operations similarly.

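```
# Assumes sigmoid(), the weights Wf, Wi, Wc, Wo, the biases bf, bi, bc, bo,
# and the previous cell state Ctminus1 are already defined.
z = np.concatenate([htminus1, xt])   # [h_{t-1}, x_t]

ft = sigmoid(Wf @ z + bf)     # forget gate
it = sigmoid(Wi @ z + bi)     # input gate
Ctt = np.tanh(Wc @ z + bc)    # candidate values
Ot = sigmoid(Wo @ z + bo)     # output gate

Ct = (Ctminus1 * ft) + (Ctt * it)   # new cell state
ht = Ot * np.tanh(Ct)               # new hidden state
```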

Note: I have represented `\tilde{C}_t` as `Ctt`.
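Putting the pieces together, one full time step might look like the sketch below. The function name `lstm_step`, the dictionary layout for the weights and biases, the random initialisation, and the zero previous cell state are all assumptions for illustration; the question does not specify any of them.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(xt, htminus1, Ctminus1, W, b):
    """One LSTM time step.

    W and b are dicts keyed by 'f', 'i', 'c', 'o' (a hypothetical layout);
    each W[k] has shape (hidden, hidden + n_in), each b[k] shape (hidden,).
    """
    z = np.concatenate([htminus1, xt])     # [h_{t-1}, x_t]
    ft = sigmoid(W["f"] @ z + b["f"])      # forget gate
    it = sigmoid(W["i"] @ z + b["i"])      # input gate
    Ctt = np.tanh(W["c"] @ z + b["c"])     # candidate values
    Ot = sigmoid(W["o"] @ z + b["o"])      # output gate
    Ct = ft * Ctminus1 + it * Ctt          # new cell state
    ht = Ot * np.tanh(Ct)                  # new hidden state
    return ht, Ct

hidden, n_in = 5, 5
rng = np.random.default_rng(0)

# Small random weights and zero biases, standing in for trained parameters.
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])
Ctminus1 = np.zeros(hidden)   # assumed; the question gives no previous cell state

ht, Ct = lstm_step(xt, htminus1, Ctminus1, W, b)
print(ht.shape, Ct.shape)
```

To process a sequence, the returned `ht` and `Ct` would be fed back in as `htminus1` and `Ctminus1` for the next input.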