Handling long sequence with short term memory recurrent neural networks

June 27, 2017, 8:32 a.m. By: Pranjal Kumar

Recurrent neural network

Long short term Memory recurrent neural networks are capable of learning and remembering over a long sequence of inputs. LSTMs work well if your problem has only one output for every input like text translation. But it can be challenging when you have long input sequence and only one or a handful of inputs. This is often called sequence labelling.

Some of the examples can be the clarification of coding genes for a sequence of thousands of DNA base pair or in NLP. This sequence labelling requires special handling when using the neural network. SO here are few methods for handling very long sequence with long short memory recurrent neural network.

1. Truncated sequence

This is the most common method of handling long sequence. This can be done by selectively removing the time stamp from the beginning of the input sequence. It allows to force the sequence to manageable length as the costs of losing data. The only drawback is the if the data removed is necessary for the calculation.

2. Summarising sequence

In some cases, input sequence can be summarised. This is useful where input sequence are words. You can remove certain words from the input sequence that are below the certain threshold frequency. It can help in both focusing the problem on most salient parts of the input sequence and reducing the length of the input sequence.

3. Use Truncated Backpropagation through time

You could estimate the gradient from the subset of the last time steps instead of updating the model based on the entire sequence. This is known as the Truncated Backpropagation through time. It speeds up the learning process of recurrent neural network. This allows all the sequence to be provided as input and execute the forward pass, but only the last tens of time steps would be used to estimate the gradient and used in the weight updates.

4. You can split the input sequence into multiple fixed length subsequence and use this to train the model (parallel input sequence)

5. You can use sequence-aware encoding scheme, hashing, projection methods in order to reduce the number of time steps.

6. You can also use random sampling to reduce the length of the input sequence.

Image Source: SlidePlayer