Types of RNN
Recurrent neural networks (RNNs) are a class of neural networks widely used in deep learning to process sequential data. Here are a few common RNN types:
- Vanilla RNN
- LSTM
- GRU
- Bidirectional RNN
- Deep RNN
- Hierarchical RNN
Vanilla RNN
- Definition: The Elman network, also referred to as a vanilla RNN, is the most basic type of RNN. Each recurrent neuron in its single layer receives an input, processes it, and then feeds the result back into the network at the following time step.
- Purpose/Objective: By identifying dependencies and patterns across time, vanilla RNNs are used to process sequential data.
- Invention: In 1988, Paul Werbos presented the idea of recurrent neural networks.
- Architecture: The architecture consists of a single layer of recurrent neurons, where each neuron receives input and generates an output that is sent back into the network at the following time step.
- Working: The network processes the input sequence one element at a time, updating a hidden state at each time step so that information from earlier steps is carried forward (a minimal sketch of this recurrence follows this list).
- Use: Vanilla RNNs are used for sequential tasks such as language modeling, speech recognition, and time series prediction.
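To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass; the weight names (W_xh, W_hh, b_h), the layer sizes, and the random toy input are assumptions chosen only for illustration.
# Illustrative vanilla RNN recurrence in NumPy (names and sizes are arbitrary)
import numpy as np
input_size, hidden_size, seq_len = 1, 8, 10
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)                             # hidden bias
x_seq = np.random.randn(seq_len, input_size)            # toy input sequence
h = np.zeros(hidden_size)                               # initial hidden state
for x_t in x_seq:
    # h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b_h): the hidden state carries past information forward
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)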
Long Short-Term Memory (LSTM)
- LSTM is an extension of the RNN architecture that addresses the vanishing gradient problem. It introduces memory cells and gating mechanisms that allow the network to selectively remember or forget information over long sequences.
- Definition: LSTM is an advanced RNN architecture in which memory cells and gating mechanisms address the vanishing gradient problem.
- Purpose/Goal: The aim of LSTM is to selectively retain or forget information across lengthy sequences and to capture long-term dependencies in sequential data.
- Invention: Sepp Hochreiter and Jürgen Schmidhuber invented the LSTM in 1997.
- Architecture: It has memory cells together with input, forget, and output gates. These gates regulate the flow of information into and out of the memory cells, allowing the network to control what is stored in them.
LSTM (Source: Polychord.io)
- Working: The gates allow the LSTM to control the flow of information and choose which data to store, forget, or output; the memory cells can retain information over long periods, which lets the network capture long-term dependencies (see the sketch after this list).
- Use: LSTM is frequently used for long-sequence tasks such as sentiment analysis, machine translation, natural language processing, and handwriting recognition.
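For intuition, the gate computations of a single LSTM time step can be written out explicitly. The following is a minimal NumPy sketch, not the Keras implementation; the weight layout, the sigmoid helper, and the toy sizes are assumptions made for this example.
# Illustrative single LSTM time step in NumPy (weight layout and sizes are arbitrary)
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
hidden_size, input_size = 8, 1
rng = np.random.default_rng(0)
# One weight matrix per gate/candidate, acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_o, W_c = (rng.normal(0, 0.1, (hidden_size, hidden_size + input_size)) for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_size)
h_prev, c_prev = np.zeros(hidden_size), np.zeros(hidden_size)
x_t = rng.normal(size=input_size)
z = np.concatenate([h_prev, x_t])
f_t = sigmoid(W_f @ z + b_f)        # forget gate: what to erase from the memory cell
i_t = sigmoid(W_i @ z + b_i)        # input gate: what new information to write
o_t = sigmoid(W_o @ z + b_o)        # output gate: what to expose as the hidden state
c_tilde = np.tanh(W_c @ z + b_c)    # candidate cell contents
c_t = f_t * c_prev + i_t * c_tilde  # updated memory cell
h_t = o_t * np.tanh(c_t)            # new hidden state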
Gated Recurrent Unit (GRU)
- The gated recurrent unit (GRU) is another RNN variant that tackles the vanishing gradient problem. It streamlines the LSTM design by merging the forget and input gates into a single update gate and by combining the cell state and hidden state.
- Definition: The GRU is another RNN architecture that uses gating mechanisms to address the vanishing gradient problem with a simpler structure than the LSTM.
- Purpose/Goal: Similar to LSTM, GRU aims to identify long-term dependencies in sequential data. However, by combining some of the gates in the LSTM, it streamlines the architecture.
- Invention: GRU was created in 2014 by Kyunghyun Cho and colleagues.
- Architecture: It has reset gates and update gates, which regulate the information flow. The GRU combines the forget and input gates of the LSTM into a single update gate.
GRU (Source: Polychord.io)
- Working: The reset gate governs how much of the past information should be forgotten, whereas the update gate decides how much prior information should be carried forward. These gate values are used to update the network's hidden state (see the sketch after this list).
- Use: GRU is frequently used for applications involving sequential data, including video analysis, speech recognition, and machine translation.
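The GRU's two gates can be sketched in the same style. Again this is only an illustrative NumPy sketch under assumed weight names and sizes; note also that some descriptions swap the roles of z_t and (1 - z_t) in the final interpolation.
# Illustrative single GRU time step in NumPy (names, sizes, and gate convention are assumptions)
import numpy as np
def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))
hidden_size, input_size = 8, 1
rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.normal(0, 0.1, (hidden_size, hidden_size + input_size)) for _ in range(3))
b_z = b_r = b_h = np.zeros(hidden_size)
h_prev = np.zeros(hidden_size)
x_t = rng.normal(size=input_size)
z_t = sigmoid(W_z @ np.concatenate([h_prev, x_t]) + b_z)            # update gate: how much new information to let in
r_t = sigmoid(W_r @ np.concatenate([h_prev, x_t]) + b_r)            # reset gate: how much past state to forget
h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # candidate hidden state
h_t = (1 - z_t) * h_prev + z_t * h_tilde                            # interpolate between old and candidate states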
Bidirectional RNN
- Definition: Bidirectional RNNs process the input sequence concurrently in the forward and backward directions, enabling the network to access information from the past and the future.
- Purpose: Bidirectional RNNs are designed with the purpose of capturing relationships and patterns from both past and future contexts in order to better grasp the input sequence.
- Invention: Bidirectional RNNs were initially developed in 1997 by Schuster and Paliwal.
- Architecture: The network is split into two parts, one processing the sequence forward and the other processing it backward. The outputs from both directions are usually merged or concatenated.
Bidirectional RNN (Source: Wikipedia)
- Working: The forward and backward RNNs process the input sequence independently, capturing information from the past and the future, respectively, and their outputs are combined (typically concatenated) to form the final representation. Whereas a standard RNN only carries information from the past toward the future, the two directions together give the network access to context on both sides of each time step (see the Keras sketch after this list).
- Use: Tasks including named entity recognition, sentiment analysis, machine translation, and speech recognition can benefit from the use of bidirectional RNNs.
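In Keras, the same idea is available through the Bidirectional wrapper. The snippet below is a minimal sketch using the same toy input shape as the implementation later in this article; the layer sizes are arbitrary choices for illustration.
# Illustrative bidirectional RNN in Keras (layer sizes are arbitrary)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, SimpleRNN, Dense
model = Sequential()
# One RNN reads the sequence forward and another reads it backward; their outputs are concatenated by default
model.add(Bidirectional(SimpleRNN(units=16, return_sequences=True), input_shape=(10, 1)))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')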
Deep RNN
- Definition: A deep RNN, also known as a multi-layered or stacked recurrent neural network, is an RNN architecture with multiple layers of recurrent units, which allows it to capture hierarchical representations and intricate relationships.
- Each layer gets data from the layer below it and generates output that is sent to the layer above it. Deep RNNs are capable of capturing complicated dependencies and hierarchical representations in sequential data.
- Purpose/Goal: Deep RNNs are designed to learn progressively abstract representations of sequential input by utilizing the network's depth.
- Invention: Geoffrey Hinton and his collaborators played a vital role in the development and understanding of deep RNN architectures.
- Architecture: A deep recurrent neural network (DRNN) is made up of many layers of recurrent units, each of which takes input from the previous layer and outputs it for the following layer to process.
Deep RNN (Source: Goodfellow et al., Deep Learning)
- Working: The input sequence is processed layer by layer; each layer builds higher-level representations from the outputs of the layer below, and the output of the final layer is typically used for the task at hand (see the stacked-layer sketch after this list).
- Use: Deep RNNs are employed for complex sequential tasks such as sentiment analysis, speech recognition, natural language processing, and music generation.
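Stacking recurrent layers in Keras is a matter of setting return_sequences=True on every layer except (optionally) the last, so each layer receives the full sequence of hidden states from the layer below. The sketch below assumes an arbitrary three-layer configuration and the same toy input shape used later in this article.
# Illustrative deep (stacked) RNN in Keras (number of layers and units are arbitrary)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
model = Sequential()
# return_sequences=True passes the full sequence of hidden states to the next recurrent layer
model.add(LSTM(units=32, return_sequences=True, input_shape=(10, 1)))
model.add(LSTM(units=32, return_sequences=True))  # second recurrent layer builds on the first layer's outputs
model.add(LSTM(units=16))                         # final recurrent layer returns only the last hidden state
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')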
Hierarchical RNN (HRNN)
- Definition: RNN architecture that can recognize hierarchical structures in sequential data is called a hierarchical RNN (HRNN). It has several levels, each of which processes a certain level of granularity in the input sequence. The network may learn representations at many scales since the outputs from lower levels are used as inputs by higher levels.
- The HRNN architecture is intended to capture hierarchical structures in sequential data by processing it at several levels of granularity.
- Purpose: In order to enable the network to learn representations at numerous levels of abstraction, the purpose of HRNN is to capture dependencies at various scales and hierarchies.
- Invention: HRNN was proposed by Li et al. in 2018.
- Architecture: HRNN is a multi-level architecture, with each level analyzing an individual level of granularity in the input sequence. Higher levels use the results from lower levels as inputs.
- Working: The input sequence is processed by the HRNN at several levels of granularity, each capturing a particular set of features or patterns; this hierarchical structure lets the network learn representations at multiple scales (a Keras-style sketch follows this list).
- Use: Employed in tasks where capturing hierarchical dependencies is essential, such as document categorization, sentiment analysis, and image captioning.
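A common way to realize a two-level hierarchy in Keras is to encode the lower level with a recurrent layer wrapped in TimeDistributed and then run a second recurrent layer over the resulting higher-level sequence. The sketch below assumes a hypothetical document task (sentences of word vectors feeding a document label); the level names, sizes, and output head are illustrative assumptions, not part of the original HRNN proposal.
# Illustrative two-level hierarchical RNN in Keras (sizes and task are assumptions)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, GRU, Dense
sentences_per_doc, words_per_sentence, embedding_dim = 5, 12, 16
model = Sequential()
# Lower level: a GRU encodes each sentence (a sequence of word vectors) into a single sentence vector
model.add(TimeDistributed(GRU(units=32), input_shape=(sentences_per_doc, words_per_sentence, embedding_dim)))
# Higher level: a second GRU reads the sequence of sentence vectors to represent the whole document
model.add(GRU(units=32))
model.add(Dense(units=1, activation='sigmoid'))  # e.g. a binary document label
model.compile(loss='binary_crossentropy', optimizer='adam')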
Implementation
Let's implement the vanilla RNN and the LSTM using Keras and visualize the output.
Source Code
# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM
# Define the input sequence
X = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
# Define the target sequence (shifted by 1)
y = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
# Reshape the input and target sequences
X = np.reshape(X, (1, 10, 1))
y = np.reshape(y, (1, 10, 1))
# Create a list to store the predicted sequences
predictions = []
# Define the RNN architectures
rnn_architectures = [
('Vanilla RNN', SimpleRNN),
('LSTM', LSTM)
]
# Iterate over each RNN architecture
for rnn_name, rnn_layer in rnn_architectures:
    # Create the RNN model
    model = Sequential()
    model.add(rnn_layer(units=32, return_sequences=True, input_shape=(10, 1)))
    model.add(Dense(units=1))
    # Compile the model
    model.compile(loss='mean_squared_error', optimizer='adam')
    # Train the model
    model.fit(X, y, epochs=100, batch_size=1, verbose=0)
    # Generate predictions on the input sequence
    rnn_predictions = model.predict(X)
    predictions.append((rnn_name, rnn_predictions))
# Visualize the predicted sequences
plt.figure(figsize=(12, 8))
for i, (rnn_name, rnn_predictions) in enumerate(predictions):
    plt.subplot(2, 1, i+1)
    plt.plot(y[0], marker='o', linestyle='-', label='True')
    plt.plot(rnn_predictions[0], marker='o', linestyle='--', label='Predicted')
    plt.title(rnn_name)
    plt.xlabel('Time Step')
    plt.ylabel('Value')
    plt.legend()
plt.tight_layout()
plt.show()
Description
- The code above implements two straightforward RNN designs, a vanilla RNN and an LSTM, to predict a sequence of integers. Here is a quick explanation:
- Data: The input sequence is a 1D array of the numbers 0 to 9, while the target sequence is the input sequence shifted by one position (1 to 10).
- Reshaping: The input and target sequences are reshaped to dimensions (1, 10, 1) so that they match the input shape expected by the RNN models.
- RNN Architectures: Vanilla RNN and LSTM are the two RNN architectures that are defined in the code. A tuple comprising the name of the architecture and the appropriate Keras RNN layer class identifies each architecture.
- Model construction: A sequential model is constructed for each RNN architecture. The first layer created is an RNN layer with 32 units and return_sequences=True, which is followed by a Dense layer with one unit.
- Model Compilation: The model is compiled with the Adam optimizer and the mean squared error loss function.
- Model Training: The model is trained on the input and target sequences for 100 epochs with a batch size of 1 by calling the fit method.
- Prediction: After training, each trained model generates predictions on the input sequence by calling the predict method.
- Visualization: Matplotlib is used to display the predicted sequences for each RNN architecture. Each subplot shows the true sequence and the predicted sequence, with a title and labels identifying the corresponding RNN architecture.
Key Points to Remember
- Vanilla RNN: The most basic form of recurrent neural network, sometimes described as a fully connected RNN, with a single layer of recurrent neurons.
- It suffers from the vanishing gradient problem: gradients can become extremely small, making it difficult to learn long-term dependencies.
- Long Short-Term Memory (LSTM): A kind of RNN made to deal with the vanishing gradient issue.
- To regulate the information flow, it adds memory cells and gates (input, forget, and output).
- By selectively remembering or discarding information, LSTMs can learn long-term dependencies.
- GRU (Gated Recurrent Unit): A more streamlined version of the LSTM architecture.
- The cell state and hidden state are merged, and the forget and input gates are combined into a single "update gate".
- In practice, GRUs often perform comparably to LSTMs while having fewer parameters.
- Combining two RNNs, one of which processes the input sequence forward and the other backward, creates a bidirectional RNN.
- Both past and future contexts are captured, which is beneficial for tasks where the complete input sequence is available at once.
- Multiple recurrent layers are stacked on top of one another in a deep RNN.
- Enables the network to learn hierarchical representations of sequential data.
- A deep RNN has multiple layers, each of which takes input from the layer below it and passes its output to the layer above it.
Conclusion
- To sum up, RNNs (recurrent neural networks) are a class of neural networks designed for processing sequential data. Here is a brief description of the main RNN types:
- Vanilla RNN: The most basic type of RNN, although it has the vanishing gradient issue.
- To manage long-term dependencies, the LSTM (Long Short-Term Memory) architecture introduces memory cells and gates.
- GRU (Gated Recurrent Unit): A condensed LSTM that combines hidden states and gates.
- Bidirectional RNN: Processes input sequences in both directions, forward and backward, capturing context from both the past and the future.
- Deep RNN: Learns hierarchical representations of sequential input by stacking many recurrent layers.
- Each variety of RNN has its own advantages and applications, addressing issues such as vanishing gradients and capturing long-term dependencies. Understanding these architectures makes it easier to select the best RNN for a given task and improves the effectiveness of sequential data analysis and prediction.
References
[1] Polychord.io