Introduction to Deep Learning

Bidirectional Recurrent Neural Network

Last Updated : 23 Jul, 2025

Recurrent Neural Networks (RNNs) are designed to handle sequential data such as speech, text and time series. Unlike traditional feedforward neural networks, which process inputs as fixed-length vectors, RNNs can handle variable-length sequences by maintaining a hidden state that stores information from previous steps in the sequence.

This memory mechanism lets RNNs capture dependencies across a sequence. However, traditional RNNs have two notable limitations. First, they suffer from the vanishing gradient problem, where gradients become too small during backpropagation through time, making long-range dependencies hard to learn; gated architectures such as LSTM and GRU were developed to address this. Second, a standard RNN only sees past inputs at each step. The Bidirectional Recurrent Neural Network (BRNN) addresses this second limitation by also processing the sequence in reverse. In this article, we explore BRNNs in more detail.

Overview of Bidirectional Recurrent Neural Networks (BRNNs)

A Bidirectional Recurrent Neural Network (BRNN) is an extension of the traditional RNN that processes sequential data in both forward and backward directions. This allows the network to use both past and future context when making predictions, giving it a more complete view of the sequence.

Like a traditional RNN, a BRNN moves forward through the sequence, updating the hidden state based on the current input and the prior hidden state at each time step. The key difference is that a BRNN also has a backward hidden layer which processes the sequence in reverse, updating the hidden state based on the current input and the hidden state of the next time step.

Compared with unidirectional RNNs, BRNNs can improve accuracy on tasks where the whole sequence is available, because they consider both past and future context. The two hidden layers (forward and backward) complement each other, and predictions are made from their combined outputs, typically by concatenating the two hidden states at each time step.
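The two recurrences and the combined output can be written as follows. This is one common formulation (the notation is ours), assuming tanh hidden activations, separate weights for each direction, concatenation of the two hidden states, and an output activation g such as sigmoid or softmax. Both initial states, the forward state before step 1 and the backward state after step T, are usually zero vectors.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \tanh\left(W_{x}^{f} x_t + W_{h}^{f} \overrightarrow{h}_{t-1} + b^{f}\right), \quad t = 1, \dots, T \\
\overleftarrow{h}_t &= \tanh\left(W_{x}^{b} x_t + W_{h}^{b} \overleftarrow{h}_{t+1} + b^{b}\right), \quad t = T, \dots, 1 \\
y_t &= g\left(W_y \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right] + b_y\right)
\end{aligned}
```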

Example:

Consider the sentence: "I like apple. It is very healthy."

A unidirectional RNN reading left to right has seen only "I like apple" when it reaches "apple", so at that point it cannot tell whether the word refers to the fruit or the company. A BRNN also reads the text from right to left, so its representation of "apple" includes the later context "It is very healthy.", which points to the fruit.
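A toy sketch of what each direction has read by the time it reaches "apple". It assumes simple whitespace tokenization, and the helper context_at is hypothetical, written only to illustrate the idea:

```python
def context_at(tokens, i):
    """Return the tokens each direction has read when it reaches position i."""
    forward = tokens[: i + 1]      # forward pass reads from the start up to i
    backward = tokens[i:][::-1]    # backward pass reads from the end back to i
    return forward, backward

sentence = "I like apple . It is very healthy ."
tokens = sentence.split()
i = tokens.index("apple")
fwd, bwd = context_at(tokens, i)
print("forward has read: ", fwd)   # ['I', 'like', 'apple']
print("backward has read:", bwd)   # ['.', 'healthy', 'very', 'is', 'It', '.', 'apple']
```

Only the backward direction has seen "healthy" when it reaches "apple", which is the disambiguating cue.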

[Figure: Bi-directional Recurrent Neural Network]

Working of Bidirectional Recurrent Neural Networks (BRNNs)

1. Inputting a Sequence: A sequence of data points, each represented as a vector of the same dimensionality, is fed into the BRNN. Different sequences may have different lengths.

2. Dual Processing: BRNNs process data in two directions:

Forward direction: The hidden state at each time step is determined by the current input and the previous hidden state.
Backward direction: The hidden state at each time step is determined by the current input and the hidden state of the next time step.

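3. Computing the Hidden State: In each direction, a non-linear activation function (such as tanh) is applied to a weighted sum of the current input and the neighboring hidden state (the previous one in the forward pass, the next one in the backward pass), plus a bias. This creates a memory mechanism that lets the network retain information from other steps in the sequence.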

4. Determining the Output: At each step, the forward and backward hidden states are combined (for example, concatenated), multiplied by the output weights and passed through an activation function to produce the output. This output can be either:

The final output of the network.
An input to another layer for further processing.
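The four steps above can be sketched in NumPy for a single sequence. This is an illustrative forward pass only (no training), assuming tanh hidden activations, separate weights for each direction, concatenation of the two hidden states and a sigmoid output at every step. The function names are hypothetical, and this is not how Keras implements the layer internally.

```python
import numpy as np

def rnn_pass(X, W_x, W_h, b):
    """Run a simple tanh RNN over X (shape T x d); return every hidden state (T x h)."""
    h = np.zeros(W_h.shape[0])            # initial hidden state is all zeros
    states = []
    for x_t in X:                         # iterate over time steps in order
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def brnn_forward(X, params_f, params_b, W_y, b_y):
    """Bidirectional pass: separate forward/backward weights, concatenated states, sigmoid output."""
    H_f = rnn_pass(X, *params_f)                  # forward: reads t = 0 .. T-1
    H_b = rnn_pass(X[::-1], *params_b)[::-1]      # backward: reads t = T-1 .. 0, re-aligned to t
    H = np.concatenate([H_f, H_b], axis=1)        # shape (T, 2h)
    Y = 1.0 / (1.0 + np.exp(-(H @ W_y.T + b_y)))  # per-step sigmoid output, shape (T, k)
    return H_f, H_b, Y

# Toy example with random weights (sizes chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
T, d, h = 6, 4, 3
X = rng.normal(size=(T, d))

def init_params():
    # (input weights, recurrent weights, bias) for one direction
    return (rng.normal(scale=0.5, size=(h, d)),
            rng.normal(scale=0.5, size=(h, h)),
            np.zeros(h))

params_f, params_b = init_params(), init_params()
W_y, b_y = rng.normal(size=(1, 2 * h)), np.zeros(1)

H_f, H_b, Y = brnn_forward(X, params_f, params_b, W_y, b_y)
print(H_f.shape, H_b.shape, Y.shape)  # (6, 3) (6, 3) (6, 1)
```

At the last position the backward state depends only on the last input, while the forward state has seen the whole sequence; concatenating the two gives every position context from both sides.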

Implementation of Bi-directional Recurrent Neural Network

Here’s a simple implementation of a Bidirectional RNN using Keras and TensorFlow for sentiment analysis on the IMDb movie-review dataset that ships with Keras:

1. Loading and Preprocessing Data

We first load the IMDb dataset and preprocess it by padding the sequences to ensure uniform length.

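warnings.filterwarnings('ignore') suppresses Python warnings (such as deprecation warnings) during execution.
imdb.load_data(num_words=features) loads the IMDb dataset as sequences of word indices, keeping only the 2,000 most frequent words (rarer words are replaced by an out-of-vocabulary index).
pad_sequences(X_train, maxlen=max_len) and pad_sequences(X_test, maxlen=max_len) make every review exactly 50 tokens long: shorter reviews are padded with zeros and longer ones are truncated (by default, both at the start of the sequence), so the model receives inputs of a consistent size.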

Python

import warnings
warnings.filterwarnings('ignore')
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

features = 2000  # Number of most frequent words to consider
max_len = 50     # Maximum length of each sequence

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=features)

X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)
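For intuition, here is a plain-Python sketch of what pad_sequences does with its defaults (padding='pre', truncating='pre', value=0). The helper pad_pre is hypothetical and only mimics that behavior; it is not the Keras implementation. Padding with 0 is safe here because, in the Keras IMDb data, index 0 does not stand for any word.

```python
def pad_pre(seqs, maxlen, value=0):
    """Mimic pad_sequences defaults: left-pad short sequences, keep the last maxlen tokens of long ones."""
    padded = []
    for seq in seqs:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]                   # truncating='pre': drop tokens from the start
        padded.append([value] * (maxlen - len(seq)) + seq)  # padding='pre': fill on the left
    return padded

print(pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4))  # [[0, 0, 5, 8], [3, 4, 5, 6]]
```

With max_len = 50, a long review therefore keeps only its last 50 tokens.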

2. Defining the Model Architecture

We define a Bidirectional Recurrent Neural Network model using Keras. The model uses an embedding layer with 128 dimensions, a Bidirectional SimpleRNN layer with 64 hidden units and a dense output layer with a sigmoid activation for binary classification.

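Embedding(features, embedding_dim, input_length=max_len) maps each word index (out of features = 2,000) to a dense 128-dimensional vector; each input sequence is max_len (50) tokens long.
Bidirectional(SimpleRNN(hidden_units)) adds a bidirectional RNN layer with 64 units in each direction. By default, Bidirectional concatenates the final forward and backward outputs, so this layer returns a 128-dimensional vector per review.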
Dense(1, activation='sigmoid') adds a dense output layer with 1 unit and a sigmoid activation for binary classification.
model.compile() configures the model with Adam optimizer, binary cross-entropy loss and accuracy as the evaluation metric.

Python

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, SimpleRNN, Dense

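embedding_dim = 128  # size of each word vector
hidden_units = 64    # units per direction in the RNN

model = Sequential()

# Map each word index (0 .. features-1) to a dense vector
model.add(Embedding(features, embedding_dim, input_length=max_len))

# Forward and backward SimpleRNNs; their outputs are concatenated (2 * 64 = 128 values)
model.add(Bidirectional(SimpleRNN(hidden_units)))

# Single sigmoid unit: probability that the review is positive
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])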
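The parameter counts that model.summary() should report for this architecture can be worked out by hand. This sketch assumes the standard SimpleRNN layout (an input kernel, a recurrent kernel and a bias vector per direction), Bidirectional's default concatenation, and the sizes used above; simple_rnn_params is a hypothetical helper.

```python
def simple_rnn_params(input_dim, units):
    # input kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return input_dim * units + units * units + units

features, embedding_dim, hidden_units = 2000, 128, 64

embedding = features * embedding_dim                         # one vector per vocabulary entry
bi_rnn = 2 * simple_rnn_params(embedding_dim, hidden_units)  # two independent directions
dense = (2 * hidden_units) * 1 + 1                           # 128 concatenated inputs -> 1 unit, plus bias
print(embedding, bi_rnn, dense, embedding + bi_rnn + dense)  # 256000 24704 129 280833
```

Most of the parameters sit in the embedding table, so the bidirectional layer only doubles a comparatively small part of the model.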

3. Training the Model

With the model compiled and the data prepared, we can now train the BRNN.

batch_size=32 defines how many samples are processed together in one iteration.
epochs=5 sets the number of times the model will train on the entire dataset.
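model.fit() trains the model on the training data and reports loss and accuracy on the validation data after each epoch. Here the test set is passed as validation_data, so it is not strictly held out; using validation_split (or a separate validation set) would keep the final test evaluation independent.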

Python

batch_size = 32
epochs = 5

model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(X_test, y_test))
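In terms of weight updates, batch_size and epochs translate as follows. A small calculation, assuming the 25,000-review Keras IMDb training split (so len(X_train) is 25000):

```python
import math

train_samples = 25_000   # size of the Keras IMDb training split
batch_size = 32
epochs = 5

# The last batch of each epoch is partial: 25000 = 781 * 32 + 8
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 782 3910
```

Keras's progress bar counts these steps within each epoch.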

Output:

[Figure: training log]

4. Evaluating the Model

With the model trained, let’s evaluate its performance on the test data (which was also used as validation data during training).

model.evaluate(X_test, y_test) evaluates the model's performance on the test data (X_test, y_test), returning the loss and accuracy.

Python

loss, accuracy = model.evaluate(X_test, y_test)

print('Test accuracy:', accuracy)

Output :

Test accuracy: 0.7429199814796448

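The model reaches about 74% test accuracy; exact numbers vary between runs because of random weight initialization. Further tuning (for example, of the sequence length, vocabulary size or number of epochs) may improve this.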
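model.evaluate returns the compiled loss and metric. Below is a plain-Python sketch of those two quantities for binary labels, assuming the usual definitions: mean binary cross-entropy with probabilities clipped away from 0 and 1 (the clipping constant is an assumption), and accuracy with a 0.5 threshold. The labels and probabilities are made up for illustration.

```python
import math

def binary_crossentropy(y_true, probs, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction (p > threshold) matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, probs))
    return correct / len(y_true)

# Toy labels and predicted probabilities (made up, not model output)
y_true = [1, 0, 0, 1]
probs = [0.9, 0.2, 0.65, 0.4]
print(binary_crossentropy(y_true, probs), binary_accuracy(y_true, probs))
```

The loss rewards confident correct probabilities, while accuracy only checks which side of the threshold each prediction falls on, which is why the two can move somewhat independently during training.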

5. Predict on Test Data

We will use the model to predict on the test data and compare the predictions with the true labels.

model.predict(X_test) generates a predicted probability of the positive class for each test review.
(y_pred > 0.5).astype(int).ravel() converts the predicted probabilities into binary labels (0 or 1) using a threshold of 0.5 and flattens them into a 1-D array.
classification_report(y_test, y_pred, target_names=['Negative', 'Positive']) generates and prints a classification report including precision, recall, F1-score and support for the negative and positive classes.

Python

from sklearn.metrics import classification_report

# Predicted probabilities of the positive class, shape (num_samples, 1)
y_pred = model.predict(X_test)

# Threshold at 0.5 and flatten to a 1-D array of 0/1 labels
y_pred = (y_pred > 0.5).astype(int).ravel()

print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))

Output:

[Figure: classification report]
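classification_report prints one row per class. Here is a plain-Python sketch of how the precision, recall, F1 and support numbers in each row are defined, using made-up labels and probabilities and the same 0.5 threshold rule. The helper names are hypothetical and this is not scikit-learn's implementation.

```python
def threshold(probs, t=0.5):
    # Same rule as (y_pred > 0.5): strictly greater than the threshold -> class 1
    return [1 if p > t else 0 for p in probs]

def prf(y_true, y_pred, positive=1):
    """Precision, recall, F1 and support for the class `positive`."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a != positive and b == positive)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = tp + fn  # number of true examples of this class
    return precision, recall, f1, support

# Toy probabilities and labels (made up, not model output)
probs = [0.9, 0.2, 0.65, 0.4, 0.8, 0.3]
y_true = [1, 0, 0, 1, 1, 0]
y_hat = threshold(probs)
print(y_hat)  # [1, 0, 1, 0, 1, 0]
print("Positive:", prf(y_true, y_hat, positive=1))
print("Negative:", prf(y_true, y_hat, positive=0))
```

Each class is scored by treating it as the "positive" class in turn, which is why the report has a Negative row and a Positive row.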

Advantages of BRNNs

Enhanced Context Understanding: Each prediction can draw on both past and future elements of the sequence.
Improved Accuracy: Often more accurate than unidirectional RNNs on tasks where the whole sequence is available, such as many NLP and speech processing tasks.
Handling of Variable-Length Sequences: Like other RNNs, BRNNs process sequences of varying length, while giving every position access to context from the entire sequence.
Increased Robustness: Because each output combines evidence from both directions, predictions may be less sensitive to noise or ambiguity at any single position.

Challenges of BRNNs

High Computational Cost: A BRNN runs two RNNs over the sequence, roughly doubling the computation and recurrent-layer parameters of a unidirectional RNN of the same size.
Longer Training Time: The extra parameters and computation usually make training slower.
Limited Real-Time Applicability: The backward pass needs the entire sequence before any output can be produced, so BRNNs are not well suited to real-time applications such as live speech recognition.
Less Interpretability: Because each output mixes information from both directions, predictions are harder to interpret than those of a standard RNN.

Applications of Bidirectional Recurrent Neural Networks (BRNNs)

BRNNs are widely used in various natural language processing (NLP) tasks, including:

Sentiment Analysis: By considering both past and future context, they can better classify the sentiment of a sentence.
Named Entity Recognition (NER): They help identify entities in sentences by analyzing the context on both sides of each word.
Machine Translation: In encoder-decoder models, BRNNs allow the encoder to capture the full context of the source sentence in both directions, improving translation accuracy.
Speech Recognition: By considering both earlier and later parts of an utterance, they improve the accuracy of transcribing recorded audio.


shivammiglani09
