
Text Generation using Recurrent Long Short Term Memory Network

Last Updated : 27 Feb, 2025

LSTMs are a type of recurrent neural network that is well suited to tasks involving sequential data such as text generation. They are particularly useful because they can remember long-term dependencies in the data, which is crucial for text whose context often spans multiple words or sentences. In this article we will build a character-level text generator using a recurrent Long Short-Term Memory (LSTM) network.

Implementing Text Generation using Long Short Term Memory Network

Text generation is an NLP task in which a model is trained on a large corpus of text and learns to predict what comes next. Here our LSTM model is trained character by character on the dataset and then used to produce new text. Here is the step-by-step implementation of text generation:

1. Importing Required Libraries

Python
import random
import sys

import numpy as np
import pandas as pd
import tensorflow as tf

from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM
from keras.optimizers import RMSprop
from keras.callbacks import LambdaCallback, ModelCheckpoint, ReduceLROnPlateau
  • NumPy: a fundamental package for numerical computing in Python.
  • Pandas: used for reading and processing the dataset.
  • TensorFlow: an open-source deep learning framework used to build and train machine learning models.
  • Keras: a high-level neural networks API that runs on top of TensorFlow.
  • RMSprop: an optimizer that adjusts the learning rate during training to speed up convergence.
  • Callbacks: used to modify the training process; LambdaCallback lets us run custom code (such as printing sample text) at the end of each epoch.
  • ModelCheckpoint: saves the best model seen during training.
  • ReduceLROnPlateau: reduces the learning rate if the model's performance plateaus.

2. Loading the Dataset

You can download the dataset from here. It contains a large amount of textual data for training.

Python
df = pd.read_csv('/mnt/data/train.csv')

text = " ".join(df['text'].dropna().values)
  • pd.read_csv: reads the CSV file into a DataFrame.
  • df['text'].dropna().values: We extract the text column and drop any rows with missing values using dropna().
  • " ".join(): Concatenate all the text entries into a single string that the model will use for training.

3. Mapping Each Unique Character to a Unique Number
 

Python
vocabulary = sorted(list(set(text)))

char_to_indices = dict((c, i) for i, c in enumerate(vocabulary))
indices_to_char = dict((i, c) for i, c in enumerate(vocabulary))

print(vocabulary)
  • set(text): Converts the text into a set to get all unique characters.
  • sorted(list(...)): sorts the unique characters (by Unicode code point) to create the vocabulary.
  • char_to_indices: Maps each character to a unique index.
  • indices_to_char: Maps indices back to characters (this is useful for decoding the model’s predictions).
  • The vocabulary will give us a list of all unique characters in the dataset.

Output:

['\t', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '\xa0', '\xad', '¯', '°', '´', '¿', 'à', 'á', 'â', 'ç', 'é', 'ê', 'í', 'ñ', 'ó', 'ö', 'ü', 'ā', 'ō', '\u200a', '\u200b', '\u200e', '\u200f', '–', '—', '‘', '’', '“', '”', '•', '…', '☑', '➡', 'ツ', '️']
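You can verify that the two mappings are inverses of each other by round-tripping a character (illustrative only; it assumes the character 'a' occurs in the corpus):

Python
print("Vocabulary size:", len(vocabulary))
idx = char_to_indices['a']        # character -> index
print(idx, indices_to_char[idx])  # index -> character, prints 'a' back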

4. Pre-processing the Data 

Python
max_length = 100
steps = 5
sentences = []
next_chars = []

for i in range(0, len(text) - max_length, steps):
    sentences.append(text[i: i + max_length])
    next_chars.append(text[i + max_length])

X = np.zeros((len(sentences), max_length, len(vocabulary)), dtype=bool)
y = np.zeros((len(sentences), len(vocabulary)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_to_indices[char]] = 1
    y[i, char_to_indices[next_chars[i]]] = 1
  • max_length = 100: the length of each input subsequence; the model reads max_length characters and learns to predict the character that follows.
  • steps = 5: The step size to slide over the text. A step size of 5 means the next subsequence starts 5 characters after the current one.
  • sentences.append(text[i: i + max_length]): We divide the text into subsequences of length max_length.
  • next_chars.append(): We store the next character after each subsequence which will be the model’s target output.
  • np.zeros((len(sentences), max_length, len(vocabulary)), dtype=bool): Initializes the input array X for one-hot encoding.
  • For each subsequence we store the one-hot encoded next character in y.
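Before building the model it is worth confirming that the one-hot encoded arrays have the expected shapes (the first dimension depends on the corpus length and the step size):

Python
print("X shape:", X.shape)  # (number of subsequences, max_length, vocabulary size)
print("y shape:", y.shape)  # (number of subsequences, vocabulary size)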

5. Building the LSTM Model
 

Python
model = Sequential()
model.add(LSTM(128, input_shape=(max_length, len(vocabulary))))
model.add(Dense(len(vocabulary)))
model.add(Activation('softmax'))

optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
  • Sequential(): Initializes the model as a stack of layers.
  • model.add(LSTM(128, ...)): adds an LSTM layer with 128 units.
  • model.add(Dense(len(vocabulary))): Adds a Dense layer that outputs a vector with size equal to the number of unique characters (vocabulary size).
  • model.add(Activation('softmax')): Applies softmax activation to the output layer to produce a probability distribution over all characters.
  • We use RMSprop with a learning rate of 0.01 to optimize the model.
  • model.compile(): Compiles the model with categorical cross-entropy loss suitable for multi-class classification.
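You can inspect the resulting architecture with model.summary(); the exact parameter counts depend on your vocabulary size:

Python
model.summary()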

6. Defining Helper Functions for training

Note: The two helper functions below are adapted from the official Keras text generation example.

a) Helper function to sample the next character: 
 

Python
def sample_index(preds, temperature=1.0):
    # Rescale the predicted probabilities by the temperature and renormalize
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample from the rescaled distribution and return its index
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
  • np.log(preds) / temperature: Applies logarithmic scaling to the predicted probabilities. The temperature controls the randomness of the output. Lower temperatures make the model more deterministic while higher values make it more random.
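To see the effect of temperature, you can call sample_index on a small hand-made probability distribution (the numbers here are purely illustrative and not from the article):

Python
toy_preds = [0.6, 0.3, 0.1]
print([sample_index(toy_preds, 0.2) for _ in range(10)])  # low temperature: almost always index 0
print([sample_index(toy_preds, 1.2) for _ in range(10)])  # high temperature: more varied indices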

b) Helper function to generate text after each epoch
 

Python
def on_epoch_end(epoch, logs):
    start_index = random.randint(0, len(text) - max_length - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        sentence = text[start_index: start_index + max_length]
        generated = sentence
        for _ in range(400):
            x_pred = np.zeros((1, max_length, len(vocabulary)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_to_indices[char]] = 1
            preds = model.predict(x_pred, verbose=0)[0]
            next_char = indices_to_char[sample_index(preds, diversity)]
            generated += next_char
            sentence = sentence[1:] + next_char
        sys.stdout.write(generated + '\n')
  • sentence = text[start_index: start_index + max_length]: extracts a random seed sequence of max_length characters from the text.
  • generated = sentence: starts the generated text with the seed sequence.
  • The inner loop one-hot encodes the current window, predicts the next-character probabilities, samples a character with sample_index at the given diversity and slides the window forward by one character.
  • sys.stdout.write(generated + '\n'): prints the seed followed by the generated characters for each diversity value.
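The training step below passes a callbacks list to model.fit, so it needs to be defined first. A minimal sketch using the callback classes imported in step 1 (the checkpoint file name and the ReduceLROnPlateau settings here are assumptions, not values from the article):

Python
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)  # print sample text after every epoch
checkpoint = ModelCheckpoint('best_model.keras',            # file name is an assumption
                             monitor='loss', save_best_only=True)
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2,
                              patience=1, min_lr=0.001)
callbacks = [print_callback, checkpoint, reduce_lr]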

7. Training the LSTM model
 

Python
model.fit(X, y, batch_size=128, epochs=500, callbacks=callbacks)
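Step 8 calls a generate_text helper that is not defined elsewhere in the article. A minimal sketch, assuming it takes the number of characters to generate and a diversity (temperature) value, picks a random seed from the corpus and repeatedly samples the next character with sample_index:

Python
def generate_text(length, diversity):
    # Pick a random seed sequence of max_length characters from the corpus
    start_index = random.randint(0, len(text) - max_length - 1)
    sentence = text[start_index: start_index + max_length]
    generated = sentence
    for _ in range(length):
        # One-hot encode the current window of characters
        x_pred = np.zeros((1, max_length, len(vocabulary)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_to_indices[char]] = 1
        # Predict the next-character distribution and sample from it
        preds = model.predict(x_pred, verbose=0)[0]
        next_char = indices_to_char[sample_index(preds, diversity)]
        generated += next_char
        sentence = sentence[1:] + next_char  # slide the window forward by one character
    return generated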

8. Generating new random text

Python
generated_text = generate_text(30, 0.5)
print("Generated Text:\n", generated_text)

Output:

Generated Text: over how strong the government’s case would be. Richard Primus, a professor of constitutional law aTsfgLi4EIte(vBa9xn7jCtOwF’AVmt

Here, after training, we generate 30 new characters with a diversity of 0.5. The output shows the random 100-character seed from the corpus followed by the 30 generated characters.


