Long Short-Term Memory (LSTM) RNN in TensorFlow
Last Updated: 28 May, 2025
Long Short-Term Memory (LSTM) networks were designed to address the vanishing gradient problem that traditional RNNs face when learning long-term dependencies in sequential data. LSTMs can maintain information over extended periods thanks to memory cells and gating mechanisms. Each memory cell is managed by three primary gates: the input gate, the forget gate and the output gate.
In this article, we will learn how to implement Long Short-Term Memory (LSTM) networks using TensorFlow.
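Before building the model in Keras, it helps to see what these gates compute. Below is a minimal NumPy sketch of a single LSTM cell step, added purely for illustration: the weight matrices, sizes and names here are hypothetical, not part of the tutorial's model.
Python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 1 input feature, 4 hidden units (hypothetical)
n_in, n_hidden = 1, 4
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(n_hidden, n_in)) for g in 'ifoc'}      # input weights
U = {g: rng.normal(size=(n_hidden, n_hidden)) for g in 'ifoc'}  # recurrent weights
b = {g: np.zeros(n_hidden) for g in 'ifoc'}                     # biases

def lstm_step(x_t, h_prev, c_prev):
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])        # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])        # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])        # output gate
    c_cand = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # candidate memory
    c_t = f * c_prev + i * c_cand    # forget old memory, write new memory
    h_t = o * np.tanh(c_t)           # expose a filtered view of the cell state
    return h_t, c_t

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(np.array([0.5]), h, c)
print(h.shape, c.shape)   # (4,) (4,)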
1. Importing Libraries
In this step, we import the necessary libraries: NumPy, pandas, Matplotlib, scikit-learn and TensorFlow. The TensorFlow Keras API is used to build the LSTM model.
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
2. Loading, Preparing and Scaling the Data
Here we use a dataset of monthly milk production. You can download the dataset from here.
- We load the dataset of monthly milk production. The "Date" column is converted to datetime format for time series analysis.
- We scale the data to a range of [0, 1] using MinMaxScaler to help the model train more effectively.
Python
# Load the dataset and index it by date for time series analysis
data = pd.read_csv('monthly_milk_production.csv')
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace=True)

# Scale production values to [0, 1] to stabilize training
production = data['Production'].astype(float).values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(production)
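As a quick optional sanity check (not part of the original code), you can confirm that the scaled values span [0, 1] and that the transform is invertible:
Python
# Scaled values should span exactly [0, 1] after fit_transform
print(scaled_data.min(), scaled_data.max())   # 0.0 1.0

# inverse_transform recovers the original production values
restored = scaler.inverse_transform(scaled_data)
print(np.allclose(restored, production))      # True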
3. Creating Sequences and Train-Test Split
Here we generate sequences of input data and split the dataset into training and testing sets.
- We use a sliding window of 12 months (1 year) of past data to predict the next month's production.
- The dataset is split chronologically (shuffle=False) into training and testing sets and reshaped to match the LSTM input shape.
- We use 80% of the data for training and 20% for testing.
Python
window_size = 12

# Build sliding windows: 12 past months as input, the next month as target
X = []
y = []
target_dates = data.index[window_size:]
for i in range(window_size, len(scaled_data)):
    X.append(scaled_data[i - window_size:i, 0])
    y.append(scaled_data[i, 0])
X = np.array(X)
y = np.array(y)

# Chronological 80/20 split (shuffle=False keeps the time order intact)
X_train, X_test, y_train, y_test, dates_train, dates_test = train_test_split(
    X, y, target_dates, test_size=0.2, shuffle=False
)

# Reshape to (samples, timesteps, features) as expected by Keras LSTM layers
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
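Optionally, you can verify that the arrays now match the (samples, timesteps, features) layout that Keras LSTM layers expect. The exact sample counts depend on the length of your copy of the dataset; the numbers in the comments assume the classic 168-month version and are only illustrative:
Python
# Keras LSTM layers expect input of shape (samples, timesteps, features)
print(X_train.shape)   # e.g. (124, 12, 1)
print(X_test.shape)    # e.g. (32, 12, 1)
print(y_train.shape, y_test.shape)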
4. Building the LSTM Model
This step involves defining and building the LSTM model architecture.
- The model consists of two LSTM layers with 128 units each, followed by a dropout layer after each LSTM to reduce overfitting. The first LSTM layer sets return_sequences=True so that the second LSTM layer receives one output vector per time step.
- The model concludes with a Dense layer to predict a single value (next month's production).
Python
model = Sequential()
# First LSTM layer returns the full sequence so the next LSTM layer
# receives one vector per time step
model.add(LSTM(units=128, return_sequences=True, input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))
# Second LSTM layer returns only its final hidden state
model.add(LSTM(units=128))
model.add(Dropout(0.2))
# Single output: next month's (scaled) production
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
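Before training, you can optionally print a summary to inspect the layer shapes and parameter counts:
Python
# Two 128-unit LSTM layers plus the Dense head give roughly 198K
# trainable parameters for this configuration
model.summary()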
5. Training and Evaluating the Model
In this step, we train the model on the training data and evaluate its performance.
- The model is trained for 100 epochs using a batch size of 32, with 10% of the training data used for validation.
- After training, the model makes predictions on the test set; both predictions and targets are mapped back to the original scale with inverse_transform, and we compute the Root Mean Squared Error (RMSE) to evaluate performance.
Python
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1)

# Predict on the test set and map values back to the original scale
predictions = model.predict(X_test)
predictions = scaler.inverse_transform(predictions).flatten()
y_test = scaler.inverse_transform(y_test.reshape(-1, 1)).flatten()

# RMSE in the original units (pounds per cow)
rmse = np.sqrt(np.mean((y_test - predictions) ** 2))
print(f'RMSE: {rmse:.2f}')
Output:
Training the Model: the model trains for 100 epochs with a batch size of 32, using 10% of the training data for validation.
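The history object returned by model.fit records the per-epoch losses. Plotting them is an optional addition to the tutorial, but it is a quick way to check for overfitting:
Python
# Plot training and validation loss per epoch
plt.figure(figsize=(10, 4))
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.title('Training History')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()

If the validation loss starts rising while the training loss keeps falling, the model is overfitting and you may want fewer epochs or more dropout.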
6. Visualizing Actual vs Predicted Values
In this step, we visualize the actual values against the predicted ones. A plot compares the actual milk production with the model's predictions, letting us evaluate how well the model performs over time.
Python
plt.figure(figsize=(12, 6))
plt.plot(dates_test, y_test, label='Actual Production')
plt.plot(dates_test, predictions, label='Predicted Production')
plt.title('Actual vs Predicted Milk Production')
plt.xlabel('Date')
plt.ylabel('Production (pounds per cow)')
plt.legend()
plt.show()
Output:
Actual vs Predicted Milk Production Using LSTM Model
The LSTM model successfully captures the trends and patterns in the time series data. As observed, the predicted values closely follow the actual values, with small variations during transitions between peaks and lows. This demonstrates the effectiveness of LSTMs for time series prediction tasks such as forecasting milk production.
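As a possible next step (not shown in the original article), the trained model can forecast the month after the last observation by feeding in the final 12-month window:
Python
# Use the last 12 scaled observations as the next input window
last_window = scaled_data[-window_size:].reshape(1, window_size, 1)
next_scaled = model.predict(last_window)
next_production = scaler.inverse_transform(next_scaled)[0, 0]
print(f'Forecast for the next month: {next_production:.1f}')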
You can download the source code from here.
You can also implement LSTMs using PyTorch: Long Short Term Memory (LSTM) Networks using PyTorch