Extreme Rare Event Classification Using Autoencoders in Keras
by Chitta Ranjan, Towards Data Science
Background
What is an extreme rare event?
In a rare-event problem, we have an imbalanced dataset: there are far fewer positively labeled samples than negative ones. In a typical rare-event problem, the positively labeled data make up around 5–10% of the total. In an extreme rare event problem, less than 1% of the data is positively labeled. For example, in the dataset used here it is around 0.6%.
Such extreme rare event problems are quite common in the real world: for example, sheet breaks and machine failures in manufacturing, or clicks and purchases in an online business.
Classifying these rare events is quite challenging. Recently, Deep Learning has been used quite extensively for classification. However, the small number of positively labeled samples limits its applicability. No matter how large the data, the use of Deep Learning is constrained by the amount of positively labeled samples.
Why should we still bother to use Deep Learning?
This is a legitimate question. Why should we not consider some other Machine Learning approach instead?
If the data is sufficient, Deep Learning methods are potentially more capable. They also allow flexibility for model improvement by using different architectures. We will, therefore, attempt to use Deep Learning methods.
In this post, we will learn how we can use a simple dense-layer autoencoder to build a rare event classifier. The purpose of this post is to demonstrate the implementation of an Autoencoder for extreme rare-event classification. We will leave the exploration of different architectures and configurations of the Autoencoder to the reader. Please share in the comments if you find anything interesting.
The encoder learns the underlying features of a process. These features are
typically in a reduced dimension.
The decoder can recreate the original data from these underlying features.
Figure 1. Illustration of an autoencoder. [Source: Autoencoder, by Prof. Seungchul Lee, iSystems Design Lab]
The negatively labeled data is treated as the normal state of the process. A normal state is when the process is eventless.
We will ignore the positively labeled data and train an Autoencoder on only the negatively labeled data.
This Autoencoder has now learned the features of the normal process.
A well-trained Autoencoder will accurately reconstruct any new data coming from the normal state of the process (as it will have the same pattern or distribution), so the reconstruction error will be low. On data from a rare event, however, the Autoencoder will struggle, which makes the reconstruction error high during the rare event.
We can catch such high reconstruction errors and label them as a rare-event prediction.
This procedure is similar to anomaly detection methods.
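In code, this classification rule amounts to thresholding the reconstruction error. A minimal sketch with hypothetical names (the actual model, scaler, and threshold are built step by step below):

# Hypothetical sketch of the classification rule: reconstruct a sample,
# measure the reconstruction error, and flag the sample if the error is high.
reconstruction = autoencoder.predict(x_scaled)  # x_scaled: standardized sample(s)
error = np.mean(np.power(x_scaled - reconstruction, 2), axis=1)
is_rare_event = error > threshold  # threshold chosen from validation data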
Implementation
Data and problem
This is binary labeled data from a pulp-and-paper mill for sheet breaks. Sheet breaks are a severe problem in paper manufacturing. A single sheet break causes a loss of several thousand dollars, and the mills see at least one or more breaks every day. This causes millions of dollars of yearly losses and work hazards.
The data we have contains about 18k rows collected over 15 days. The column y contains the binary labels, with 1 denoting a sheet break. The remaining columns are predictors. There are about 124 positively labeled samples (~0.6%).
Code
Import the desired libraries.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from pylab import rcParams

import tensorflow as tf
from keras.models import Model, load_model
from keras.layers import Input, Dense
from keras.callbacks import ModelCheckpoint, TensorBoard
from keras import regularizers

from sklearn.preprocessing import StandardScaler      # used for standardization below
from sklearn.model_selection import train_test_split  # used for the data split below

# Set the random seeds for reproducibility.
np.random.seed(1)
tf.set_random_seed(2)

rcParams['figure.figsize'] = 8, 6
LABELS = ["Normal", "Break"]
Note that we are setting the random seeds for reproducibility of the result.
Data preprocessing
df = pd.read_csv("data/processminer-rare-event-mts - data.csv")
Next, we curve-shift the labels so that the model learns to predict a break before it happens. If row n is positively labeled:
Make rows (n-2) and (n-1) equal to 1. This will help the classifier learn up to 4 minutes ahead prediction (the rows are at 2-minute intervals).
Delete row n, because we do not want the classifier to learn to predict a break when it has already happened.
The curve_shift function below implements this label shifting.
sign = lambda x: (1, -1)[x < 0]

def curve_shift(df, shift_by):
    '''
    Shift the binary labels in a dataframe with respect to the 1s.
    For example, if shift_by is -2 and row n is labeled 1, then
    rows (n-2) and (n-1) are set to 1 and row n is removed.

    Inputs:
    df       A pandas dataframe with a binary labeled column named 'y'.
    shift_by An integer denoting the number of rows to shift.

    Output:
    df       A dataframe with the binary labels shifted by shift_by.
    '''
    vector = df['y'].copy()
    for s in range(abs(shift_by)):
        tmp = vector.shift(sign(shift_by))
        tmp = tmp.fillna(0)
        vector += tmp
    labelcol = 'y'
    # Add vector to the df
    df.insert(loc=0, column=labelcol + 'tmp', value=vector)
    # Remove the rows with labelcol == 1.
    df = df.drop(df[df[labelcol] == 1].index)
    # Drop labelcol and rename the tmp col as labelcol
    df = df.drop(labelcol, axis=1)
    df = df.rename(columns={labelcol + 'tmp': labelcol})
    # Make the labelcol binary
    df.loc[df[labelcol] > 0, labelcol] = 1
    return df
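We then apply the shift; shift_by=-2 follows from the two-rows-ahead labeling described above:

# Shift the labels two rows (4 minutes) up and drop the break rows themselves.
df = curve_shift(df, shift_by=-2)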
Before moving forward, we will drop the time column, and also the categorical columns, for simplicity (a sketch of this step is shown below).
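A minimal sketch of this step (the column name time is an assumption; the categorical column names are not listed in this post):

# Drop the timestamp column; the categorical columns (names depend on
# the dataset) would be dropped the same way.
df = df.drop(['time'], axis=1)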
Now, we divide the data into train, valid, and test sets (a sketch of the split is shown below). Then we will take the subset of data with only 0s to train the autoencoder.
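The split itself is not shown in the post; a minimal sketch using scikit-learn's train_test_split (the 20% split fraction and the seed value are assumptions):

DATA_SPLIT_PCT = 0.2  # assumed split fraction
SEED = 123            # assumed seed for the random split

df_train, df_test = train_test_split(df, test_size=DATA_SPLIT_PCT, random_state=SEED)
df_train, df_valid = train_test_split(df_train, test_size=DATA_SPLIT_PCT, random_state=SEED)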
df_train_0 = df_train.loc[df_train['y'] == 0]
df_train_1 = df_train.loc[df_train['y'] == 1]
df_train_0_x = df_train_0.drop(['y'], axis=1)
df_train_1_x = df_train_1.drop(['y'], axis=1)

df_valid_0 = df_valid.loc[df_valid['y'] == 0]
df_valid_1 = df_valid.loc[df_valid['y'] == 1]
df_valid_0_x = df_valid_0.drop(['y'], axis=1)
df_valid_1_x = df_valid_1.drop(['y'], axis=1)

df_test_0 = df_test.loc[df_test['y'] == 0]
df_test_1 = df_test.loc[df_test['y'] == 1]
df_test_0_x = df_test_0.drop(['y'], axis=1)
df_test_1_x = df_test_1.drop(['y'], axis=1)
Standardization
We fit the scaler on the training data only, to avoid leaking information from the validation and test sets.
scaler = StandardScaler().fit(df_train_0_x)
df_train_0_x_rescaled = scaler.transform(df_train_0_x)
df_valid_0_x_rescaled = scaler.transform(df_valid_0_x)
df_valid_x_rescaled = scaler.transform(df_valid.drop(['y'], axis=1))
df_test_0_x_rescaled = scaler.transform(df_test_0_x)
df_test_x_rescaled = scaler.transform(df_test.drop(['y'], axis=1))
Autoencoder Classifier
Initialization
nb_epoch = 200
batch_size = 128
input_dim = df_train_0_x_rescaled.shape[1]  # number of predictor variables
encoding_dim = 32
hidden_dim = int(encoding_dim / 2)
l1_penalty = 1e-3  # L1 activity-regularization strength on the encoding layer

input_layer = Input(shape=(input_dim,))
encoder = Dense(encoding_dim, activation="relu",
                activity_regularizer=regularizers.l1(l1_penalty))(input_layer)
encoder = Dense(hidden_dim, activation="relu")(encoder)
decoder = Dense(hidden_dim, activation="relu")(encoder)
decoder = Dense(encoding_dim, activation="relu")(decoder)
decoder = Dense(input_dim, activation="linear")(decoder)

autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.summary()
Training
We will train the model and save it to a file. Saving a trained model is good practice, as it saves time in future analyses.
autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='adam')

cp = ModelCheckpoint(filepath="autoencoder_classifier.h5",
                     save_best_only=True,
                     verbose=0)

tb = TensorBoard(log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=True)

history = autoencoder.fit(df_train_0_x_rescaled, df_train_0_x_rescaled,
                          epochs=nb_epoch,
                          batch_size=batch_size,
                          shuffle=True,
                          validation_data=(df_valid_0_x_rescaled, df_valid_0_x_rescaled),
                          verbose=1,
                          callbacks=[cp, tb]).history
Figure 2. Loss for Autoencoder Training.
Classification
In the following, we show how we can use the Autoencoder reconstruction error for rare-event classification. First, we compute the reconstruction errors on the validation set.
valid_x_predictions = autoencoder.predict(df_valid_x_rescaled)
mse = np.mean(np.power(df_valid_x_rescaled - valid_x_predictions, 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
                         'True_class': df_valid['y']})
We should not estimate the classification threshold from the test data; that would result in overfitting. The threshold should instead be chosen from the validation-set errors computed above (one possible approach is sketched below).
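A minimal sketch of threshold selection using scikit-learn's precision_recall_curve on the validation errors (this particular selection method is an assumption; the post simply fixes the threshold at 0.4 below):

from sklearn.metrics import precision_recall_curve

# Precision and recall over all candidate thresholds on the validation errors.
precision_rt, recall_rt, threshold_rt = precision_recall_curve(
    error_df.True_class, error_df.Reconstruction_error)

plt.plot(threshold_rt, precision_rt[1:], label="Precision")
plt.plot(threshold_rt, recall_rt[1:], label="Recall")
plt.title('Precision and recall for different threshold values')
plt.xlabel('Threshold')
plt.ylabel('Precision/Recall')
plt.legend()
plt.show()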
test_x_predictions = autoencoder.predict(df_test_x_rescaled)
mse = np.mean(np.power(df_test_x_rescaled - test_x_predictions, 2), axis=1)
error_df_test = pd.DataFrame({'Reconstruction_error': mse,
                              'True_class': df_test['y']})
error_df_test = error_df_test.reset_index()
threshold_fixed = 0.4
groups = error_df_test.groupby('True_class')

fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group.index, group.Reconstruction_error, marker='o', ms=3.5, linestyle='',
            label="Break" if name == 1 else "Normal")
ax.hlines(threshold_fixed, ax.get_xlim()[0], ax.get_xlim()[1],
          colors="r", zorder=100, label='Threshold')
ax.legend()
plt.title("Reconstruction error for different classes")
plt.ylabel("Reconstruction error")
plt.xlabel("Data point index")
plt.show()
Figure 4. Using threshold = 0.4 for classification. The orange and blue dots above the threshold line represent the True Positives and False Positives, respectively.
In Figure 4, the orange and blue dots above the threshold line represent the True Positives and False Positives, respectively. As we can see, we have a good number of false positives. To have a better look, we can plot a confusion matrix (a sketch is shown below).
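A minimal sketch of the confusion matrix, thresholding the test reconstruction errors (the seaborn heatmap styling is an assumption):

from sklearn.metrics import confusion_matrix

# Classify a test point as a break if its reconstruction error exceeds the threshold.
pred_y = [1 if e > threshold_fixed else 0
          for e in error_df_test.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df_test.True_class, pred_y)

sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()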
We could predict 8 out of 41 break instances. Note that these instances include predictions made two or four minutes ahead. This is around 20%, which is a good recall rate for the paper industry. The False Positive Rate is around 6%. This is not ideal, but not terrible for a mill.
Still, this model can be improved further to increase the recall rate at a smaller False Positive Rate. We will look at the AUC below (a sketch of its computation follows) and then talk about the next approach for improvement.
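A minimal sketch of the ROC curve and AUC on the test reconstruction errors (the use of scikit-learn's roc_curve and auc here is an assumption):

from sklearn.metrics import roc_curve, auc

# The reconstruction error serves as the score for the ROC curve.
false_pos_rate, true_pos_rate, thresholds = roc_curve(
    error_df_test.True_class, error_df_test.Reconstruction_error)
roc_auc = auc(false_pos_rate, true_pos_rate)

plt.plot(false_pos_rate, true_pos_rate, linewidth=2, label='AUC = %0.3f' % roc_auc)
plt.plot([0, 1], [0, 1], linewidth=2, linestyle='--')
plt.title('Receiver operating characteristic (ROC) curve')
plt.ylabel('True positive rate')
plt.xlabel('False positive rate')
plt.legend(loc='lower right')
plt.show()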
Github repository
The entire code, with comments, is available on GitHub:
cran2367/autoencoder_classifier (github.com): Autoencoder model for rare event classification.
LSTM Autoencoder
The problem discussed here is a (multivariate) time series problem. However, the Autoencoder model above does not take the temporal information/patterns into account. In the next post, we will explore whether that is possible with an RNN. We will try an LSTM autoencoder.
Conclusion
We worked on extreme rare event binary labeled data from a paper mill to build an Autoencoder Classifier. We achieved reasonable accuracy. The purpose here was to demonstrate the use of a basic Autoencoder for rare event classification. We will further work on developing other methods, including an LSTM Autoencoder that can extract temporal features, for better accuracy.
The next post, on the LSTM Autoencoder, is here: LSTM Autoencoder for rare event classification.
Build the right Autoencoder — Tune and Optimize using PCA principles. Part II.
References
1. Ranjan, C., Mustonen, M., Paynabar, K., & Pourak, K. (2018). Dataset: Rare Event
Classification in Multivariate Time Series. arXiv preprint arXiv:1809.10717.
2. Fraud detection with TensorFlow: https://www.datascience.com/blog/fraud-detection-with-tensorflow
Disclaimer: The scope of this post is limited to a tutorial for building a Dense Layer Autoencoder and using it as a rare-event classifier. A practitioner is expected to achieve better results on this data through network tuning. The purpose of the article is to help Data Scientists implement an Autoencoder.