In recent years, autoencoders have emerged as powerful tools in unsupervised learning, especially for image compression and reconstruction. The Perceptual Autoencoder is a specialized type of autoencoder that optimizes for perceptual similarity in addition to pixel-wise accuracy. This ensures that reconstructed images stay close to the original in terms of pixel values while also retaining the high-level features that make them look similar to a human observer.
In this article, we will explore the concept of perceptual autoencoders, how they differ from traditional autoencoders, and their practical applications in the field of computer vision.
Understanding Perceptual Autoencoders
A Perceptual Autoencoder goes beyond pixel-level reconstruction and focuses on preserving high-level features in the image, which humans perceive as important. Instead of using a standard loss function like MSE, it leverages Perceptual Loss by incorporating a pre-trained deep neural network (e.g., VGG19). This ensures that the autoencoder captures and retains essential features of the original image, leading to more visually appealing reconstructions.
Key Components:
- Perceptual Loss: The loss function is computed based on the difference in high-level features extracted from both the original and reconstructed images. These features are often taken from a pre-trained model like VGG19, which has learned meaningful image representations.
- Skip Connections: To improve the reconstruction further, perceptual autoencoders often include skip connections that allow features from the encoder to bypass directly to the decoder.
Perceptual Loss: The Core of Perceptual Autoencoders
What is Perceptual Loss?
Perceptual loss is calculated by comparing feature maps generated by a pre-trained convolutional neural network (CNN) like VGG19. These feature maps represent high-level abstractions of the image, such as textures, shapes, and object parts.
The most commonly used model for calculating perceptual loss is VGG19, which is pre-trained on the ImageNet dataset. Instead of calculating the loss directly on the pixel values, perceptual loss uses the activations from specific convolutional layers of the pre-trained model to compare the true and reconstructed images.
How Perceptual Loss Works:
- Feature Extraction: Both the true image and the reconstructed image are passed through a pre-trained CNN (e.g., VGG19) to extract feature maps from a specific layer.
- Loss Calculation: The difference between the feature maps of the true image and the reconstructed image is computed, as formalized below. The autoencoder is trained to minimize this difference, resulting in better perceptual similarity between the images.
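In other words, if φ_l denotes the activations of a chosen layer l of the pre-trained network (block5_conv4 of VGG19 in this article), the quantity being minimized is the mean squared difference between those activations:
L_perceptual(y, ŷ) = mean( (φ_l(y) − φ_l(ŷ))² )
This is exactly what the perceptual_loss function implements in Step 5 below.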
Architecture of a Perceptual Autoencoder
Encoder-Decoder Architecture with Skip Connections
The core structure of a perceptual autoencoder follows the traditional encoder-decoder model, with skip connections to retain low-level features, allowing them to pass from the encoder directly to the decoder. The autoencoder architecture consists of:
- Convolutional Layers: Used in both encoder and decoder to capture spatial hierarchies in the image.
- MaxPooling and UpSampling Layers: MaxPooling reduces the spatial dimensions in the encoder, while UpSampling increases them back in the decoder.
- Skip Connections: These connections allow the decoder to receive not just the encoded representation but also important features directly from corresponding layers in the encoder.
The skip connections improve both the quality of the reconstruction and the ability of the autoencoder to capture fine details.
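To make the idea concrete, here is a minimal, generic Keras sketch of a skip connection (illustrative only; the full architecture used in this article appears in Step 6): an encoder feature map is upsampled in the decoder and concatenated with the matching encoder output.
Python
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(32, 32, 3))
e1 = layers.Conv2D(32, 3, activation='relu', padding='same')(inp)   # encoder feature map
p1 = layers.MaxPooling2D(2)(e1)                                      # downsample
d1 = layers.UpSampling2D(2)(p1)                                      # decoder upsamples back
skip = layers.Concatenate()([d1, e1])                                # skip connection: reuse encoder features
out = layers.Conv2D(3, 3, activation='sigmoid', padding='same')(skip)
model = Model(inp, out)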
Implementation: Perceptual Autoencoder for Enhanced Image Compression
Step 1: Installing Required Libraries
Install the required libraries to set up the environment for building the autoencoder and processing images.
!pip install --upgrade pip
!apt-get install graphviz -y
!pip install pydot plotly tensorflow
Step 2: Initializing the TPU Strategy
In this step, the code checks for the presence of a TPU (Tensor Processing Unit) and initializes a TPU strategy for distributed training. If no TPU is available, it defaults to a regular strategy.
Note: If you are working in Google Colab, you can change the runtime type to TPU:
- Go to Runtime > Change runtime type.
- Select TPU from the Hardware accelerator dropdown and click Save.
Python
import tensorflow as tf
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.TPUStrategy(tpu)
except:
    print('No TPU present.')
    tpu_strategy = tf.distribute.get_strategy()
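Once the strategy is created, you can optionally confirm how many replicas training will be distributed over; a minimal check, assuming the tpu_strategy object created above:
Python
# Typically 8 replicas on a TPU v2/v3, and 1 when falling back to CPU/GPU.
print('Number of replicas:', tpu_strategy.num_replicas_in_sync)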
Step 3: Loading the CIFAR-10 Dataset
The CIFAR-10 dataset is loaded and normalized (scaled to [0,1] range). It is used to train and test the autoencoder model.
Python
(x_train, _), (x_test, _) = tf.keras.datasets.cifar10.load_data()
print(f'Training data shape: {x_train.shape}')
print(f'Test data shape: {x_test.shape}')
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
Output:
Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)
Step 4: Defining the VGG Model for Perceptual Loss
This step defines a custom VGG model to extract features for perceptual loss calculation, which will be used to evaluate the difference between the original and reconstructed images.
Python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Model
def get_vgg_model():
    vgg = VGG19(weights='imagenet', include_top=False, input_shape=(None, None, 3))
    vgg.trainable = False
    loss_model = Model(inputs=vgg.input, outputs=vgg.get_layer('block5_conv4').output)
    loss_model.trainable = False
    return loss_model
vgg_model = get_vgg_model()
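As a quick sanity check (not part of the original pipeline), you can pass a dummy batch through the loss model. Since block5_conv4 sits after four pooling stages of VGG19, a 32x32 input should produce a 2x2 feature map with 512 channels:
Python
import tensorflow as tf

features = vgg_model(tf.zeros((1, 32, 32, 3)))
print(features.shape)  # expected: (1, 2, 2, 512)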
Step 5: Defining Perceptual Loss Function
This function calculates the perceptual loss by computing the difference between the feature representations of the true and predicted images.
Python
def perceptual_loss(y_true, y_pred):
    y_true_features = vgg_model(y_true)
    y_pred_features = vgg_model(y_pred)
    return tf.reduce_mean(tf.square(y_true_features - y_pred_features))
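A minimal smoke test for the loss (illustrative only, using random tensors rather than real images):
Python
# Two random "images" in [0, 1]; the loss should come back as a single non-negative scalar.
y_true = tf.random.uniform((2, 32, 32, 3))
y_pred = tf.random.uniform((2, 32, 32, 3))
print(perceptual_loss(y_true, y_pred).numpy())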
Step 6: Building the Autoencoder Model
This step builds an autoencoder model with skip connections for image reconstruction. The model is built within the TPU strategy scope for distributed training.
Python
from tensorflow.keras import layers, models
def build_autoencoder():
    with tpu_strategy.scope():
        input_img = layers.Input(shape=(None, None, 3))
        # Encoder
        x1 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
        x1_p = layers.MaxPooling2D((2, 2), padding='same')(x1)
        x2 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x1_p)
        x2_p = layers.MaxPooling2D((2, 2), padding='same')(x2)
        x3 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x2_p)
        encoded = layers.MaxPooling2D((2, 2), padding='same')(x3)
        # Decoder
        x4 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
        x4_up = layers.UpSampling2D((2, 2))(x4)
        x4_concat = layers.Concatenate()([x4_up, x3])
        x5 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x4_concat)
        x5_up = layers.UpSampling2D((2, 2))(x5)
        x5_conv = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x5_up)
        x5_concat = layers.Concatenate()([x5_conv, x2])
        x6 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x5_concat)
        x6_up = layers.UpSampling2D((2, 2))(x6)
        x6_conv = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x6_up)
        x6_concat = layers.Concatenate()([x6_conv, x1])
        x6_concat = layers.Dropout(0.3)(x6_concat)
        decoded = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x6_concat)
        autoencoder = models.Model(input_img, decoded)
        autoencoder.compile(optimizer='adam', loss=perceptual_loss)
    return autoencoder
autoencoder = build_autoencoder()
autoencoder.summary()
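If you want to inspect the compressed (bottleneck) representation itself, you can wrap the encoder half in its own model. This is a minimal sketch, not part of the original pipeline; it assumes the deepest pooling layer is named 'max_pooling2d_2', as in the summary output shown below (names may differ if other models were built earlier in the same session):
Python
encoder = tf.keras.Model(
    inputs=autoencoder.input,
    outputs=autoencoder.get_layer('max_pooling2d_2').output
)
latent = encoder.predict(x_test[:1])
print(latent.shape)  # (1, 4, 4, 128) for 32x32 CIFAR-10 inputs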
Step 7: Model Training with Callbacks
The model is trained using callbacks for early stopping, reducing learning rate, saving the best model, and TensorBoard for logging.
Python
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, TensorBoard
import datetime
checkpoint_path = "/content/model_checkpoint.keras"
checkpoint_callback = ModelCheckpoint(filepath=checkpoint_path, save_best_only=True, verbose=1)
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.0001)
log_dir = "/content/logs" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
callbacks = [checkpoint_callback, early_stopping_callback, reduce_lr, tensorboard_callback]
history = autoencoder.fit(x_train, x_train, epochs=50, validation_data=(x_test, x_test), callbacks=callbacks)
Output:
Model: "functional_3"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_1 │ (None, None, │ 0 │ - │
│ (InputLayer) │ None, 3) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d (Conv2D) │ (None, None, │ 896 │ input_layer_1[0]… │
│ │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d │ (None, None, │ 0 │ conv2d[0][0] │
│ (MaxPooling2D) │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_1 (Conv2D) │ (None, None, │ 18,496 │ max_pooling2d[0]… │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_1 │ (None, None, │ 0 │ conv2d_1[0][0] │
│ (MaxPooling2D) │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_2 (Conv2D) │ (None, None, │ 73,856 │ max_pooling2d_1[… │
│ │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_2 │ (None, None, │ 0 │ conv2d_2[0][0] │
│ (MaxPooling2D) │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_3 (Conv2D) │ (None, None, │ 147,584 │ max_pooling2d_2[… │
│ │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ up_sampling2d │ (None, None, │ 0 │ conv2d_3[0][0] │
│ (UpSampling2D) │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate │ (None, None, │ 0 │ up_sampling2d[0]… │
│ (Concatenate) │ None, 256) │ │ conv2d_2[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_4 (Conv2D) │ (None, None, │ 147,520 │ concatenate[0][0] │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ up_sampling2d_1 │ (None, None, │ 0 │ conv2d_4[0][0] │
│ (UpSampling2D) │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_5 (Conv2D) │ (None, None, │ 36,928 │ up_sampling2d_1[… │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_1 │ (None, None, │ 0 │ conv2d_5[0][0], │
│ (Concatenate) │ None, 128) │ │ conv2d_1[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_6 (Conv2D) │ (None, None, │ 36,896 │ concatenate_1[0]… │
│ │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ up_sampling2d_2 │ (None, None, │ 0 │ conv2d_6[0][0] │
│ (UpSampling2D) │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_7 (Conv2D) │ (None, None, │ 9,248 │ up_sampling2d_2[… │
│ │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_2 │ (None, None, │ 0 │ conv2d_7[0][0], │
│ (Concatenate) │ None, 64) │ │ conv2d[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout (Dropout) │ (None, None, │ 0 │ concatenate_2[0]… │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_8 (Conv2D) │ (None, None, │ 1,731 │ dropout[0][0] │
│ │ None, 3) │ │ │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 473,155 (1.80 MB)
Trainable params: 473,155 (1.80 MB)
Non-trainable params: 0 (0.00 B)
-------------------------------------------------------------------------------------------------------
Epoch 1: val_loss improved from inf to 0.03253, saving model to /kaggle/working/model_checkpoint.keras
196/196 ━━━━━━━━━━━━━━━━━━━━ 97s 388ms/step - loss: 0.0680 - val_loss: 0.0325 - learning_rate: 0.0010
Epoch 2/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 166ms/step - loss: 0.0379
Epoch 2: val_loss improved from 0.03253 to 0.01777, saving model to /kaggle/working/model_checkpoint.keras
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 273ms/step - loss: 0.0379 - val_loss: 0.0178 - learning_rate: 0.0010
Epoch 3/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - loss: 0.0322
Epoch 3: val_loss improved from 0.01777 to 0.01523, saving model to /kaggle/working/model_checkpoint.keras
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 271ms/step - loss: 0.0321 - val_loss: 0.0152 - learning_rate: 0.0010
.
.
.
Epoch 15/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - loss: 0.0244
Epoch 15: val_loss did not improve from 0.00889
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 272ms/step - loss: 0.0244 - val_loss: 0.0118 - learning_rate: 5.0000e-04
Epoch 16/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - loss: 0.0246
Epoch 16: val_loss did not improve from 0.00889
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 272ms/step - loss: 0.0246 - val_loss: 0.0101 - learning_rate: 5.0000e-04
Epoch 16: early stopping
Restoring model weights from the end of the best epoch: 11.
Step 8: Saving the Model Architecture
The model architecture is saved as a JSON file for future reference or reloading.
Python
with open('model.json', 'w') as json_file:
    json_file.write(autoencoder.to_json())
'''This file will open in Netron(online).'''
Output:
'This file will open in Netron(online).'
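If you need to reload the best checkpoint saved by ModelCheckpoint in a later session, the custom loss has to be passed explicitly. A minimal sketch, assuming the checkpoint path and the perceptual_loss function defined earlier:
Python
from tensorflow.keras.models import load_model

restored = load_model(
    '/content/model_checkpoint.keras',
    custom_objects={'perceptual_loss': perceptual_loss}
)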
Step 9: Model Evaluation Using MSE, PSNR, and SSIM
This step evaluates the trained autoencoder by calculating metrics such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM).
Python
import numpy as np
from skimage.metrics import mean_squared_error as mse, peak_signal_noise_ratio as psnr, structural_similarity as ssim
# Calculate MSE
decoded_imgs = autoencoder.predict(x_test)
mse_values = [mse(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
average_mse = np.mean(mse_values)
print(f'Average MSE: {average_mse:.2f}')
# Calculate PSNR and SSIM
psnr_values = [psnr(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
ssim_values = [ssim(x_test[i], decoded_imgs[i], win_size=3, data_range=1.0, multichannel=True) for i in range(len(x_test))]
average_psnr = np.mean(psnr_values)
average_ssim = np.mean(ssim_values)
print(f'Average PSNR: {average_psnr:.2f}')
print(f'Average SSIM: {average_ssim:.4f}')
Output:
Average MSE: 0.02
Average PSNR: 16.94
Average SSIM: 0.5967
Compression Rate: 0.0833
Step 10: Visualizing the Results
Here, the original, compressed, and reconstructed images are visualized to compare the performance of the autoencoder.
Python
import matplotlib.pyplot as plt
n = 10
plt.figure(figsize=(20, 6))
for i in range(n):
    # Display original
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i])
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display compressed (grayscale view of the reconstruction)
    compressed_img = autoencoder.predict(x_test[i:i+1])[0]
    compressed_img = np.mean(compressed_img, axis=-1).squeeze()
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(compressed_img, cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display reconstruction
    reconstructed_img = decoded_imgs[i]
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(reconstructed_img)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Output:
The first row shows the original images, the second row shows the grayscale "compressed" views derived from the reconstructions, and the third row shows the reconstructed images.
Complete Code
Python
!pip install --upgrade pip
!apt-get install graphviz -y
!pip install pydot plotly
import tensorflow as tf
try:
    # detect and init the TPU
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    # instantiate a distribution strategy
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.TPUStrategy(tpu)
except:
    print('No TPU present.')
    tpu_strategy = tf.distribute.get_strategy()
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from skimage.metrics import peak_signal_noise_ratio as psnr, structural_similarity as ssim
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard, ReduceLROnPlateau
import datetime
# Load the Dataset
(x_train, _), (x_test, _) = tf.keras.datasets.cifar10.load_data()
print(f'Training data shape: {x_train.shape}')
print(f'Test data shape: {x_test.shape}')
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# Define the VGG model for perceptual loss
def get_vgg_model():
    vgg = VGG19(weights='imagenet', include_top=False, input_shape=(None, None, 3))
    vgg.trainable = False
    loss_model = Model(inputs=vgg.input, outputs=vgg.get_layer('block5_conv4').output)
    loss_model.trainable = False
    return loss_model
vgg_model = get_vgg_model()
def perceptual_loss(y_true, y_pred):
    y_true_features = vgg_model(y_true)
    y_pred_features = vgg_model(y_pred)
    return tf.reduce_mean(tf.square(y_true_features - y_pred_features))
# Define the autoencoder model with skip connections
def build_autoencoder():
    with tpu_strategy.scope():
        input_img = layers.Input(shape=(None, None, 3))
        # Encoder
        x1 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
        x1_p = layers.MaxPooling2D((2, 2), padding='same')(x1)
        x2 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x1_p)
        x2_p = layers.MaxPooling2D((2, 2), padding='same')(x2)
        x3 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x2_p)
        encoded = layers.MaxPooling2D((2, 2), padding='same')(x3)
        # Decoder
        x4 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
        x4_up = layers.UpSampling2D((2, 2))(x4)
        x4_concat = layers.Concatenate()([x4_up, x3])
        x5 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x4_concat)
        x5_up = layers.UpSampling2D((2, 2))(x5)
        x5_conv = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x5_up)
        x5_concat = layers.Concatenate()([x5_conv, x2])
        x6 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x5_concat)
        x6_up = layers.UpSampling2D((2, 2))(x6)
        x6_conv = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x6_up)
        x6_concat = layers.Concatenate()([x6_conv, x1])
        x6_concat = layers.Dropout(0.3)(x6_concat)
        decoded = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x6_concat)
        autoencoder = models.Model(input_img, decoded)
        autoencoder.compile(optimizer='adam', loss=perceptual_loss)
    return autoencoder
checkpoint_path = "/content/model_checkpoint.keras"
checkpoint_callback = ModelCheckpoint(filepath=checkpoint_path, save_best_only=True, verbose=1)
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.0001)
log_dir = "/content/logs" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
callbacks = [checkpoint_callback, early_stopping_callback, tensorboard_callback, reduce_lr]
from tensorflow.keras.utils import plot_model
with tpu_strategy.scope():
    autoencoder = build_autoencoder()
    autoencoder.summary()
    plot_model(autoencoder, to_file='model_plot.png', show_shapes=True)
print('\n-------------------------------------------------------------------------------------------------------\n')
history = autoencoder.fit(x_train, x_train, epochs=50, validation_data=(x_test, x_test), callbacks=callbacks)
with open('model.json', 'w') as json_file:
    json_file.write(autoencoder.to_json())
'''This file will open in Netron(online).'''
# Plot the training and validation loss
plt.figure(figsize=(15,6))
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='train_loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()
# Calculate Metrics
from skimage.metrics import mean_squared_error as mse
# Adjust win_size to a suitable value based on your image size
win_size = 3
data_range = 1.0
# Calculate MSE
decoded_imgs = autoencoder.predict(x_test)
mse_values = [mse(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
average_mse = np.mean(mse_values)
print(f'Average MSE: {average_mse:.2f}')
# Calculate PSNR & SSIM
psnr_values = [psnr(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
ssim_values = [ssim(x_test[i], decoded_imgs[i], win_size=win_size, data_range=data_range, multichannel=True) for i in range(len(x_test))]
# Compute average PSNR and SSIM
average_psnr = np.mean(psnr_values)
average_ssim = np.mean(ssim_values)
print(f'Average PSNR: {average_psnr:.2f}')
print(f'Average SSIM: {average_ssim:.4f}')
# Calculate compression rate
original_size = x_test.shape[1] * x_test.shape[2] * x_test.shape[3]
encoded_size = 32 * 32 * 128 // (8 * 8 * 8) # considering the deepest layer
compression_rate = encoded_size / original_size
print(f'Compression Rate: {compression_rate:.4f}')
n = 10
plt.figure(figsize=(20, 6))
for i in range(n):
    # Display original
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i])
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display compressed (grayscale view of the reconstruction)
    compressed_img = autoencoder.predict(x_test[i:i+1])[0]
    compressed_img = np.mean(compressed_img, axis=-1).squeeze()
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(compressed_img, cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display reconstruction
    reconstructed_img = decoded_imgs[i]
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(reconstructed_img)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Image Compression and Reconstruction on Unknown Data
The model is used to compress and reconstruct an external image. The PSNR and SSIM values are computed to evaluate the quality of reconstruction.
Python
from PIL import Image
# Function to load and preprocess an image
def load_image(file_path, target_size):
    img = Image.open(file_path)
    img = img.convert('RGB')
    img = img.resize(target_size, Image.LANCZOS)  # or Image.BILINEAR
    img_array = np.array(img) / 255.0  # Normalize to [0, 1]
    return img_array
# Function to compress and reconstruct an image using the autoencoder
def process_image(img):
    # The autoencoder outputs the reconstruction directly; the "compressed" image
    # displayed below is simply that reconstruction (the latent code is not extracted here).
    reconstructed_img = autoencoder.predict(np.expand_dims(img, axis=0))[0]
    compressed_img = reconstructed_img
    return compressed_img, reconstructed_img
# Function to calculate PSNR and SSIM
def calculate_metrics(img1, img2):
    psnr_value = psnr(img1, img2)
    ssim_value = ssim(img1, img2, win_size=3, data_range=1.0, multichannel=True)
    return psnr_value, ssim_value
# Example usage
input_image_path = '4k-images.jpeg'
target_size = (1920, 1080)
input_image = load_image(input_image_path, target_size)
compressed_image, reconstructed_image = process_image(input_image)
psnr_value, ssim_value = calculate_metrics(input_image, reconstructed_image)
# Displaying results
plt.figure(figsize=(15, 7))
# Original image
plt.subplot(1, 3, 1)
plt.imshow(input_image)
plt.title('Original Image')
plt.axis('off')
# Compressed image
plt.subplot(1, 3, 2)
plt.imshow(compressed_image)
plt.title('Compressed Image')
plt.axis('off')
# Reconstructed image
plt.subplot(1, 3, 3)
plt.imshow(reconstructed_image)
plt.title('Reconstructed Image\nPSNR: {:.2f}, SSIM: {:.4f}'.format(psnr_value, ssim_value))
plt.axis('off')
plt.tight_layout()
plt.show()
# Additional information
original_size = input_image.nbytes / 1024 # in KB
compressed_size = compressed_image.nbytes / 1024 # in KB
compression_rate = compressed_size / original_size
print(f'Original Image Size: {original_size:.2f} KB')
print(f'Compressed Image Size: {compressed_size:.2f} KB')
print(f'Compression Rate: {compression_rate:.2f}')
Output:
Original Image Size: 48600.00 KB
Compressed Image Size: 24300.00 KB
Compression Rate: 0.50
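Note that the KB figures above compare in-memory NumPy arrays (a float64 input versus a float32 reconstruction), not an actual encoded bitstream. For an on-disk comparison, a minimal sketch (hypothetical output filename) is to quantize the reconstruction back to 8-bit and save it:
Python
# Convert the [0, 1] float reconstruction to uint8 and write it to disk,
# then compare the file size with the original JPEG.
out = Image.fromarray((reconstructed_image * 255).clip(0, 255).astype(np.uint8))
out.save('reconstructed.png')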
Applications of Perceptual Autoencoders
Perceptual autoencoders find several practical applications in image processing and computer vision tasks:
- Image Super-Resolution: Perceptual autoencoders are widely used in super-resolution tasks, where low-resolution images are upscaled to higher resolutions. By optimizing for perceptual loss, these models ensure that the high-resolution image is sharp and contains realistic textures.
- Image Denoising: Autoencoders trained with perceptual loss are effective at removing noise from images while preserving important details, resulting in clearer and more visually appealing images.
- Image Compression: Perceptual autoencoders can be used for image compression by encoding images into a compressed latent space and reconstructing them without significant loss of quality. The use of perceptual loss ensures that important visual features are retained.
- Style Transfer: In style transfer applications, perceptual autoencoders are used to apply the artistic style of one image to another while retaining the content structure of the target image.
Evaluating Perceptual Autoencoders: Key Metrics
- Peak Signal-to-Noise Ratio (PSNR): PSNR measures the difference between the original and reconstructed images in terms of pixel intensity. A higher PSNR value indicates better reconstruction (see the short sketch after this list for computing these metrics with TensorFlow).
- Structural Similarity Index (SSIM): SSIM compares the structural similarity between the original and reconstructed images, taking into account luminance, contrast, and texture. A higher SSIM value indicates greater perceptual similarity.
- Mean Absolute Error (MAE): MAE calculates the average absolute difference between pixel values of the original and reconstructed images. Although perceptual autoencoders prioritize perceptual similarity, MAE can still provide insights into pixel-level accuracy.
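As a quick illustration (not part of the tutorial code above), the same kinds of metrics can also be computed directly with TensorFlow's tf.image utilities on a batch of test images and their reconstructions:
Python
import tensorflow as tf

# Assumes x_test and decoded_imgs from the evaluation step, both scaled to [0, 1].
psnr_tf = tf.image.psnr(x_test, decoded_imgs, max_val=1.0)   # per-image PSNR
ssim_tf = tf.image.ssim(x_test, decoded_imgs, max_val=1.0)   # per-image SSIM
mae_tf = tf.reduce_mean(tf.abs(x_test - decoded_imgs))       # overall MAE
print(float(tf.reduce_mean(psnr_tf)), float(tf.reduce_mean(ssim_tf)), float(mae_tf))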
Conclusion
Perceptual autoencoders represent a significant advancement in image reconstruction by focusing on perceptual similarity rather than pixel-wise accuracy alone. By incorporating perceptual loss functions built on deep convolutional models such as VGG19, these autoencoders deliver high-quality reconstructions that preserve the image features that matter most to the human eye.