Perceptual Autoencoder: Enhancing Image Reconstruction with Deep Learning

Last Updated : 04 Sep, 2024

In recent years, autoencoders have emerged as powerful tools in unsupervised learning, especially for image compression and reconstruction. The perceptual autoencoder is a specialized variant that optimizes for both pixel-wise accuracy and perceptual similarity: reconstructed images stay close to the original in pixel values while also retaining the high-level features that make them look similar to the human eye.

In this article, we will explore the concept of perceptual autoencoders, how they differ from traditional autoencoders, and their practical applications in the field of computer vision.

Understanding Perceptual Autoencoders

A Perceptual Autoencoder goes beyond pixel-level reconstruction and focuses on preserving high-level features in the image, which humans perceive as important. Instead of using a standard loss function like MSE, it leverages Perceptual Loss by incorporating a pre-trained deep neural network (e.g., VGG19). This ensures that the autoencoder captures and retains essential features of the original image, leading to more visually appealing reconstructions.

Key Components:

  • Perceptual Loss: The loss function is computed based on the difference in high-level features extracted from both the original and reconstructed images. These features are often taken from a pre-trained model like VGG19, which has learned meaningful image representations.
  • Skip Connections: To improve the reconstruction further, perceptual autoencoders often include skip connections that allow features from the encoder to bypass directly to the decoder.

Perceptual Loss: The Core of Perceptual Autoencoders

What is Perceptual Loss?

Perceptual loss is calculated by comparing feature maps generated by a pre-trained convolutional neural network (CNN) like VGG19. These feature maps represent high-level abstractions of the image, such as textures, shapes, and object parts.

The most commonly used model for calculating perceptual loss is VGG19, which is pre-trained on the ImageNet dataset. Instead of calculating the loss directly on the pixel values, perceptual loss uses the activations from specific convolutional layers of the pre-trained model to compare the true and reconstructed images.

How Perceptual Loss Works:

  1. Feature Extraction: Both the true image and the reconstructed image are passed through a pre-trained CNN (e.g., VGG19) to extract feature maps from a specific layer.
  2. Loss Calculation: The difference between the feature maps of the true and reconstructed images is computed, and the autoencoder is trained to minimize this difference, yielding better perceptual similarity between the images (a minimal code sketch follows this list).
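
The two steps above translate almost directly into Keras code. The following is a minimal sketch, assuming a frozen VGG19 truncated at block5_conv4 (the same layer used in the implementation later in this article). The names feature_extractor and perceptual_pixel_loss, and the pixel-loss weight alpha, are illustrative additions only; the implementation below trains with the perceptual term alone.

Python
import tensorflow as tf
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Model

# Frozen VGG19 feature extractor, truncated at an intermediate convolutional layer
vgg = VGG19(weights='imagenet', include_top=False, input_shape=(None, None, 3))
vgg.trainable = False
feature_extractor = Model(inputs=vgg.input, outputs=vgg.get_layer('block5_conv4').output)

def perceptual_pixel_loss(y_true, y_pred, alpha=0.0):
    # 1. Feature extraction: pass both images through the frozen CNN
    true_feats = feature_extractor(y_true)
    pred_feats = feature_extractor(y_pred)
    # 2. Loss calculation: mean squared difference between the feature maps,
    #    optionally blended with a pixel-wise MSE term (alpha is illustrative only)
    feature_term = tf.reduce_mean(tf.square(true_feats - pred_feats))
    pixel_term = tf.reduce_mean(tf.square(y_true - y_pred))
    return feature_term + alpha * pixel_term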

Architecture of a Perceptual Autoencoder

Encoder-Decoder Architecture with Skip Connections

The core structure of a perceptual autoencoder follows the traditional encoder-decoder model, with skip connections to retain low-level features, allowing them to pass from the encoder directly to the decoder. The autoencoder architecture consists of:

  • Convolutional Layers: Used in both encoder and decoder to capture spatial hierarchies in the image.
  • MaxPooling and UpSampling Layers: MaxPooling reduces the spatial dimensions in the encoder, while UpSampling increases them back in the decoder.
  • Skip Connections: These connections allow the decoder to receive not just the encoded representation but also important features directly from corresponding layers in the encoder.

The skip connections improve both the quality of the reconstruction and the ability of the autoencoder to capture fine details.
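
As a minimal illustration of such a skip connection (the full architecture is built in Step 6 below), an encoder feature map can be concatenated onto the corresponding upsampled decoder feature map before further convolution; the layer names and sizes here are illustrative only.

Python
from tensorflow.keras import layers, models

inp = layers.Input(shape=(None, None, 3))

# Encoder: convolution followed by downsampling
e1 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
bottleneck = layers.MaxPooling2D((2, 2), padding='same')(e1)

# Decoder: upsample, then let the encoder feature map e1 bypass the bottleneck
d1 = layers.UpSampling2D((2, 2))(bottleneck)
d1 = layers.Concatenate()([d1, e1])  # skip connection
out = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(d1)

skip_demo = models.Model(inp, out)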

Implementation: Perceptual Autoencoder for Enhanced Image Compression

Step 1: Installing Required Libraries

Install the libraries required to run the autoencoder and handle the image-processing steps.

!pip install --upgrade pip
!apt-get install graphviz -y
!pip install pydot plotly tensorflow

Step 2: Initializing the TPU Strategy

In this step, the code checks for the presence of a TPU (Tensor Processing Unit) and initializes a TPU strategy for distributed training. If no TPU is available, it defaults to a regular strategy.

Note: If you are working in Google Colab, you can switch the runtime type to TPU.

  • Go to Runtime > Change runtime type.
  • Select TPU from the Hardware accelerator dropdown and click Save.
Python
import tensorflow as tf

try:
    # Detect and initialize the TPU, then create a TPU distribution strategy
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.TPUStrategy(tpu)
except:
    # Fall back to the default strategy (CPU/GPU) when no TPU is available
    print('No TPU present.')
    tpu_strategy = tf.distribute.get_strategy()

Step 3: Loading the CIFAR-10 Dataset

The CIFAR-10 dataset is loaded and normalized (scaled to [0,1] range). It is used to train and test the autoencoder model.

Python
(x_train, _), (x_test, _) = tf.keras.datasets.cifar10.load_data()
print(f'Training data shape: {x_train.shape}')
print(f'Test data shape: {x_test.shape}')

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

Output:

Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)

Step 4: Defining the VGG Model for Perceptual Loss

This step defines a custom VGG model to extract features for perceptual loss calculation, which will be used to evaluate the difference between the original and reconstructed images.

Python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Model

def get_vgg_model():
    vgg = VGG19(weights='imagenet', include_top=False, input_shape=(None, None, 3))
    vgg.trainable = False
    loss_model = Model(inputs=vgg.input, outputs=vgg.get_layer('block5_conv4').output)
    loss_model.trainable = False
    return loss_model

vgg_model = get_vgg_model()


Step 5: Defining Perceptual Loss Function

This function calculates the perceptual loss by computing the difference between the feature representations of the true and predicted images.

Python
def perceptual_loss(y_true, y_pred):
    y_true_features = vgg_model(y_true)
    y_pred_features = vgg_model(y_pred)
    return tf.reduce_mean(tf.square(y_true_features - y_pred_features))


Step 6: Building the Autoencoder Model

This step builds an autoencoder model with skip connections for image reconstruction. The model is built within the TPU strategy scope for distributed training.

Python
from tensorflow.keras import layers, models

def build_autoencoder():
    with tpu_strategy.scope():
        input_img = layers.Input(shape=(None, None, 3))

        # Encoder
        x1 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
        x1_p = layers.MaxPooling2D((2, 2), padding='same')(x1)
        x2 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x1_p)
        x2_p = layers.MaxPooling2D((2, 2), padding='same')(x2)
        x3 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x2_p)
        encoded = layers.MaxPooling2D((2, 2), padding='same')(x3)

        # Decoder
        x4 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
        x4_up = layers.UpSampling2D((2, 2))(x4)
        x4_concat = layers.Concatenate()([x4_up, x3])
        x5 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x4_concat)
        x5_up = layers.UpSampling2D((2, 2))(x5)
        x5_conv = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x5_up)
        x5_concat = layers.Concatenate()([x5_conv, x2])
        x6 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x5_concat)
        x6_up = layers.UpSampling2D((2, 2))(x6)
        x6_conv = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x6_up)
        x6_concat = layers.Concatenate()([x6_conv, x1])
        x6_concat = layers.Dropout(0.3)(x6_concat)

        decoded = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x6_concat)

        autoencoder = models.Model(input_img, decoded)
        autoencoder.compile(optimizer='adam', loss=perceptual_loss)
        return autoencoder

autoencoder = build_autoencoder()
autoencoder.summary()


Step 7: Model Training with Callbacks

The model is trained using callbacks for early stopping, reducing learning rate, saving the best model, and TensorBoard for logging.

Python
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, TensorBoard
import datetime

checkpoint_path = "/content/model_checkpoint.keras"
checkpoint_callback = ModelCheckpoint(filepath=checkpoint_path, save_best_only=True, verbose=1)
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.0001)
log_dir = "/content/logs" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

callbacks = [checkpoint_callback, early_stopping_callback, reduce_lr, tensorboard_callback]

history = autoencoder.fit(x_train, x_train, epochs=50, validation_data=(x_test, x_test), callbacks=callbacks)

Output:

Model: "functional_3"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃ Connected to ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ input_layer_1 │ (None, None, │ 0 │ - │
│ (InputLayer) │ None, 3) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d (Conv2D) │ (None, None, │ 896 │ input_layer_1[0]… │
│ │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d │ (None, None, │ 0 │ conv2d[0][0] │
│ (MaxPooling2D) │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_1 (Conv2D) │ (None, None, │ 18,496 │ max_pooling2d[0]… │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_1 │ (None, None, │ 0 │ conv2d_1[0][0] │
│ (MaxPooling2D) │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_2 (Conv2D) │ (None, None, │ 73,856 │ max_pooling2d_1[… │
│ │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ max_pooling2d_2 │ (None, None, │ 0 │ conv2d_2[0][0] │
│ (MaxPooling2D) │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_3 (Conv2D) │ (None, None, │ 147,584 │ max_pooling2d_2[… │
│ │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ up_sampling2d │ (None, None, │ 0 │ conv2d_3[0][0] │
│ (UpSampling2D) │ None, 128) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate │ (None, None, │ 0 │ up_sampling2d[0]… │
│ (Concatenate) │ None, 256) │ │ conv2d_2[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_4 (Conv2D) │ (None, None, │ 147,520 │ concatenate[0][0] │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ up_sampling2d_1 │ (None, None, │ 0 │ conv2d_4[0][0] │
│ (UpSampling2D) │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_5 (Conv2D) │ (None, None, │ 36,928 │ up_sampling2d_1[… │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_1 │ (None, None, │ 0 │ conv2d_5[0][0], │
│ (Concatenate) │ None, 128) │ │ conv2d_1[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_6 (Conv2D) │ (None, None, │ 36,896 │ concatenate_1[0]… │
│ │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ up_sampling2d_2 │ (None, None, │ 0 │ conv2d_6[0][0] │
│ (UpSampling2D) │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_7 (Conv2D) │ (None, None, │ 9,248 │ up_sampling2d_2[… │
│ │ None, 32) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ concatenate_2 │ (None, None, │ 0 │ conv2d_7[0][0], │
│ (Concatenate) │ None, 64) │ │ conv2d[0][0] │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout (Dropout) │ (None, None, │ 0 │ concatenate_2[0]… │
│ │ None, 64) │ │ │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ conv2d_8 (Conv2D) │ (None, None, │ 1,731 │ dropout[0][0] │
│ │ None, 3) │ │ │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
Total params: 473,155 (1.80 MB)
Trainable params: 473,155 (1.80 MB)
Non-trainable params: 0 (0.00 B)
-------------------------------------------------------------------------------------------------------
Epoch 1: val_loss improved from inf to 0.03253, saving model to /kaggle/working/model_checkpoint.keras
196/196 ━━━━━━━━━━━━━━━━━━━━ 97s 388ms/step - loss: 0.0680 - val_loss: 0.0325 - learning_rate: 0.0010
Epoch 2/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 166ms/step - loss: 0.0379
Epoch 2: val_loss improved from 0.03253 to 0.01777, saving model to /kaggle/working/model_checkpoint.keras
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 273ms/step - loss: 0.0379 - val_loss: 0.0178 - learning_rate: 0.0010
Epoch 3/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - loss: 0.0322
Epoch 3: val_loss improved from 0.01777 to 0.01523, saving model to /kaggle/working/model_checkpoint.keras
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 271ms/step - loss: 0.0321 - val_loss: 0.0152 - learning_rate: 0.0010
.
.
.
Epoch 15/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - loss: 0.0244
Epoch 15: val_loss did not improve from 0.00889
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 272ms/step - loss: 0.0244 - val_loss: 0.0118 - learning_rate: 5.0000e-04
Epoch 16/50
196/196 ━━━━━━━━━━━━━━━━━━━━ 0s 168ms/step - loss: 0.0246
Epoch 16: val_loss did not improve from 0.00889
196/196 ━━━━━━━━━━━━━━━━━━━━ 55s 272ms/step - loss: 0.0246 - val_loss: 0.0101 - learning_rate: 5.0000e-04
Epoch 16: early stopping
Restoring model weights from the end of the best epoch: 11.

Step 8: Saving the Model Architecture

The model architecture is saved as a JSON file for future reference or reloading.

Python
with open('model.json', 'w') as json_file:
    json_file.write(autoencoder.to_json())
'''This file will open in Netron (online).'''

Output:

'This file will open in Netron (online).'
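
To reuse the trained model later, you can either rebuild the architecture from this JSON file with model_from_json (weights are not stored in it) or reload the full model saved by the ModelCheckpoint callback in Step 7. A minimal sketch, assuming the file paths used above and that perceptual_loss is still defined in the session:

Python
import tensorflow as tf
from tensorflow.keras.models import model_from_json

# Option 1: architecture only, from the JSON file written above
with open('model.json', 'r') as json_file:
    restored = model_from_json(json_file.read())

# Option 2: full model from the checkpoint saved in Step 7
# (compile=False skips deserializing the custom loss; recompile afterwards)
restored = tf.keras.models.load_model('/content/model_checkpoint.keras', compile=False)
restored.compile(optimizer='adam', loss=perceptual_loss)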

Step 9: Model Evaluation Using MSE, PSNR, and SSIM

This step evaluates the trained autoencoder by calculating Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM), along with the compression rate reported in the output below.

Python
import numpy as np
from skimage.metrics import mean_squared_error as mse, peak_signal_noise_ratio as psnr, structural_similarity as ssim

# Calculate MSE (pixel-wise reconstruction error)
decoded_imgs = autoencoder.predict(x_test)
mse_values = [mse(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
average_mse = np.mean(mse_values)
print(f'Average MSE: {average_mse:.2f}')

# Calculate PSNR and SSIM
# (newer scikit-image versions use channel_axis=-1 instead of multichannel=True)
psnr_values = [psnr(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
ssim_values = [ssim(x_test[i], decoded_imgs[i], win_size=3, data_range=1.0, multichannel=True) for i in range(len(x_test))]

average_psnr = np.mean(psnr_values)
average_ssim = np.mean(ssim_values)

print(f'Average PSNR: {average_psnr:.2f}')
print(f'Average SSIM: {average_ssim:.4f}')

# Compression rate based on the bottleneck size (as in the complete code below)
original_size = x_test.shape[1] * x_test.shape[2] * x_test.shape[3]
encoded_size = 32 * 32 * 128 // (8 * 8 * 8)
compression_rate = encoded_size / original_size
print(f'Compression Rate: {compression_rate:.4f}')

Output:

Average MSE: 0.02
Average PSNR: 16.94
Average SSIM: 0.5967
Compression Rate: 0.0833

Step 10: Visualizing the Results

Here, the original, compressed, and reconstructed images are visualized to compare the performance of the autoencoder.

Python
import matplotlib.pyplot as plt
n = 10
plt.figure(figsize=(20, 6))
for i in range(n):
    # Display original
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i])
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display a "compressed" view: the reconstruction averaged across channels
    # and shown in grayscale (not the latent code itself)
    compressed_img = autoencoder.predict(x_test[i:i+1])[0]
    compressed_img = np.mean(compressed_img, axis=-1).squeeze()
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(compressed_img, cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    reconstructed_img = decoded_imgs[i]
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(reconstructed_img)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Output:

The first row shows the original images, the second row shows the compressed (channel-averaged, grayscale) views, and the third row shows the reconstructed images.

Complete Code

Python
!pip install --upgrade pip
!apt-get install graphviz -y
!pip install pydot plotly

import tensorflow as tf

try:
    # detect and init the TPU
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()

    # instantiate a distribution strategy
    tf.tpu.experimental.initialize_tpu_system(tpu)
    tpu_strategy = tf.distribute.TPUStrategy(tpu)
except:
    print('No TPU present.')
    tpu_strategy = tf.distribute.get_strategy()
    
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from skimage.metrics import peak_signal_noise_ratio as psnr, structural_similarity as ssim
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard, ReduceLROnPlateau
import datetime

# Load the Dataset 
(x_train, _), (x_test, _) = tf.keras.datasets.cifar10.load_data()
print(f'Training data shape: {x_train.shape}')
print(f'Test data shape: {x_test.shape}')
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# Define the VGG model for perceptual loss
def get_vgg_model():
    vgg = VGG19(weights='imagenet', include_top=False, input_shape=(None, None, 3))
    vgg.trainable = False
    loss_model = Model(inputs=vgg.input, outputs=vgg.get_layer('block5_conv4').output)
    loss_model.trainable = False
    return loss_model

vgg_model = get_vgg_model()

def perceptual_loss(y_true, y_pred):
    y_true_features = vgg_model(y_true)
    y_pred_features = vgg_model(y_pred)
    return tf.reduce_mean(tf.square(y_true_features - y_pred_features))

# Define the autoencoder model with skip connections
def build_autoencoder():
    with tpu_strategy.scope():
        input_img = layers.Input(shape=(None, None, 3))

        # Encoder
        x1 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
        x1_p = layers.MaxPooling2D((2, 2), padding='same')(x1)
        x2 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x1_p)
        x2_p = layers.MaxPooling2D((2, 2), padding='same')(x2)
        x3 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x2_p)
        encoded = layers.MaxPooling2D((2, 2), padding='same')(x3)

        # Decoder
        x4 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
        x4_up = layers.UpSampling2D((2, 2))(x4)
        x4_concat = layers.Concatenate()([x4_up, x3])
        x5 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x4_concat)
        x5_up = layers.UpSampling2D((2, 2))(x5)
        x5_conv = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x5_up)
        x5_concat = layers.Concatenate()([x5_conv, x2])
        x6 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x5_concat)
        x6_up = layers.UpSampling2D((2, 2))(x6)
        x6_conv = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x6_up)
        x6_concat = layers.Concatenate()([x6_conv, x1])
        x6_concat = layers.Dropout(0.3)(x6_concat)

        decoded = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x6_concat)

        autoencoder = models.Model(input_img, decoded)
        autoencoder.compile(optimizer='adam', loss=perceptual_loss)
        return autoencoder
      
checkpoint_path = "/content/model_checkpoint.keras"
checkpoint_callback = ModelCheckpoint(filepath=checkpoint_path, save_best_only=True, verbose=1)
early_stopping_callback = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.0001)
log_dir = "/content/logs" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

callbacks = [checkpoint_callback, early_stopping_callback, tensorboard_callback, reduce_lr]

from tensorflow.keras.utils import plot_model

with tpu_strategy.scope():
    autoencoder = build_autoencoder()
    autoencoder.summary()
    plot_model(autoencoder, to_file='model_plot.png', show_shapes=True)

    print('\n-------------------------------------------------------------------------------------------------------\n')

    history = autoencoder.fit(x_train, x_train, epochs=50, validation_data=(x_test, x_test), callbacks=callbacks)
    
with open('model.json', 'w') as json_file:
    json_file.write(autoencoder.to_json())

'''This file will open in Netron (online).'''
# Plot the training and validation loss
plt.figure(figsize=(15,6))
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='train_loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

# Calculate Metrics 
from skimage.metrics import mean_squared_error as mse

# Adjust win_size to a suitable value based on your image size
win_size = 3
data_range = 1.0

# Calculate MSE (pixel-wise reconstruction error)
decoded_imgs = autoencoder.predict(x_test)
mse_values = [mse(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
average_mse = np.mean(mse_values)

print(f'Average MSE: {average_mse:.2f}')

# Calculate PSNR & SSIM
# (newer scikit-image versions use channel_axis=-1 instead of multichannel=True)
psnr_values = [psnr(x_test[i], decoded_imgs[i]) for i in range(len(x_test))]
ssim_values = [ssim(x_test[i], decoded_imgs[i], win_size=win_size, data_range=data_range, multichannel=True) for i in range(len(x_test))]

# Compute average PSNR and SSIM
average_psnr = np.mean(psnr_values)
average_ssim = np.mean(ssim_values)

print(f'Average PSNR: {average_psnr:.2f}')
print(f'Average SSIM: {average_ssim:.4f}')

# Calculate compression rate
original_size = x_test.shape[1] * x_test.shape[2] * x_test.shape[3]
encoded_size = 32 * 32 * 128 // (8 * 8 * 8)                            # considering the deepest layer
compression_rate = encoded_size / original_size
print(f'Compression Rate: {compression_rate:.4f}')

n = 10
plt.figure(figsize=(20, 6))
for i in range(n):
    # Display original
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i])
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display a "compressed" view: the reconstruction averaged across channels
    # and shown in grayscale (not the latent code itself)
    compressed_img = autoencoder.predict(x_test[i:i+1])[0]
    compressed_img = np.mean(compressed_img, axis=-1).squeeze()
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(compressed_img, cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    reconstructed_img = decoded_imgs[i]
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(reconstructed_img)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Image Compression and Reconstruction on Unknown Data

The model is used to compress and reconstruct an external image. The PSNR and SSIM values are computed to evaluate the quality of reconstruction.

Python
from PIL import Image

# Function to load and preprocess an image
def load_image(file_path, target_size):
    img = Image.open(file_path)
    img = img.convert('RGB')
    img = img.resize(target_size, Image.LANCZOS)  # or Image.BILINEAR
    img_array = np.array(img) / 255.0            # Normalize to [0, 1]
    return img_array

# Function to compress and reconstruct an image using the autoencoder
def process_image(img):
    # The full autoencoder is run end-to-end; with no separate encoder model,
    # the "compressed" view returned here is simply the reconstruction itself
    reconstructed_img = autoencoder.predict(np.expand_dims(img, axis=0))[0]
    return reconstructed_img, reconstructed_img

# Function to calculate PSNR and SSIM
def calculate_metrics(img1, img2):
    psnr_value = psnr(img1, img2)
    ssim_value = ssim(img1, img2, win_size=3, data_range=1.0, multichannel=True)
    return psnr_value, ssim_value

# Example usage
input_image_path = '4k-images.jpeg'
target_size = (1920, 1080)
input_image = load_image(input_image_path, target_size)   # load and normalize the test image

compressed_image, reconstructed_image = process_image(input_image)
psnr_value, ssim_value = calculate_metrics(input_image, reconstructed_image)

# Displaying results
plt.figure(figsize=(15, 7))

# Original image
plt.subplot(1, 3, 1)
plt.imshow(input_image)
plt.title('Original Image')
plt.axis('off')

# Compressed image
plt.subplot(1, 3, 2)
plt.imshow(compressed_image)
plt.title('Compressed Image')
plt.axis('off')

# Reconstructed image
plt.subplot(1, 3, 3)
plt.imshow(reconstructed_image)
plt.title('Reconstructed Image\nPSNR: {:.2f}, SSIM: {:.4f}'.format(psnr_value, ssim_value))
plt.axis('off')

plt.tight_layout()
plt.show()

# Additional information: in-memory array sizes (the loaded image is float64, the model output float32)
original_size = input_image.nbytes / 1024  # in KB
compressed_size = compressed_image.nbytes / 1024  # in KB
compression_rate = compressed_size / original_size

print(f'Original Image Size: {original_size:.2f} KB')
print(f'Compressed Image Size: {compressed_size:.2f} KB')
print(f'Compression Rate: {compression_rate:.2f}')

Output:

[Figure: original, compressed, and reconstructed views of the test image]
Original Image Size: 48600.00 KB
Compressed Image Size: 24300.00 KB
Compression Rate: 0.50

Applications of Perceptual Autoencoders

Perceptual autoencoders find several practical applications in image processing and computer vision tasks:

  1. Image Super-Resolution: Perceptual autoencoders are widely used in super-resolution tasks, where low-resolution images are upscaled to higher resolutions. By optimizing for perceptual loss, these models ensure that the high-resolution image is sharp and contains realistic textures.
  2. Image Denoising: Autoencoders trained with perceptual loss are effective at removing noise from images while preserving important details, resulting in clearer and more visually appealing images.
  3. Image Compression: Perceptual autoencoders can be used for image compression by encoding images into a compressed latent space and reconstructing them without significant loss of quality. The use of perceptual loss ensures that important visual features are retained.
  4. Style Transfer: In style transfer applications, perceptual autoencoders are used to apply the artistic style of one image to another while retaining the content structure of the target image.

Evaluating Perceptual Autoencoders: Key Metrics

  1. Peak Signal-to-Noise Ratio (PSNR): PSNR measures the difference between the original and reconstructed images in terms of pixel intensity. A higher PSNR value indicates better reconstruction.
  2. Structural Similarity Index (SSIM): SSIM compares the structural similarity between the original and reconstructed images, taking into account luminance, contrast, and texture. A higher SSIM value indicates greater perceptual similarity.
  3. Mean Absolute Error (MAE): MAE calculates the average absolute difference between pixel values of the original and reconstructed images (the implementation above reports the closely related MSE). Although perceptual autoencoders prioritize perceptual similarity, pixel-level error metrics still provide insight into reconstruction accuracy, as shown in the snippet after this list.
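
As a brief, self-contained illustration of these metrics on a hypothetical batch of images in the [0, 1] range, TensorFlow's built-in tf.image.psnr and tf.image.ssim can be used as an alternative to the scikit-image calls in Step 9:

Python
import numpy as np
import tensorflow as tf

# Hypothetical originals and noisy "reconstructions" in [0, 1]
original = np.random.rand(4, 32, 32, 3).astype('float32')
reconstructed = np.clip(original + np.random.normal(0, 0.05, original.shape), 0, 1).astype('float32')

mae = float(np.mean(np.abs(original - reconstructed)))                                   # pixel-level error
mean_psnr = float(tf.reduce_mean(tf.image.psnr(original, reconstructed, max_val=1.0)))   # per-image PSNR, averaged
mean_ssim = float(tf.reduce_mean(tf.image.ssim(original, reconstructed, max_val=1.0)))   # per-image SSIM, averaged

print(f'MAE: {mae:.4f}, PSNR: {mean_psnr:.2f} dB, SSIM: {mean_ssim:.4f}')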

Conclusion

Perceptual autoencoders represent a significant advancement in image reconstruction by focusing on perceptual similarity rather than pixel-wise accuracy alone. By incorporating perceptual loss functions built on deep convolutional models like VGG19, these autoencoders deliver high-quality reconstructions that maintain the essential image features perceived by the human eye.

