LB12: Implement GAN for Neural Style Transfer
Yellow Labrador Looking, from Wikimedia Commons by Elf. License CC BY-SA 3.0
Now, what would it look like if Kandinsky decided to paint a picture of this dog exclusively in this style? Something like this?
Setup

Import and configure modules
import os
import numpy as np
import PIL.Image
import matplotlib.pyplot as plt
import IPython.display as display
import tensorflow as tf

# Load compressed models from tensorflow_hub
os.environ['TFHUB_MODEL_LOAD_FORMAT'] = 'COMPRESSED'
# Convert a float tensor in [0, 1] to a PIL image, dropping the batch dimension.
def tensor_to_image(tensor):
  tensor = tensor*255
  tensor = np.array(tensor, dtype=np.uint8)
  if np.ndim(tensor) > 3:
    assert tensor.shape[0] == 1
    tensor = tensor[0]
  return PIL.Image.fromarray(tensor)
def load_img(path_to_img):
  max_dim = 512
  img = tf.io.read_file(path_to_img)
  img = tf.image.decode_image(img, channels=3)
  img = tf.image.convert_image_dtype(img, tf.float32)

  shape = tf.cast(tf.shape(img)[:-1], tf.float32)
  long_dim = max(shape)
  scale = max_dim / long_dim

  new_shape = tf.cast(shape * scale, tf.int32)

  img = tf.image.resize(img, new_shape)
  img = img[tf.newaxis, :]
  return img
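The next cell relies on content_path, style_path, and an imshow helper whose defining cells didn't survive this printout. A minimal sketch, assuming the download URLs from the TensorFlow style-transfer tutorial that this lab follows:

# Assumed: image URLs taken from the TensorFlow style-transfer tutorial;
# substitute your own content/style images if these differ.
content_path = tf.keras.utils.get_file(
    'YellowLabradorLooking_new.jpg',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
style_path = tf.keras.utils.get_file(
    'kandinsky5.jpg',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg')

# Assumed helper: plot a (possibly batched) image tensor with an optional title.
def imshow(image, title=None):
  if len(image.shape) > 3:
    image = tf.squeeze(image, axis=0)  # drop the batch dimension
  plt.imshow(image)
  if title:
    plt.title(title)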
content_image = load_img(content_path)
style_image = load_img(style_path)

plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')

plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')
Load a VGG19 and test-run it on the image to make sure it's being used correctly:
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape
predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]
Now load a VGG19 without the classification head, and list the layer names:
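The code cell for this step didn't survive the printout; a minimal sketch of what it presumably contained, given the output below:

# Load the headless VGG19 and print the name of every layer.
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')

for layer in vgg.layers:
  print(layer.name)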
input_layer_1
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool
Choose intermediate layers from the network to represent the style and content of the image:
content_layers = ['block5_conv2']

style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
So why do these intermediate outputs within our pretrained image classification network allow us to define style and content representations?
At a high level, in order for a network to perform image classification (which this network has been trained to do), it must understand the image.
This requires taking the raw image as input pixels and building an internal representation that converts the raw image pixels into a complex
understanding of the features present within the image.
This is also a reason why convolutional neural networks are able to generalize well: they’re able to capture the invariances and defining features
within classes (e.g. cats vs. dogs) that are agnostic to background noise and other nuisances. Thus, somewhere between where the raw image
is fed into the model and the output classification label, the model serves as a complex feature extractor. By accessing intermediate layers of
the model, you're able to describe the content and style of input images.
To define a model using the functional API, specify the inputs and outputs. The following function builds a VGG19 model that returns a list of intermediate layer outputs:
def vgg_layers(layer_names):
  """Creates a VGG model that returns a list of intermediate output values."""
  # Load our model. Load pretrained VGG, trained on ImageNet data
  vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False

  outputs = [vgg.get_layer(name).output for name in layer_names]

  model = tf.keras.Model([vgg.input], outputs)
  return model
style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)

# Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
  print(name)
  print("  shape: ", output.numpy().shape)
  print("  min: ", output.numpy().min())
  print("  max: ", output.numpy().max())
  print("  mean: ", output.numpy().mean())
  print()
block1_conv1
  shape:  (1, 336, 512, 64)
  min:  0.0
  max:  835.5256
  mean:  33.97525

block2_conv1
  shape:  (1, 168, 256, 128)
  min:  0.0
  max:  4625.8857
  mean:  199.82687

block3_conv1
  shape:  (1, 84, 128, 256)
  min:  0.0
  max:  8789.239
  mean:  230.78099

block4_conv1
  shape:  (1, 42, 64, 512)
  min:  0.0
  max:  21566.135
  mean:  791.24005

block5_conv1
  shape:  (1, 21, 32, 512)
  min:  0.0
  max:  3189.2542
  mean:  59.179478
It turns out that the style of an image can be described by the means and correlations across the different feature maps. Calculate a Gram matrix
that includes this information by taking the outer product of the feature vector with itself at each location, and averaging that outer product over
all locations. This Gram matrix can be calculated for a particular layer as:
$$G^l_{cd} = \frac{\sum_{ij} F^l_{ijc}(x) F^l_{ijd}(x)}{IJ}$$
This can be implemented concisely using the tf.linalg.einsum function:
def gram_matrix(input_tensor):
  result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
  input_shape = tf.shape(input_tensor)
  num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
  return result/(num_locations)
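This einsum is equivalent to flattening the spatial dimensions and averaging the outer products of the feature vectors over all locations. A quick hypothetical check (not from the notebook) confirming the two agree:

# Hypothetical sanity check: compare the einsum against an explicit
# flatten-and-matmul computation of the averaged outer product.
x = tf.random.uniform((1, 4, 5, 3))           # batch, I, J, channels
g = gram_matrix(x)                             # shape (1, 3, 3)
flat = tf.reshape(x, (1, -1, 3))               # (batch, I*J, channels)
manual = tf.matmul(flat, flat, transpose_a=True) / tf.cast(tf.shape(flat)[1], tf.float32)
print(tf.reduce_max(tf.abs(g - manual)).numpy())  # ~0.0 up to float error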
Next, build a model that returns the style and content tensors:

class StyleContentModel(tf.keras.models.Model):
  def __init__(self, style_layers, content_layers):
    super(StyleContentModel, self).__init__()
    self.vgg = vgg_layers(style_layers + content_layers)
    self.style_layers = style_layers
    self.content_layers = content_layers
    self.num_style_layers = len(style_layers)
    self.vgg.trainable = False

  def call(self, inputs):
    "Expects float input in [0,1]"
    inputs = inputs*255.0
    preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
    outputs = self.vgg(preprocessed_input)
    style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                      outputs[self.num_style_layers:])

    style_outputs = [gram_matrix(style_output)
                     for style_output in style_outputs]

    content_dict = {content_name: value
                    for content_name, value
                    in zip(self.content_layers, content_outputs)}

    style_dict = {style_name: value
                  for style_name, value
                  in zip(self.style_layers, style_outputs)}

    return {'content': content_dict, 'style': style_dict}
When called on an image, this model returns the Gram matrix (style) of the style_layers and the content of the content_layers:
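The cell that constructs this extractor and prints the statistics is missing from the printout; a sketch reconstructing it in line with the output shown below:

# Build the extractor and inspect its outputs on the content image.
extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.constant(content_image))

print('Styles:')
for name, output in sorted(results['style'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())
  print()

print("Contents:")
for name, output in sorted(results['content'].items()):
  print("  ", name)
  print("    shape: ", output.numpy().shape)
  print("    min: ", output.numpy().min())
  print("    max: ", output.numpy().max())
  print("    mean: ", output.numpy().mean())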
Styles:
  block1_conv1
    shape:  (1, 64, 64)
    min:  0.005522845
    max:  28014.555
    mean:  263.79022

  block2_conv1
    shape:  (1, 128, 128)
    min:  0.0
    max:  61479.484
    mean:  9100.949

  block3_conv1
    shape:  (1, 256, 256)
    min:  0.0
    max:  545623.56
    mean:  7660.9766

  block4_conv1
    shape:  (1, 512, 512)
    min:  0.0
    max:  4320502.0
    mean:  134288.84

  block5_conv1
    shape:  (1, 512, 512)
    min:  0.0
    max:  110005.34
    mean:  1487.0381

Contents:
  block5_conv2
    shape:  (1, 26, 32, 512)
    min:  0.0
    max:  2410.879
    mean:  13.764149
Set your style and content target values:

style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']
Define a tf.Variable to contain the image to optimize. To make this quick, initialize it with the content image (the tf.Variable must be the
same shape as the content image):
image = tf.Variable(content_image)
Since this is a float image, define a function to keep the pixel values between 0 and 1:
def clip_0_1(image):
  return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
Create an optimizer. The paper recommends LBFGS, but Adam works okay, too:
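The cell that creates the optimizer is missing from this printout; the TensorFlow tutorial this lab follows uses Adam with the hyperparameters below, which are assumed here:

# Assumed hyperparameters, matching the TensorFlow style-transfer tutorial.
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)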
To optimize this, use a weighted combination of the two losses to get the total loss:
style_weight = 1e-2
content_weight = 1e4
def style_content_loss(outputs):
  style_outputs = outputs['style']
  content_outputs = outputs['content']
  style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
                         for name in style_outputs.keys()])
  style_loss *= style_weight / num_style_layers

  content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
                           for name in content_outputs.keys()])
  content_loss *= content_weight / num_content_layers
  loss = style_loss + content_loss
  return loss
Use tf.GradientTape to update the image:

@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss = style_content_loss(outputs)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))
Now run a few steps to test:

train_step(image)
train_step(image)
train_step(image)
tensor_to_image(image)
Since it's working, perform a longer optimization:

import time
start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    train_step(image)
    print(".", end='', flush=True)
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))
One downside to this basic implementation is that it produces a lot of high-frequency artifacts. Decrease these with an explicit regularization term on the high-frequency components of the image. In style transfer, this is often called the total variation loss:

def high_pass_x_y(image):
  x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
  y_var = image[:, 1:, :, :] - image[:, :-1, :, :]

  return x_var, y_var
Also, this high-frequency component is basically an edge detector. You can get similar output from the Sobel edge detector, for example:
plt.figure(figsize=(14, 10))

sobel = tf.image.sobel_edges(content_image)
plt.subplot(1, 2, 1)
imshow(clip_0_1(sobel[..., 0]/4+0.5), "Horizontal Sobel-edges")

plt.subplot(1, 2, 2)
imshow(clip_0_1(sobel[..., 1]/4+0.5), "Vertical Sobel-edges")
The regularization loss associated with this is the sum of the absolute values:
def total_variation_loss(image):
  x_deltas, y_deltas = high_pass_x_y(image)
  return tf.reduce_sum(tf.abs(x_deltas)) + tf.reduce_sum(tf.abs(y_deltas))
total_variation_loss(image).numpy()

149323.84
That demonstrates what it does, but there's no need to implement it yourself: TensorFlow includes a standard implementation:
tf.image.total_variation(image).numpy()

array([149323.84], dtype=float32)
Choose a weight for the total variation loss:

total_variation_weight = 30
Now include it in the train_step function:

@tf.function()
def train_step(image):
  with tf.GradientTape() as tape:
    outputs = extractor(image)
    loss = style_content_loss(outputs)
    loss += total_variation_weight*tf.image.total_variation(image)

  grad = tape.gradient(loss, image)
  opt.apply_gradients([(grad, image)])
  image.assign(clip_0_1(image))
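Before the longer run below, the notebook this follows reinitializes the optimizer and the image variable so the optimization restarts from the content image; that cell is missing from the printout, so it is sketched here under the same assumptions as the optimizer above:

# Assumed reinitialization: fresh Adam state and a fresh image variable.
opt = tf.keras.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
image = tf.Variable(content_image)

Then run the optimization again: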
import time
start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
  for m in range(steps_per_epoch):
    step += 1
    train_step(image)
    print(".", end='', flush=True)
  display.clear_output(wait=True)
  display.display(tensor_to_image(image))
  print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))