05 Transfer Learning with TensorFlow Part 2: Fine-tuning

This document discusses transfer learning techniques for computer vision tasks. It introduces using a pre-trained model as a feature extractor by freezing its bottom layers and only training the top classification layers on a new dataset. This allows leveraging what the model already learned from a large dataset like ImageNet to extract useful features, while fine-tuning the top layers for the new classification problem with fewer data requirements. Examples of food classification and spam detection are provided to illustrate transfer learning use cases.


Transfer Learning with TensorFlow
Part 2: Fine-tuning
Where can you get help?
“If in doubt, run the code”

• Follow along with the code


• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask (don’t forget the Discord chat! Yes, including the “dumb” questions)
“What is transfer learning?”
Surely someone has spent the time crafting the right model for the job…
Example transfer learning use cases
Computer vision (e.g. food image classification, as in Food Vision)

Natural language processing (e.g. spam detection):

• Not spam: “Hey Daniel, This deep learning course is incredible! I can’t wait to use what I’ve learned!” (To: [email protected])
• Spam: “Hay daniel… C0ongratu1ations! U win $1139239230” (To: [email protected])

A model learns patterns/weights from a similar problem space; those patterns then get used/tuned to the specific problem.
Why use transfer learning?
• Can leverage an existing neural network architecture proven to work on problems similar to our own

• Can leverage a working network architecture which has already learned patterns on data similar to our own (often achieving great results with less data)

Learn patterns from a wide variety of images (using ImageNet) → EfficientNet architecture (already works really well on computer vision tasks) → Tune patterns/weights to our own problem (Food Vision) → Model performs better than training from scratch
Feature extraction vs. Fine-tuning

Both start from a working architecture (e.g. EfficientNet) pre-trained on ImageNet and are trained on a custom dataset (e.g. 10 classes of food).

• Feature extraction: the pre-trained layers (input layer through layer 235) stay the same (frozen); only a custom final layer (output shape = 10) gets trained on the custom data.
• Fine-tuning: the top layers (e.g. layers 234 and 235) get unfrozen and fine-tuned on the custom data, while the bottom layers (may) stay frozen.

Note: fine-tuning usually requires more data than feature extraction.
What we’re going to cover (broadly)

• Introduce fine-tuning transfer learning with TensorFlow

• Introduce the Keras Functional API to build models

• Using a small dataset to experiment faster (e.g. 10% of training samples)

• Data augmentation (making your training set more diverse without adding samples)

• Running a series of experiments on our Food Vision data

• Introduce the ModelCheckpoint callback to save intermediate training results

How: 👩🍳 👩🔬 (we’ll be cooking up lots of code!)
Let’s code!
Dataset shapes
“Create batches of 32 images of size 224x224 split into red, green, blue colour channels.”

• Number of total samples (750 images, 75 per class)


• Number of classes (10 types of food)
• Batch size (default is 32)
• Image size (height, width)
• Number of colour channels (red, green, blue)
• Number of classes in label tensors (10 types of food)
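
To make these shapes concrete, here’s a minimal sketch of creating such batches (the directory path is an assumption based on the datasets listed later; in newer TensorFlow versions the same function also lives at tf.keras.utils.image_dataset_from_directory):

```python
import tensorflow as tf

IMG_SIZE = (224, 224)  # (height, width)
BATCH_SIZE = 32        # default batch size

# Assumes images live in class-named subdirectories, e.g.
# 10_food_classes_10_percent/train/pizza/..., .../steak/...
train_data = tf.keras.preprocessing.image_dataset_from_directory(
    "10_food_classes_10_percent/train",
    label_mode="categorical",  # one-hot labels, e.g. [0, 0, 1, ..., 0]
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE)

# One batch: images (32, 224, 224, 3), labels (32, 10)
for images, labels in train_data.take(1):
    print(images.shape, labels.shape)
```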
Keras Sequential vs Functional API

• Similarities: compiling, fitting and evaluating stay the same

• Differences: model construction (the Functional API is more flexible and able to produce more sophisticated models)
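
To make the difference concrete, here’s a minimal sketch of the same tiny classifier built with both APIs (the layer sizes are placeholders, not from the slides):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sequential API: a linear stack of layers, defined in order
seq_model = tf.keras.Sequential([
    layers.Flatten(input_shape=(224, 224, 3)),
    layers.Dense(10, activation="softmax")
])

# Functional API: layers are called on tensors like functions,
# which allows multiple inputs/outputs and branching architectures
inputs = layers.Input(shape=(224, 224, 3))
x = layers.Flatten()(inputs)
outputs = layers.Dense(10, activation="softmax")(x)
func_model = tf.keras.Model(inputs, outputs)

# Compiling, fitting and evaluating are identical for both APIs
for model in (seq_model, func_model):
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
```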
Building a feature extraction model with the Keras Functional API
Useful computer vision architectures
• tf.keras.applications and keras.applications have many of the most popular and
best performing computer vision architectures built-in & pre-trained, ready to use for your
own problems

Sources: https://keras.io/api/applications/ and https://www.tensorflow.org/api_docs/python/tf/keras/applications
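
As a minimal sketch of how these built-in architectures are used (include_top=False, shown here on EfficientNetB0, drops the original ImageNet classifier so you can attach your own):

```python
import tensorflow as tf

# Pre-trained architecture, ready to use as a feature extractor
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# Other proven architectures are available the same way, e.g.:
# tf.keras.applications.ResNet50V2(include_top=False)
```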


What is a feature vector?
• A feature vector is a learned representation of the input data (a compressed form of the input data based on how the model sees it), e.g. [0.940, 0.242, 0.849…]

• The model producing it may be pre-trained (e.g. on ImageNet) or trained from scratch

Input data → Model (learns a feature representation of the input data) → Output (feature vector)

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
EfficientNet feature extractor

Input data (10 classes of Food101) → EfficientNetB0 (stays the same: frozen, pre-trained on ImageNet) → Pooling (outputs a feature vector, e.g. [0.940, 0.242, 0.849…]) → Fully-connected (dense) classifier layer (changes: same shape as the number of classes, 10)

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
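
Putting the pieces together, a sketch of the feature extractor above built with the Keras Functional API (the layer names and the Adam optimizer are illustrative choices, not mandated by the slides):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 1. Create and freeze the pre-trained base model (EfficientNet in
# tf.keras.applications handles input scaling internally)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False  # frozen: patterns stay the same during training

# 2. Build the feature extraction model with the Functional API
inputs = layers.Input(shape=(224, 224, 3), name="input_layer")
x = base_model(inputs, training=False)  # run frozen layers in inference mode
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])
# model.fit(train_data, epochs=5, ...)
```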
EfficientNet fine-tuning

Input data (10 classes of Food101) → EfficientNetB0: bottom layers stay the same (frozen, pre-trained on ImageNet), top layers change (unfrozen, get fine-tuned on custom data) → Fully-connected (dense) classifier layer (10)

• Layers closer to the output layer get unfrozen/fine-tuned first
• Bottom layers tend to stay frozen (or are the last to get unfrozen)

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
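
Continuing from the feature extraction sketch above, a minimal sketch of the unfreezing step (unfreezing the top 10 layers mirrors Model 3 in the experiments below; the roughly 10x lower learning rate is a common fine-tuning heuristic, not a rule):

```python
# Unfreeze the whole base model, then re-freeze everything
# except the top 10 layers
base_model.trainable = True
for layer in base_model.layers[:-10]:
    layer.trainable = False

# Recompile (required after changing which layers are trainable)
# with a lower learning rate so fine-tuning doesn't wreck the
# pre-trained patterns
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              metrics=["accuracy"])
# model.fit(train_data, epochs=..., initial_epoch=...)
```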
Modelling experiments we’re running

• Model 0 (baseline): Data: 10 classes of Food101 (random 10% of training data only). Preprocessing: none. Model: feature extractor, EfficientNetB0 (pre-trained on ImageNet, all layers frozen) with no top.

• Model 1: Data: 10 classes of Food101 (random 1% of training data only). Preprocessing: random flip, rotation, zoom, height and width data augmentation. Model: same as Model 0.

• Model 2: Data: same as Model 0. Preprocessing: same as Model 1. Model: same as Model 0.

• Model 3: Data: same as Model 0. Preprocessing: same as Model 1. Model: fine-tuning Model 2 (EfficientNetB0 pre-trained on ImageNet) with the top layer trained on custom data and the top 10 layers unfrozen.

• Model 4: Data: 10 classes of Food101 (100% of training data). Preprocessing: same as Model 1. Model: same as Model 3.
🍔👁 Food Vision: Dataset(s) we’re using

Note: for randomly selected data, the Food101 dataset was downloaded and modified using the Image Data Modification Notebook.

• pizza_steak: Source: Food101. Classes: pizza, steak (2). Training data: 750 images of pizza and steak (same as original Food101 dataset). Testing data: 250 images of pizza and steak (same as original Food101 dataset).

• 10_food_classes_1_percent: Source: Food101. Classes: chicken curry, chicken wings, fried rice, grilled salmon, hamburger, ice cream, pizza, ramen, steak, sushi (10). Training data: 7 randomly selected images of each class (1% of original training data). Testing data: 250 images of each class (same as original Food101 dataset).

• 10_food_classes_10_percent: same source and classes as above. Training data: 75 randomly selected images of each class (10% of original training data). Testing data: same as above.

• 10_food_classes_100_percent: same source and classes as above. Training data: 750 images of each class (100% of original training data). Testing data: same as above.

• 101_food_classes_10_percent: Source: Food101. Classes: all classes from Food101 (101). Training data: 75 images of each class (10% of original Food101 dataset). Testing data: 250 images of each class (same as original Food101 dataset).
Data augmentation as a layer

Input data → Data augmentation → Model (e.g. EfficientNetB0) → Output (feature vector, e.g. [0.940, 0.242, 0.849…])

When passed as a layer to a model, data augmentation is automatically turned on during training (augments training data) but turned off during inference (does not augment testing data or new unseen data).

EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
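
A minimal sketch of such an augmentation layer, assuming TensorFlow 2.6+ (earlier versions keep these layers under tf.keras.layers.experimental.preprocessing; the factor values are illustrative, not prescribed):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Data augmentation built as a model layer: active during training,
# a no-op at inference time
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
    layers.RandomHeight(0.2),
    layers.RandomWidth(0.2),
], name="data_augmentation")

# Used inside a Functional model:
# inputs = layers.Input(shape=(224, 224, 3))
# x = data_augmentation(inputs)       # augments training batches only
# x = base_model(x, training=False)
```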
What are callbacks?
• Callbacks are a tool which can add helpful functionality to your models during training, evaluation or inference

• Some popular callbacks include:

TensorBoard (tf.keras.callbacks.TensorBoard()): log the performance of multiple models, then view and compare them visually on TensorBoard (a dashboard for inspecting neural network parameters). Helpful for comparing the results of different models on your data.

Model checkpointing (tf.keras.callbacks.ModelCheckpoint()): save your model as it trains so you can stop training if needed and come back to continue off where you left. Helpful if training takes a long time and can't be done in one sitting.

Early stopping (tf.keras.callbacks.EarlyStopping()): leave your model training for an arbitrary amount of time and have it stop automatically when it ceases to improve. Helpful when you've got a large dataset and don't know how long training will take.
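
As an example of the ModelCheckpoint callback mentioned above (the checkpoint path and option values shown are assumptions, not from the slides):

```python
import tensorflow as tf

# Save weights during training so a long run can be paused and resumed
checkpoint_path = "checkpoints/checkpoint.ckpt"  # assumed path
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,  # weights-only checkpoints are smaller/faster
    save_best_only=False,    # save every epoch, not just the best one
    save_freq="epoch",
    verbose=1)

# model.fit(train_data, epochs=5, callbacks=[checkpoint_callback])
# Later: model.load_weights(checkpoint_path) to pick up where you left off
```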
Original Model vs. Feature Extraction

Original model: trained on a large dataset (e.g. ImageNet). Input layer → layer 2 → … → layer 234 → layer 235 → output layer (shape = 1000).

Feature extraction transfer learning model: trained on a different dataset (e.g. 10 classes of food). The original model’s layers stay the same (frozen, they don’t update during training); the input data changes, and a new output layer (shape = 10) gets trained on the new data.
Kinds of Transfer Learning

• Original model: trained on a large dataset (e.g. ImageNet) with an output layer of shape 1000.
• Feature extraction: trained on a different dataset (e.g. 10 classes of food); all pre-trained layers stay the same (frozen), and only the new output layer (shape = 10) gets trained on the new data.
• Fine-tuning: the top layers (e.g. layers 234 and 235) change (unfrozen) and get trained on the new data together with the output layer, while the bottom layers stay the same (frozen).

Note: fine-tuning usually requires more data than feature extraction.
Kinds of Transfer Learning

• Original model (“as is”): take a pretrained model as it is and apply it to your task without any changes. What happens: the original model remains unchanged. When to use: helpful if you have the exact same kind of data the original model was trained on.

• Feature extraction: take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem. What happens: most of the layers in the original model remain frozen during training (only the top 1-3 layers get updated). When to use: helpful if you have a small amount of custom data (similar to what the original model was trained on) and want to utilise a pretrained model to get better results on your specific problem.

• Fine-tuning: take the weights of a pretrained model and adjust (fine-tune) them to your own problem. What happens: some (1-3+), many or all of the layers in the pretrained model are updated during training. When to use: helpful if you have a large amount of custom data and want to utilise a pretrained model while improving its underlying patterns for your specific problem.
What is TensorBoard?
• A way to visually explore your machine learning models’ performance and internals

• Host, track and share your machine learning experiments on TensorBoard.dev

• TensorBoard also integrates with sites like Weights & Biases

Example: comparing the results of two different model architectures (ResNet50V2 & EfficientNetB0) on the same dataset.
Source: https://tensorboard.dev/experiment/73taSKxXQeGPQsNBcVvY3g/#scalars
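
A sketch of wiring up the TensorBoard callback with one log directory per experiment (the helper function and naming scheme here are assumptions, not from the slides; any unique path per run works):

```python
import datetime
import tensorflow as tf

def create_tensorboard_callback(dir_name, experiment_name):
    # One timestamped log directory per run keeps experiments comparable
    log_dir = dir_name + "/" + experiment_name + "/" + \
        datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return tf.keras.callbacks.TensorBoard(log_dir=log_dir)

# model.fit(..., callbacks=[create_tensorboard_callback("experiments",
#                                                       "efficientnetb0")])
# View locally with: tensorboard --logdir=experiments
```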
