
PyTorch Tutorial

-NTU Machine Learning Course-

Lyman Lin 林裕訓


Nov. 03, 2017
lymanblue[at]gmail.com
What is PyTorch?
• Developed by Facebook
  – Python first
  – Dynamic neural networks (the graph is built as the code runs)
  – This tutorial is for PyTorch 0.2.0

• Endorsed by the Director of AI at Tesla


Installation
• PyTorch Web: https://pytorch.org/
Packages of PyTorch

Package                     Description
torch                       a Tensor library like NumPy, with strong GPU support
torch.autograd              a tape-based automatic differentiation library that supports all
                            differentiable Tensor operations in torch
torch.nn                    a neural networks library deeply integrated with autograd, designed
                            for maximum flexibility
torch.optim                 an optimization package to be used with torch.nn, with standard
                            optimization methods such as SGD, RMSprop, LBFGS, Adam, etc.
torch.multiprocessing       Python multiprocessing, but with magical memory sharing of torch
                            Tensors across processes; useful for data loading and Hogwild training
torch.utils                 DataLoader, Trainer and other utility functions for convenience
torch.legacy (.nn/.optim)   legacy code that has been ported over from Torch for
                            backward-compatibility reasons
Outline
• Neural Network in Brief
• Concepts of PyTorch
• Multi-GPU Processing
• RNN
• Transfer Learning
• Comparison with TensorFlow
Neural Network in Brief
• Supervised Learning
  – Learning a function f such that f(x) = y, from example pairs:

  Data    Label
  X1      Y1
  X2      Y2
  …       …
Neural Network in Brief
• Training loop (per epoch)
  – The data set is split into batches; one epoch processes all N batches, where N = (size of the data) / (batch size).
  – Forward process (from data to label): a batch is fed through the neural network with the current weights Wi to produce predictions Label'.
  – The loss compares Label' with the ground-truth Label.
  – Backward process (update the parameters): the optimizer uses the gradients of the loss to update the weights, Wi -> Wi+1.
Neural Network in Brief
• Inside the Neural Network
  – Forward: the data flows through a chain of weighted layers (W, W, …, W) to produce Label'.
  – Backward: the gradients flow back through the same layers, and the optimizer updates Wi -> Wi+1.
  – Data in the Neural Network:
    - Tensor (n-dim array)
    - Gradient of Functions
Concepts of PyTorch
• Modules of PyTorch
  – Data: Tensor, Variable (for gradients)
  – Function: NN Modules, Optimizer, Loss Function, Multi-Processing
  – These pieces map onto the training loop above: Data -> Neural Network (Wi -> Wi+1) -> Label' -> Loss -> Optimizer -> Backward.
Concepts of PyTorch: Tensor
• Similar to NumPy arrays
• Operations
  – z = x + y
  – torch.add(x, y, out=z)
  – y.add_(x)  # in-place
• NumPy Bridge
  – To NumPy: a = torch.ones(5); b = a.numpy()
  – To Tensor: a = numpy.ones(5); b = torch.from_numpy(a)
• CUDA Tensors (move to GPU)
  – x = x.cuda()
  – y = y.cuda()
  – x + y
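Put together, a minimal runnable sketch of the Tensor operations above (the variable names are just illustrative):

```python
import numpy
import torch

# operations
x = torch.ones(5)
y = torch.ones(5)
z = torch.Tensor(5)
torch.add(x, y, out=z)    # same result as z = x + y
y.add_(x)                 # in-place: y becomes y + x

# NumPy bridge (both directions share the underlying memory)
a = torch.ones(5)
b = a.numpy()             # Tensor -> NumPy array
c = numpy.ones(5)
d = torch.from_numpy(c)   # NumPy array -> Tensor

# CUDA Tensors: move to the GPU (guarded, in case no GPU is available)
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)
```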
Concepts of PyTorch
• Recall that the data flowing through the network consists of Tensors (n-dim arrays) and the gradients of functions; PyTorch therefore wraps Tensors in Variables to track gradients.
• Variable
  – Wraps a Tensor (the Tensor data itself).
  – Holds the gradient for the current backward process.
  – The bookkeeping needed to compute that gradient is handled by PyTorch automatically.
• Variable example
  – x = Variable(torch.ones(2, 2), requires_grad=True)
  – print(x)
  – y = x + 2
  – z = y * y * 3
  – out = z.mean()
  – out.backward()
  – print(x.grad)
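The same example as a self-contained script; here out = mean(3(x+2)^2), so each entry of x.grad is 6(x_i+2)/4 = 4.5 at x_i = 1:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()   # out = mean(3 * (x + 2)^2)
out.backward()   # fills x.grad with d(out)/dx
print(x.grad)    # 2x2 tensor, every entry 4.5 = 6 * (1 + 2) / 4
```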
Define the Network
• https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network
• Two parts are required: define the modules (must have) and build the network in the forward pass (must have).
• The example network and its shapes, written as [Channel, H, W] (the full Tensor layout is [Batch N, Channel, H, W]):
  x:        1x32x32
  conv1:    1x32x32 -> 6x28x28
  relu:     6x28x28
  pooling:  6x28x28 -> 6x14x14
  conv2:    6x14x14 -> 16x10x10
  relu:     16x10x10
  pooling:  16x10x10 -> 16x5x5
  flatten the Tensor: 16x5x5
  fc1 -> relu
  fc2 -> relu
  fc3
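The slides show screenshots of the code from the linked tutorial; a sketch close to that code, with the shape annotations above added as comments:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Define modules (must have)
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Build network (must have)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 1x32x32 -> 6x28x28 -> 6x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 6x14x14 -> 16x10x10 -> 16x5x5
        x = x.view(x.size(0), -1)                   # flatten the Tensor: 16x5x5 -> 400
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
print(net)
```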
Concepts of PyTorch
• NN Modules (torch.nn)
  – Modules built on Variable
  – Gradients handled by PyTorch
• Common Modules
  – Convolution layers
  – Linear layers
  – Pooling layers
  – Dropout layers
  – Etc.
NN Modules
• Convolution Layer
  – N-th batch (N), Channel (C)
  – torch.nn.Conv1d: input [N, C, W]        # kernel moves in 1D
  – torch.nn.Conv2d: input [N, C, H, W]     # kernel moves in 2D
  – torch.nn.Conv3d: input [N, C, D, H, W]  # kernel moves in 3D

  – Example:
  – torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
• How Conv2d works (for one sample in the batch)
  – Input: Cin x Hin x Win.
  – Each kernel has size Cin x k x k; convolving (*) the input with the 1st kernel produces one Hout x Wout feature map.
  – With Cout kernels, the Cout feature maps stack into an output of size Cout x Hout x Wout.
  – Hout and Wout are determined by the kernel size k, padding p, dilation d, and stride s (the moving step size); the slide diagrams use p=1, d=1, k=3, s=1.
  – Number of parameters: the Cout kernels hold Cout x Cin x k x k weights (plus Cout biases).
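For reference (not spelled out on the slides), Hout and Wout follow the standard sliding-window formula; a small helper, assuming the usual meanings of k, p, s, and d:

```python
def conv_out_size(size_in, k, p=0, s=1, d=1):
    """Output size along one spatial dimension for a Conv2d / MaxPool2d window."""
    return (size_in + 2 * p - d * (k - 1) - 1) // s + 1

# e.g. the network above: a 32x32 input through Conv2d(kernel_size=5) gives 28x28
print(conv_out_size(32, k=5))        # 28
# and Conv2d(kernel_size=3, padding=1) keeps the spatial size unchanged
print(conv_out_size(28, k=3, p=1))   # 28
```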
NN Modules
• Linear Layer
– torch.nn.Linear(in_features=3, out_features=5)
– y=Ax+b
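A quick sketch of what this layer does with a batch of inputs (the batch size of 2 is arbitrary):

```python
import torch
from torch.autograd import Variable

fc = torch.nn.Linear(in_features=3, out_features=5)
x = Variable(torch.randn(2, 3))   # batch of 2 samples, 3 features each
y = fc(x)                         # affine map y = xA^T + b
print(y.size())                   # torch.Size([2, 5])
```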
NN Modules
• Dropout Layer
  – torch.nn.Dropout(p)
  – Randomly zeroes elements of the input with probability p
  – Outputs are scaled by 1/(1-p) during training
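A tiny sketch of that scaling: with p=0.5 the surviving entries are multiplied by 1/(1-p) = 2 during training, and dropout is switched off in eval mode:

```python
import torch
from torch.autograd import Variable

drop = torch.nn.Dropout(p=0.5)
x = Variable(torch.ones(1, 8))
print(drop(x))      # roughly half the entries are 0, the rest are 2.0

drop.eval()         # evaluation mode: dropout becomes a no-op
print(drop(x))      # all entries stay 1.0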
NN Modules
• Pooling Layer
  – torch.nn.AvgPool2d(kernel_size=2, stride=2, padding=0)
  – torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
  – With k=2, p=0, d=1 and stride s=2 (the moving step size), each 2x2 window is reduced to a single value, halving H and W.
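A quick sketch showing the halving of H and W (the 6x28x28 input shape is taken from the network above):

```python
import torch
from torch.autograd import Variable

pool = torch.nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
x = Variable(torch.randn(1, 6, 28, 28))   # [N, C, H, W]
print(pool(x).size())                     # torch.Size([1, 6, 14, 14])
```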
Concepts of PyTorch
• Optimizer (torch.optim)
  – SGD
  – Adagrad
  – Adam
  – RMSprop
  – …
  – 9 optimizers in PyTorch 0.2

• Loss (torch.nn)
  – L1Loss
  – MSELoss
  – CrossEntropyLoss
  – …
  – 18 loss functions in PyTorch 0.2
What We Build?
• https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-optim
• A two-layer fully-connected network: input D_in=1000 -> hidden H=100 -> output y_pred with D_out=100.
• As before, two parts are required: define the modules (must have) and build the network (must have).
• Don't update y (y are labels here), so it is created with requires_grad=False.
• Steps: construct our model; choose the optimizer and loss function; then for each iteration: reset the gradients, run backward, and take an update step.
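The slides show screenshots of the code from the linked tutorial; a sketch close to that code, using the dimensions on the slides (D_in=1000, H=100, D_out=100; the batch size of 64 and 500 iterations are assumptions):

```python
import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 100

# random training data; y are labels, so don't update y
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# construct our model: Linear -> ReLU -> Linear
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# optimizer and loss function
loss_fn = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    y_pred = model(x)          # forward
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()      # reset gradient
    loss.backward()            # backward
    optimizer.step()           # update step
```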
Concepts of PyTorch
• Multi-Processing
  – Basic method: torch.nn.DataParallel (recommended by PyTorch)
  – Advanced methods: torch.multiprocessing, Hogwild (async)
Multi-GPU Processing
• torch.nn.DataParallel
  – gpu_id = '6,7'
  – os.environ['CUDA_VISIBLE_DEVICES'] = gpu_id
  – net = torch.nn.DataParallel(model, device_ids=[0, 1])
  – output = net(input_var)

• Important Notes:
  – device_ids must start from 0
  – batch_size must be divisible by the number of GPUs
Saving Models
• First Approach (recommended by PyTorch)
  – # save only the model parameters
  – torch.save(the_model.state_dict(), PATH)
  – # load only the model parameters
  – the_model = TheModelClass(*args, **kwargs)
  – the_model.load_state_dict(torch.load(PATH))

• Second Approach
  – torch.save(the_model, PATH)    # save the entire model
  – the_model = torch.load(PATH)   # load the entire model

https://pytorch.org/docs/master/notes/serialization.html#recommended-approach-for-saving-a-model
Recurrent Neural Network (RNN)
• The same module (i.e. the same parameters) is reused across the time steps.
• At each step, the input (size 50) is concatenated with the previous hidden state (size 20), so self.i2h has input_size = 50 + 20 = 70; it produces the new hidden state, from which the output is computed.
• https://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#example-2-recurrent-net
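A sketch close to the recurrent net in the linked tutorial: data of size 50 is concatenated with a hidden state of size 20, so self.i2h takes 70 input features (the output size of 10 and the 6-step unroll are illustrative):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, data_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        input_size = data_size + hidden_size          # 50 + 20 = 70
        self.i2h = nn.Linear(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, data, last_hidden):
        # the same module (same parameters) is applied at every time step
        combined = torch.cat((data, last_hidden), 1)
        hidden = self.i2h(combined)
        output = self.h2o(hidden)
        return hidden, output

rnn = RNN(data_size=50, hidden_size=20, output_size=10)
hidden = Variable(torch.zeros(1, 20))
for t in range(6):                                    # unroll over 6 time steps
    data = Variable(torch.randn(1, 50))
    hidden, output = rnn(data, hidden)
```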
Transfer Learning
• Freeze the parameters of the original model
  – requires_grad = False

• Then add your own modules

https://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs-from-backward
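A minimal sketch of this pattern, in the spirit of the linked autograd note (the pretrained ResNet-18 and the 100-class replacement layer are example choices):

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)

# freeze the parameters of the original model
for param in model.parameters():
    param.requires_grad = False

# then add your own module: the replacement layer has requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, 100)

# only the new layer's parameters are optimized
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
```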
Comparison with TensorFlow

Properties                    | TensorFlow                           | PyTorch
Graph                         | Static; dynamic via TensorFlow Fold  | Dynamic
Ramp-up Time                  | -                                    | Win
Graph Creation and Debugging  | -                                    | Win
Feature Coverage              | Win                                  | Catching up quickly
Documentation                 | Tie                                  | Tie
Serialization                 | Win (supports other languages)       | -
Deployment                    | Win (Cloud & Mobile)                 | -
Data Loading                  | -                                    | Win
Device Management             | Win                                  | Needs .cuda()
Custom Extensions             | -                                    | Win

Summarized from https://awni.github.io/pytorch-tensorflow/


Reminder: Platform & Final Project

Thank You~!
