
GPT Learning Hub

Beginner’s PyTorch Guide


The most important Python library for Machine Learning.

by Dev G on September 03

Hey everyone, Dev from GPT Learning Hub again!

First, I wanted to say thank you for all the positive feedback on the ML Challenge. I’m glad you enjoyed the explanations and my teaching style.

For those who missed the first round, you can still join for free here.

We’re back again with our next challenge: 4 Days of PyTorch!

PyTorch is a Deep Learning library, meaning that it’s used to build and train Neural
Networks.

These days it’s far more widely used than TensorFlow, especially in research and for new projects.


Here’s the plan:

Chapter One: Review of Neural Networks (how do these actually work?)

Chapter Two: Basic PyTorch Functions

Chapter Three: Implementing Neural Networks in PyTorch

Chapter Four: Training Neural Networks in PyTorch


How do Neural Networks actually work?
by Dev G on September 03

Let’s break down the diagram above: three input nodes in the first column, four nodes in the middle column, and a single output node at the end.

At the end of this chapter, I’ll link a short video where I go over some practice questions.

It turns out, Neural Networks are very similar to Linear Regression!

Super Quick Review of Linear Regression:

We predict an output Y based on a few input attributes, like X₁, X₂, and X₃.

Y = W₁ ⋅ X₁ + W₂ ⋅ X₂ + W₃ ⋅ X₃ + b

W₁, W₂, W₃, and b are the parameters of the model, meaning that their values are updated
during training:

Let’s say X₁ represents a teenager’s current weight, X₂ their current height, and X₃ the
average of their parents’ heights.
Y will represent the model’s prediction for how tall this person will be, when finished
growing.

Okay, back to Neural Networks.

The goal of a Neural Network is to output an accurate prediction based on the input
attributes.

Each of the three nodes in the first column store an input attribute.

Next, each node in the middle column uses this equation to predict a number Y:

Y = W₁ ⋅ X₁ + W₂ ⋅ X₂ + W₃ ⋅ X₃ + b

But those nodes are EACH predicting an output, so we actually have 4 different Y values, Y₁
through Y₄.

Also, each node uses a separate set of parameters.


Meaning, each node stores its own values for W₁ through W₃ (plus the constant b), and they can be completely different.

Lastly, the final node. This node takes Y₁ through Y₄ as input, and also uses Linear
Regression to calculate an output Z (also referred to as ŷ in the diagram below).

Z = W₁ ⋅ Y₁ + W₂ ⋅ Y₂ + W₃ ⋅ Y₃ + W₄ ⋅ Y₄ + b

Z is the prediction for how tall someone will be, when finished growing.

And that’s all Neural Networks are! They’re just combinations of nodes that use Linear
Regression.

-----------
Okay, there is one detail left to discuss.

This detail is what makes Neural Networks different from Linear Regression.

Before passing Y₁ through Y₄ into the final node, we first pass each value into another
function, and the outputs of that function will be sent into the final node.

That function is typically called a “nonlinearity” or “activation function”.

One example is the Sigmoid function, shown below.

In case you’re wondering, here is the equation for the Sigmoid:
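σ(x) = 1 / (1 + e⁻ˣ)

It takes any number as input and squashes it into a value between 0 and 1.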


But what’s the point of this?

Why do we need to pass Y₁ through Y₄ into the Sigmoid before passing the values into the
final node?

Well . . . without a nonlinear function, the Neural Network can only learn simple, linear
relationships like the one below.

What do we mean by “learn” ?

“Learning” is the process in which the parameters of the model (all the W’s and b’s) are
updated.

Without a nonlinear function, stacking layers is equivalent to a single Linear Regression model, so the Neural Network’s predictions will remain inaccurate on nonlinear data, even if we let the network learn for thousands of iterations.

Nonlinearities are essential to the success of Neural Networks! Since most data in the
world is nonlinear, we need to add functions like the Sigmoid into the model.

Here’s another example of a nonlinearity, the Tanh function:
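tanh(x) = (eˣ − e⁻ˣ) / (eˣ + e⁻ˣ)

It behaves like the Sigmoid, but squashes inputs into the range −1 to 1 instead of 0 to 1.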


The nonlinearity we choose affects the performance of the Neural Network!

-----------

That concludes our review of Neural Networks!

P.S. Here’s a quick quiz to check your understanding of Neural Networks.


PyTorch 101: Basic Functions
by Dev G on September 03

Hey everyone,

Dev from GPT Learning Hub again!

Welcome to Chapter 2: Basic PyTorch Functions.

If you need a refresher of Chapter 1, here is a quick review of Neural Networks.

Today’s chapter may seem a bit abstract, since we’re going over the basic data types and
functions of PyTorch.

But I promise that learning these fundamentals will pay off tomorrow, when we go over
Neural Networks in PyTorch!

Let’s get started:

The main data structure of PyTorch is a tensor.


Tensors are like multi-dimensional arrays that can be 1-D, 2-D, 3-D, and higher.

Tensors are useful for storing the inputs and outputs of Machine Learning models. Here are two simple examples.

The input to a model, and the output of a model (sketched below):
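Here’s a minimal sketch, reusing the height-prediction example from Chapter 1 (the specific numbers are made up for illustration):

    import torch

    # Input to a model: one data point with 3 attributes
    # (current weight in kg, current height in m, average of parents' heights in m)
    model_input = torch.tensor([45.0, 1.5, 1.7])

    # Output of a model: a single predicted final height in m
    model_output = torch.tensor([1.8])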


But why do we use tensors over normal Python lists?

When it comes to Machine Learning, speed and efficiency are everything.

PyTorch is a Python library, but many of the functions are internally written in C++.

These functions take advantage of parallel processing whenever possible, and are
optimized for speed.

So, whenever we’re dealing with Machine Learning data, tensors are the go-to.

-----------

Let’s create basic tensors.

First, we can create tensors from Python lists.


We can also create tensors of all ones or all zeros.
Lastly, we can create tensors of random numbers!
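Here’s a quick sketch of all three (the values and shapes are arbitrary):

    import torch

    # From a Python list
    a = torch.tensor([1.0, 2.0, 3.0])

    # All ones or all zeros, with a chosen shape (here 2 rows, 3 columns)
    ones = torch.ones(2, 3)
    zeros = torch.zeros(2, 3)

    # Random numbers (uniformly sampled between 0 and 1)
    r = torch.rand(2, 3)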

Let’s move on to the basic functions.

-----------

There are a ton of different PyTorch functions, so I selected 4 important ones to master.

These functions are used all the time when training models!

First, the reshape() function. We simply have to specify the input tensor and the new size.

The reshape() function is used all the time in CNNs for reshaping images!
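For example, with a small made-up tensor:

    import torch

    t = torch.tensor([[1, 2, 3],
                      [4, 5, 6]])        # shape (2, 3)

    reshaped = torch.reshape(t, (3, 2))  # shape (3, 2)
    # same 6 values, new layout:
    # [[1, 2],
    #  [3, 4],
    #  [5, 6]]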

Next, the sum() function. We have to carefully specify the dim parameter, depending on
whether we want to sum each row or column!
The sum() function is often used when we want to combine the predictions from two
models.
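Here’s a small sketch of how the dim parameter changes the result:

    import torch

    t = torch.tensor([[1, 2, 3],
                      [4, 5, 6]])

    torch.sum(t)         # tensor(21)        — sum of everything
    torch.sum(t, dim=0)  # tensor([5, 7, 9]) — sum down each column
    torch.sum(t, dim=1)  # tensor([6, 15])   — sum across each row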

The cat() function is commonly used to stack tensors on top of each other or side by side!
One cool application of cat() is in Multimodal LLMs.

We want the model to process images and text, so at some point, inside the model, we have
to concatenate the image and text tensors!
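A minimal sketch (the “image” and “text” tensors here are just made-up stand-ins for real embeddings):

    import torch

    image_features = torch.rand(1, 4)   # pretend image embedding
    text_features = torch.rand(1, 6)    # pretend text embedding

    # dim=1 concatenates side by side; dim=0 would stack tensors on top of each other
    combined = torch.cat((image_features, text_features), dim=1)
    print(combined.shape)               # torch.Size([1, 10])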

Lastly, the mse_loss() function, which stands for Mean Squared Error Loss.
When calculating the error for a regression model such as:

Y = W₁ ⋅ X₁ + W₂ ⋅ X₂ + W₃ ⋅ X₃ + b

We use the Mean Squared Error function to calculate the error between the model’s
prediction Y and the “true” answer from the dataset.

This function first calculates the difference between the model’s prediction and the “true”
answer for every data point in our dataset.

Next, mse_loss() squares all the differences.

Finally, the function averages all those values, returning one final number.

We square the differences to remove the sign (+ or -) from positive and negative errors!

Using mse_loss() is much faster than manually looping over all data points and calculating
the error, since this function takes advantage of parallel processing when possible.
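Here’s a tiny sketch with made-up numbers:

    import torch
    import torch.nn.functional as F

    predictions = torch.tensor([1.8, 1.6, 1.9])
    true_answers = torch.tensor([1.7, 1.7, 1.7])

    loss = F.mse_loss(predictions, true_answers)
    # differences: 0.1, -0.1, 0.2 -> squared: 0.01, 0.01, 0.04 -> average: 0.02
    print(loss)  # tensor(0.0200)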

Stay tuned for Chapters 3 and 4 of our challenge to see the function used in action!

-----------

P.S. If you want to jump ahead and learn more PyTorch, I created a 7 minute video covering
the basics of Neural Networks and how to build them in PyTorch.
Coding Neural Networks In PyTorch
by Dev G on September 03

Hey everyone,

Dev from GPT Learning Hub again!

Today is Chapter 3: Implementing Neural Nets in PyTorch.

If you need a refresher of Neural Networks, here is a quick review of the concepts from
Chapter 1.

If you need a refresher of basic PyTorch syntax, check out this intro to PyTorch video.

Let’s get started.

-----------

We’re going to code up this diagram as a class in Python:

After the class is written, we can make an instance of the class and use the model to make
predictions.

Also, let’s say this model predicts how tall someone will be, once they’re finished growing.
The network takes in X₁ (a teenager’s current weight), X₂ (their current height), and X₃ (the
average of their parents’ heights).

Time for the code. Here is where we’ll start:
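Here’s a rough skeleton of that starting point (the nn.Module part in parentheses is explained at the end of this chapter; for now, focus on the two methods):

    import torch
    import torch.nn as nn

    class SimpleModel(nn.Module):

        def __init__(self):
            super().__init__()
            # the layers will be defined here

        def forward(self, x):
            # the prediction will be calculated and returned here
            ...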

We’re going to define the Neural Network in the class SimpleModel.

There are two methods or functions we have to write.

The first is the __init__() function, also called the constructor. This is where we’ll define
the number of nodes in each column of the Neural Network diagram!
Next is the forward() function.

This is where we’ll pass in X₁ (a teenager’s current weight), X₂ (their current height), and X₃
(the average of their parents’ heights) and return the model’s prediction!

BTW, the input data point (X₁, X₂, and X₃) will be stored in an array x, which will have size 3.

It’s time to write the __init__() function and define the model.

We will make use of the built-in PyTorch class nn.Linear! This class already has its own forward() function defined, which we’ll take advantage of later.

In the above Neural Network diagram, we know that each node in the middle column uses
this equation:

Y = W₁ ⋅ X₁ + W₂ ⋅ X₂ + W₃ ⋅ X₃ + b

The model has to store W₁ through W₃ plus the constant b for each node. This is exactly
what nn.Linear will keep track of!

Each time we make an instance of nn.Linear, we call it a “layer” of the model.
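Here’s a sketch of the __init__() body, using the layer names middle_layer and final_layer that we’ll reuse in forward():

    def __init__(self):
        super().__init__()
        self.middle_layer = nn.Linear(3, 4)  # 3 input attributes -> 4 middle nodes
        self.final_layer = nn.Linear(4, 1)   # 4 middle nodes -> 1 output node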


Above, we define two instances of nn.Linear!

The first input to nn.Linear() is the number of nodes in the previous layer, and the
second input is the number of nodes in the current layer.

This diagram is now fully defined:

Let’s move on to the forward() function, which calculates the final output.

We call this the “forward” function, since we can imagine the data flowing from left to right
through the network.
The simplest way to write this function is to pass x into the middle_layer, and then pass
the output into the final_layer.

We can directly call the forward() method of each layer like this:
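Something like this (a sketch using the two layers we defined in __init__()):

    def forward(self, x):
        y = self.middle_layer.forward(x)  # outputs of the middle column (Y₁ through Y₄)
        z = self.final_layer.forward(y)   # the final prediction
        return z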

Or, we can use this syntax, since Python knows that we want to call the forward() method.
This option is preferred since it’s more concise:
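A sketch of the more concise version:

    def forward(self, x):
        y = self.middle_layer(x)  # same as calling .forward(x)
        z = self.final_layer(y)
        return z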
-----------

That wraps up today’s chapter on implementing Neural Networks in PyTorch.

Tomorrow, we’ll make an instance of the class we wrote, and we’ll train it on a dataset!

The code for training the model will bridge together many concepts, so you won’t want to
miss it.

-----------

If you want to review this material further, check out this video I created!

I also included timecodes in case you want to skip the Neural Networks Review & go
straight to PyTorch.

-----------

One Final Detail:

When using the PyTorch library, every model must inherit from (or in other words,
subclass) the parent class nn.Module.
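In a sketch, it looks like this (calling super().__init__() runs nn.Module’s own setup before ours):

    class SimpleModel(nn.Module):  # SimpleModel inherits from nn.Module

        def __init__(self):
            super().__init__()     # let nn.Module set itself up first
            ...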
The Python syntax for inheritance is shown above. Simply write the parent class in
parentheses.

If this part is unfamiliar to you, don’t worry! It’s actually not as important as the rest of this
chapter.
PyTorch Finale: Training Neural Networks
by Dev G on September 03

Hey everyone,

Dev from GPT Learning Hub again!

This is the final chapter of our PyTorch Challenge.

If you need a refresher of Neural Networks, here is a quick review of the concepts from
Chapter 1.

If you need a refresher of basic PyTorch syntax, check out this intro to PyTorch video.

We’re finally going to write the simple for loop that is used to train all modern ML models.

Let’s get started.

-----------

We’ll pick up right where we left off yesterday. Here is the Neural Network we implemented:
This model takes in 3 input attributes (such as a person’s current weight, height, and the
average of their parents’ heights).

And it predicts a single number, such as how tall the person will eventually grow.

Here is the corresponding class we wrote:
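As a sketch, consistent with the layer names and sizes from Chapter 3:

    import torch
    import torch.nn as nn

    class SimpleModel(nn.Module):

        def __init__(self):
            super().__init__()
            self.middle_layer = nn.Linear(3, 4)  # 3 inputs -> 4 middle nodes
            self.final_layer = nn.Linear(4, 1)   # 4 middle nodes -> 1 output

        def forward(self, x):
            y = self.middle_layer(x)
            z = self.final_layer(y)
            return z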

Take the time to understand this class! Today’s concepts build on top of yesterday’s.

If you want to review these required fundamentals, I highly recommend the Intro to
PyTorch video.

Okay, the first step in training the model is to make an instance of our class:
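That’s one line (the variable name model is just a convention):

    model = SimpleModel()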
We can now use the model to get predictions!

Let’s pass in a simple data point, where the person’s current weight is 45 (kg), current
height is 1.5 (meters), and the average of parents’ heights is 1.7 (meters).
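As a sketch:

    x = torch.tensor([45.0, 1.5, 1.7])  # weight, height, average of parents' heights
    prediction = model(x)               # calls forward() under the hood
    print(prediction)                   # an essentially random number — 13.7 in this example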

The model’s prediction for the final height is 13.7 meters! This makes no sense, but a poor
prediction is expected.

We haven’t trained the model yet, so the values for W₁, W₂, W₃, b, etc. are completely
random!

We need to train the network:


Side Note: We also haven’t added any nonlinearities, like the Sigmoid function (pictured
below), which are essential for Neural Networks to perform better than simple Linear
Regression.

Let’s focus on training the model for now.

Here is the rough pseudocode for our training loop:
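It goes something like this:

    for each training iteration:
        1. get the model's predictions for every data point in the dataset
        2. calculate the error with Mean Squared Error
        3. calculate the derivatives of the error with respect to each parameter
        4. use Gradient Descent to update the parameters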


Side Note: Understanding this training loop requires an understanding of Gradient
Descent.

Gradient Descent is the most important algorithm in Machine Learning. It uses derivatives
to update the parameters and minimize the model’s error.

Click here for the most concise explanation of Gradient Descent you’ve ever seen.

I’ve created countless explanations of Gradient Descent, and with each iteration, I’ve made
the explanation more and more concise.

My most recent iteration is only three minutes long, and I can’t recommend the video
enough.

Back to our training loop:

Let’s replace each step of the pseudocode with a line of Python.

First, let’s get the model’s predictions using the current parameter values:
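In code, that could be a single line:

    # .squeeze() flattens the model's (N, 1) output into a simple array of N predictions
    predictions = model(dataset).squeeze()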
We are assuming that the dataset has been previously defined!

Another day, we’ll go over how to define the dataset. For now let’s assume that dataset is an
array of size N by 3, where N is the number of datapoints we have:
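For example, with made-up numbers and one row per person:

    # each row: [current weight, current height, average of parents' heights]
    dataset = torch.tensor([[45.0, 1.50, 1.70],
                            [52.0, 1.60, 1.80],
                            [38.0, 1.40, 1.60]])  # here N = 3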

As a result, predictions will be an array of size N, where each entry is the model’s
prediction for a data point.

Next, let’s calculate the model’s error using the Mean Squared Error function, which we
went over on Day 2!
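Roughly, reusing the mse_loss() function from Chapter 2:

    import torch.nn.functional as F

    loss = F.mse_loss(predictions, true_answers)  # one number measuring the overall error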
TLDR for Mean Squared Error: Calculate the difference between the model’s prediction and
the true answer for each data point, square all the differences, and then average them all
together to return one final number.

We are assuming that true_answers has been previously defined!

For now, let’s assume that true_answers is an array of size N. Each entry stores the final
height for a person in the dataset.
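For example, matching the three made-up people above:

    true_answers = torch.tensor([1.80, 1.85, 1.70])  # final height for each person; size N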

If you picture the dataset as a table with one row per person and an extra column for their final height, that last column is essentially true_answers.

Next, let’s calculate all necessary derivatives for Gradient Descent with a simple call to
backward().
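Using the loss value from the previous step, that’s literally one line:

    loss.backward()  # computes the derivative of the error with respect to every parameter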
You may be wondering what in the world backward() is doing.

I’ll go through a high-level explanation without getting into crazy detail.

By this point, you should know that derivatives are necessary to use the Gradient Descent
formula and update the parameters.

But how do we actually calculate the derivatives required to update each parameter?

Well, remember the forward() visualization from yesterday? Here it is again:

Backward() is like the reverse of forward(). Using the model’s prediction and the
corresponding error, calculations now flow from right to left through the network, in order
to calculate all the required derivatives.
We have to go from right to left for backward(). Why?

If you do the math by hand (don’t worry, this is not necessary), this is what you would find:

The derivatives that are needed to update the parameters in the hidden layer of the
network depend on the derivatives for the parameters in the output layer of the network.

So, we have to calculate the final layer’s derivatives first, and then use those to calculate the
middle layer’s derivatives.

This is why we call the function backward(). We’re going from right to left.

If that seemed a bit unclear, no worries. It’s the trickiest concept in Machine Learning,
and thankfully PyTorch will calculate all the derivatives for us.

We don’t need to worry about all the details of backward().

However, it is essential to understand Gradient Descent, and the overall idea of using
derivatives to update the parameters and minimize the error function.

I promise that was the trickiest part of today’s chapter!

Back to the code:

Let’s use the derivatives to update the parameters using this equation:
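W ← W − α ⋅ ∂Error/∂W   (and likewise for every other W and b)

Here α is the learning rate, a small constant that controls how big each update step is.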
Thankfully, we can do this in PyTorch with two lines of code.

First, define an “optimizer” object.

We simply need to pass in a list of the model parameters that the optimizer is responsible
for updating.

We’re using the SGD class which stands for Stochastic Gradient Descent, a form of
Gradient Descent.
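For example (lr is the learning rate; 0.01 is just a common starting value, nothing special):

    import torch.optim as optim

    optimizer = optim.SGD(model.parameters(), lr=0.01)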

Next, call the step() function which will use the Gradient Descent equation to update each
parameter.
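Which is simply:

    optimizer.step()  # applies the Gradient Descent update to every parameter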
After repeating these steps for enough iterations, the model is trained!

If we try to get the model prediction for a sample data point, we will now get a much more
accurate prediction of 1.8 meters.

That wraps up Day 4: Training Neural Networks in PyTorch!

I want to make two final points:

First, the code for the training loop is nearly identical whether we’re training a
height-prediction model, an LLM to generate text, or a vision model to classify images!

We get the model predictions, calculate derivatives, and use Gradient Descent to improve
the model predictions!

The only thing that varies is the Neural Network diagram and thus the class we define.

Next, there is actually one final line of code that we need for the training loop. I saved it for
the end of the chapter so that the explanation from earlier was as intuitive as possible.

Without this simple line of code, the training loop will fail. Here it is:
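It’s a single call on the optimizer:

    optimizer.zero_grad()  # clears the stored derivatives so they don't carry over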
By default, PyTorch will store the derivatives from previous iterations and maintain a
running sum.

This is not what we want. Instead, we want to “reset” or “zero” the derivatives after every
iteration, so that the next iteration’s derivatives are not affected.

We simply need to call zero_grad() at the end of every iteration.
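Putting every piece together (and assuming the model, dataset, true_answers, and optimizer from above), the whole loop could look roughly like this, where num_iterations is an arbitrary choice:

    num_iterations = 1000
    for i in range(num_iterations):
        predictions = model(dataset).squeeze()        # step 1: get predictions
        loss = F.mse_loss(predictions, true_answers)  # step 2: calculate the error
        loss.backward()                               # step 3: calculate derivatives
        optimizer.step()                              # step 4: update the parameters
        optimizer.zero_grad()                         # reset derivatives for the next iteration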

It’s a bit odd that PyTorch adds up all the derivatives by default, since this is not the typical
use case.

But hey, at least PyTorch makes everything else relatively pain-free.

GPT Learning Hub


San Jose, CA, 95128
