ANN_Unit
The Neuron
Building up from the foundation of the Neural Network, we will first
examine the Neuron: how it works and what it looks like. It is the
centerpiece of the Neural Network. For comparison, we will also
examine the human brain: how it works and why we want to replicate it.
Practical Application
Then we get into some deep learning on the machinations of the Neural
Network. We will follow one in action to see what we are striving
towards. But instead of the T2 slicing open his flesh to reveal the robot
skeleton beneath, we’ll be looking at how a Neural Network can
predict housing prices. Not as dramatic but potentially just as
upsetting.
How Neural Networks Learn
The next three tutorials will focus on what makes Neural Networks so
fascinating: how they learn. We will be going into deep learning with
the Gradient Descent method. Then we will move on to its refined
sibling, the Stochastic Gradient Descent method.
If by this point fiery flashes of Judgement Day have not interrupted your
thinking completely, we will have a summary section that
covers Backpropagation and how to compile a set of instructions for
your Neural Network.
The Neuron
In this deep learning tutorial we are going to examine the Neuron in
Neural Networking. Briefly, we will cover:
What it is
What it does
Where it fits in the Neural Network
Why it is important
The neuron that forms the basis of all Neural Networks is an imitation of
what has been observed in the human brain.
This odd pink critter is just one of the billions swimming around
inside our brains.
A gap exists between them. To continue its journey, the signal must act
like a stuntman jumping across a deep canyon on a dirtbike. This gap
the signal jumps across is called the synapse. For simplicity's
sake, this is also the term I will use when referring to the passing of
signals in our Neural Networks.
The inputs on the left side represent the incoming signals to the main
neuron in the middle. In a human neuron, these would include senses
such as smell or touch.
In your Neural Network these inputs are independent variables. They
travel down the synapses, go through the big grey circle, then emerge
the other side as output values. It is a like-for-like process, for the
most part.
The main difference between the biological process and its artificial
counterpart is the level of control you exert over the input values; the
independent variables on the left-hand side.
Observations
It is equally important to note that each variable does not stand alone.
They are together as a singular observation. For example, you may list
a person’s height, age, and weight. These are three different
descriptors, but they pertain to one individual person.
Now, once these values pass through the main neuron and break on
through to the other side, they become output values.
Output values
Output values can take different forms. Take a look at this diagram:
They can either be:
continuous (price)
binary (yes or no)
or categorical.
However, just as the input variables are different parts of a whole, the
same goes for a categorical output.
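To make the three forms concrete, here is a minimal Python sketch. The price, the category names, and the "one-hot" layout below are invented for illustration; the point is that a categorical output is several values describing one whole.

```python
continuous_output = 249000.0     # e.g. a predicted price
binary_output = 1                # e.g. yes (1) or no (0)

# A categorical output is several values forming one whole ("one-hot"):
# exactly one of them is 1, marking the predicted category
categories = ["house", "apartment", "condo"]   # invented category names
categorical_output = [0, 1, 0]                 # this observation: "apartment"

predicted = categories[categorical_output.index(1)]
```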
The neuron is sandwiched between two single rows of data. There may
be three input variables and one output. It doesn’t matter. They are two
single corresponding rows of data. One for one.
He needs to get over there. In the brain, he has to take the leap. He
would much rather face this dilemma in a Neural Network, where he
doesn’t need the bike to reach his nirvana.
Weights
Each synapse is assigned a weight.
The weighted signal goes on to the next neuron down the line, then the
next, and so on and so forth. That's it.
In conclusion
We have covered:
Input Values
Weights
Synapses
The Neuron
The Activation Function
Output Values
This is the process you will see repeated again and again and again...
all the way down the line, hundreds or thousands of times depending on
the size of your Neural Network and how many neurons are within it.
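The whole chain we just covered (input values, synapse weights, the neuron, an activation function, an output value) can be sketched in a few lines of Python. Every number here is invented for illustration, and the simple threshold activation is just one of the options we look at next.

```python
def neuron(inputs, weights):
    # Sum each incoming signal multiplied by its synapse weight
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Apply an activation function (a simple threshold here) to get the output
    return 1 if weighted_sum >= 0 else 0

inputs = [0.5, -1.0, 2.0]    # independent variables: one observation
weights = [0.8, 0.2, 0.4]    # one weight per synapse

output = neuron(inputs, weights)
```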
In the last part of the course we examined the neuron, how it works, and
why it is important.
The last section and this one are intrinsically linked because the
activation function is the process applied to the weighted input value
once it enters the neuron.
Values
If weighted input values are shampoo, floor polish, and gin, the
neuron would be the deep black pot. The activation function is the open
flame beneath that congeals the concoction into something new;
the output value.
Just as you can adjust the size of the flame and the time you wish to
spend stirring, there are many activation functions you can choose from
to process your input values.
The first is the simplest: the Threshold Function (TF). The x-axis
represents the weighted sum of inputs. On the y-axis are the values from
0 to 1. If the weighted sum is less than 0, the TF will pass on the
value 0. If the value is equal to or more than 0, the TF passes on 1.
It is a yes or no, black or white, binary function.
Shapelier than its rigid cousin, the Sigmoid's curvature means it is far
better suited to probabilities when applied at the output layer of your
NN. If the Threshold tells you the difference between 0 and 1 dollar, the
Sigmoid gives you that as well as every cent in between.
Next is the Rectifier. Look at how the red line presses itself against
the x-axis before exploding in a fiery arrow at 45 degrees to a palace
above the clouds.
Finally, the Hyperbolic Tangent soars on a route similar to that of the
Sigmoid, though over a greater span, from -1 all the way to blessed 1.
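As a quick sketch, the four functions described in this section can be written in plain Python using their standard textbook definitions:

```python
import math

def threshold(x):
    # Yes or no: 0 below zero, 1 at or above zero
    return 1 if x >= 0 else 0

def sigmoid(x):
    # A smooth curve from 0 up to 1; suits probabilities at the output layer
    return 1 / (1 + math.exp(-x))

def rectifier(x):
    # Flat along the x-axis for negative input, then a 45-degree line
    return max(0.0, x)

def tanh(x):
    # Sigmoid-shaped but spanning -1 to 1
    return math.tanh(x)
```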
The pinnacle of your adult life. The moment you realize the time falling
asleep on the couch while watching South Park reruns is finally coming
to an end.
Choices, choices
Lucky for us we have Neural Networks. They can find answers quicker
than we can and without the hair-tugging vacillation we would likely go
through on the way.
Remember
One thing to remember before we get into this example: in this
section, we will not be training the network. Training is a very
important part of Neural Networking, but don't stress, we will look at
it later on, once we better understand how Neural Networks learn.
This part is all about application, so we will imagine our Neural Network
is already trained up, primed and ready to go.
Power up
You can implement a hidden layer that sits between the input and
output layers.
From this new cobweb of arrows, representing the synapses, you begin
to understand how these factors, in differing combinations, cast a wider
net of possibilities. The result is a far more detailed composite
picture of what you are looking for.
Let’s go step-by-step
We begin with the four variables on the left and the top neuron of the
hidden layer in the middle. All four variables will be connected to the
neuron by synapses.
However, not all of the synapses are weighted. They will have either
a non-zero value or a 0 value.
The former indicates importance, while the latter means the variable
will be discarded by that neuron.
For instance, the Area and Distance variables may have non-zero
weights, which means they are weighted. This means they matter. The
other two variables, Bedrooms and Age, aren't weighted and so are not
considered by that first neuron. Got it? Good.
You may wonder why that first neuron is only considering two
of the four variables.
In this case, it is common on the property market that larger homes
become cheaper the further they are from the city. That’s a basic fact.
So what this neuron may be doing is looking specifically for properties
that are large but are not so far from the city.
This is speculation
We have not yet done deep learning on training Neural Networks.
Based on the variables at hand this is an educated guess as to how the
neuron is processing these variables.
Once the Distance and Area criteria have been met, the neuron applies
an activation function and makes its own calculations. These two
variables will then contribute to the price in the final output layer.
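A hypothetical sketch of that hidden-layer neuron, in Python. The weight values and units are invented for illustration; a real network would learn them in training.

```python
def rectifier(x):
    # A common activation choice: zero for negative input, identity otherwise
    return max(0.0, x)

# One observation: Area (sq ft), Bedrooms, Distance to city (miles), Age (years)
observation = [2500.0, 4.0, 5.0, 10.0]

# Made-up weights: this neuron "cares" only about Area and Distance, so
# Bedrooms and Age get 0 weights and are effectively discarded here
weights = [0.001, 0.0, -0.1, 0.0]

weighted_sum = sum(x * w for x, w in zip(observation, weights))
activation = rectifier(weighted_sum)  # this neuron's contribution to the price
```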
If you have a new property with three or four bedrooms and large
square footage, you can see how the neuron has identified the value of
such a place, regardless of its distance to the city.
The way these neurons work and interact means the network
itself is extremely flexible , allowing it to look for specific things and
therefore make a comprehensive search for whatever it is they have
been trained to identify.
Hardcoding
This is where you tell the programme specific rules and outcomes, then
guide it throughout the entire process, accounting for every possible
option the programme will have to deal with. It is a more involved
process with more interaction between the programmer and
programme.
Neural Networking
With a Network, you create the facility for the programme to understand
what it needs to do independently. You provide the inputs, state the
desired outputs, and let it work its own way from one to the other.
Let’s revisit some old ground before we plow into fresh earth
When input variables go along the synapses and into the neuron, where
an activation function is applied, the resulting data is the output value,
Ŷ.
When the output value and actual value are almost touching we know
we have optimal weights and can therefore proceed to the testing
phase, or application phase.
Example
Say we have three input values.
Hours of study
Hours of sleep
Result in a mid-semester quiz
Then the cost function is applied and the data goes in reverse
through the Neural Network.
Go Bigger
What if you wanted to apply this process to an entire class? You would
simply need to duplicate these smaller Networks and repeat the process
again.
However, once you do this you will not have a number of smaller
networks processing separately side-by-side; it is the same Neural
Network, with the same weights, applied to every row.
If you have thirty students, the Y / Ŷ comparison will occur thirty
times, once per row, but the cost function will be applied to all of
them together.
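The "all of them together" step can be sketched like this, again assuming a squared-error cost; the thirty (Ŷ, Y) pairs are randomly generated stand-ins for real rows.

```python
import random

random.seed(0)  # reproducible made-up data

# Thirty students: one (y_hat, y) pair per row, invented for illustration
rows = [(random.random(), random.random()) for _ in range(30)]

# The comparison happens once per row, but one cost function covers the
# whole class: the per-row squared errors are summed together
total_cost = sum(0.5 * (y_hat - y) ** 2 for y_hat, y in rows)
```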
Additional Reading
For further reading on this process, I will direct you towards an
article named 'A list of cost functions used in neural networks,
alongside applications' on CrossValidated (2015).
I hope you found something useful in this deep learning article. See you
next time.
Gradient Descent
This is a continuation of the last deep learning section on how
Neural Networks learn. If you recall, we summed up the learning
process for Neural Networks by focusing on one particular area.
Backpropagation
This is when you feed the end data back through the Neural Network
and then adjust the weighted synapses between the input value and the
neuron.
This puts all the more emphasis on how and why we alter the weights.
If you make the wrong alteration, it would be like having a car with the
front axle pointing slightly to the left, and while you are driving you
let go of the steering wheel but keep your foot on the gas.
We want to eliminate all other weights except for the one right
at the bottom of the U-shape, the one where the cost is closest to 0.
The closer the cost is to 0, the lower the difference between output
value and actual value.
Instead of going through every weight one at a time, and ticking every
wrong weight off as you go, you instead look at the angle of the
cost function line.
If the slope is negative, like it is from the highest red dot on the above
line, that means you must go downhill from there.
For the sake of remembering the journey downwards, you can picture it
like the old Prince of Persia video games.
You need to jump across the open space, from ledge to ledge, until you
get to the checkpoint at the bottom of the level, which is our 0.
It is a far more efficient method, I'm sure you will agree. That is just
an examination of the process in principle.
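In code, the principle looks something like this minimal sketch: a U-shaped cost curve C(w) = (w − 3)², whose bottom sits at w = 3, and a loop that steps downhill against the slope. The starting weight and learning rate are arbitrary.

```python
def cost_slope(w):
    # Derivative of the cost C(w) = (w - 3)**2
    return 2 * (w - 3)

w = 10.0            # start somewhere up the side of the U
learning_rate = 0.1

for _ in range(100):
    # Negative slope -> step right (downhill); positive slope -> step left
    w -= learning_rate * cost_slope(w)

# w is now very close to 3, the bottom of the U
```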
At the end of the next section, I will provide you with some additional
reading and there will be further information once we reach the
application phase.
Just to remind you, the weights are the only thing we can adjust as we
look to bring the difference between our output value and our actual
value as close to 0 as possible. This is important because the smaller
the difference between those two values, the better our Neural Network
is performing.
Gradient Descent
You can see the aforementioned method illustrated in the graph below.
What you want is the global minimum. See where the red dot is
sitting? That is the local minimum. It is a low point on the gradient. But
not the lowest. That would be the purple dot, further down.
If you criss-cross down the way you would in the first graph in the
article, you may end up stuck at the second-lowest point and never
reach optimal weight, which would be the lowest.
For this reason, another method has been developed for when you
have a non-convex shape on your graph.
With this, you won’t get bogged down in the dips higher up on the line
before you get to the bottom.
This alternative method is called Stochastic Gradient Descent.
Now, with the Gradient Descent method, all the weights for all ten rows
of data are adjusted simultaneously. This is good because it means
you start with the same weights across the board every time. The
weights move like a flock of birds, all together in the same direction, all
the time.
It is a deterministic method.
On the other hand, this can take a little longer than the Stochastic
method because with every adjustment, every piece of data has to be
loaded up all over again.
With the Stochastic method, by contrast, you adjust the weights one row
at a time: row after row, until all ten rows have been run through.
The Stochastic method is a lighter algorithm and therefore
faster than its all-encompassing cousin.
Mini-batch method
With this, you don’t have to run either one row at a time or every row
at once. If you have a hundred rows of data you can do five or ten at a
time, then update your weights after every subsection has been run.
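The three update schedules differ only in when the weights get adjusted. The sketch below assumes ten rows of data, and `update_weights` is a stand-in for the real backpropagation step.

```python
rows = list(range(10))          # ten rows of training data (placeholders)
updates = []                    # record how much data each update saw

def update_weights(batch):
    updates.append(len(batch))  # stand-in: a real network adjusts weights here

# Batch gradient descent: one update for all ten rows together
updates.clear()
update_weights(rows)            # 1 update per pass

# Stochastic gradient descent: one update per row
updates.clear()
for row in rows:
    update_weights([row])       # 10 updates per pass

# Mini-batch: update after every subsection, e.g. five rows at a time
updates.clear()
batch_size = 5
for i in range(0, len(rows), batch_size):
    update_weights(rows[i:i + batch_size])  # 2 updates per pass
```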
Backpropagation
Adjusting
If the output value and actual value are far apart, we use
Backpropagation to run the data in reverse through the Neural Network.
The weights get adjusted. Consequently, we are brought closer to the
actual value.
In a Nutshell
Feed the inputs forward to get the output value Ŷ, compare it with the
actual value Y, apply the cost function, backpropagate, adjust the
weights, and repeat until the difference is as close to 0 as possible.
Additional Reading
I mentioned this in the last post but I will include some additional
reading material in case you want to get into some extra deep learning
on this process. Neural Networks and Deep Learning by Michael
Nielsen (2015) is all you will need to go full Einstein on this subject.
Give it a look.