Basics of Neural Networks
Introduction to Artificial Neural Networks:-

An ANN (Artificial Neural Network) is a Machine Learning model inspired by the networks of biological neurons found in our brains. There are several reasons why ANNs have become so important:

+ There is now a huge quantity of data available to train neural networks, and ANNs frequently outperform other ML techniques on very large and complex problems.
+ The tremendous increase in computing power since the 1990s now makes it possible to train large neural networks in a reasonable amount of time. This is in part due to Moore's law (the number of components in integrated circuits has doubled about every 2 years over the last 50 years), but also thanks to the gaming industry, which has stimulated the production of powerful GPU cards by the millions. Moreover, cloud platforms have made this power accessible to everyone.
+ The training algorithms have been improved. To be fair, they are only slightly different from the ones used in the 1990s, but these relatively small tweaks have had a huge positive impact.
+ Some theoretical limitations of ANNs have turned out to be benign in practice. For example, many people thought that ANN training algorithms were doomed because they were likely to get stuck in local optima, but it turns out that this is rather rare in practice (and when it is the case, they are usually fairly close to the global optimum).
+ ANNs seem to have entered a virtuous circle of funding and progress. Amazing products based on ANNs regularly make the headline news, which pulls more and more attention and funding toward them, resulting in more and more progress and even more amazing products.

McCulloch and Pitts proposed a very simple model of the biological neuron, which later became known as an artificial neuron: it has one or more binary (on/off) inputs and one binary output. The artificial neuron activates its output when more than a certain number of its inputs are active. In their paper, they showed that even with such a simplified model it is possible to build a network of artificial neurons that computes any logical proposition you want.

[Figure: ANNs performing simple logical computations: C = A, C = A ∧ B, C = A ∨ B, C = A ∧ ¬B]

+ The first network on the left is the identity function: if neuron A is activated, then neuron C gets activated as well (since it receives two input signals from neuron A); but if neuron A is off, then neuron C is off as well.
+ The second network performs a logical AND: neuron C is activated only when both neurons A and B are activated (a single input signal is not enough to activate neuron C).
+ The third network performs a logical OR: neuron C gets activated if either neuron A or neuron B is activated (or both).
+ Finally, if we suppose that an input connection can inhibit the neuron's activity (which is the case with biological neurons), then the fourth network computes a slightly more complex logical proposition: neuron C is activated only if neuron A is active and neuron B is off. If neuron A is active all the time, then you get a logical NOT: neuron C is active when neuron B is off, and vice versa. A quick code sketch of these four networks follows this list.
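The following is an illustrative sketch (not part of the original notes) of the four networks above in Python. The helper mcp_neuron, the activation threshold of 2 active inputs, and the way each connection is counted are assumptions chosen to mirror the figure: each excitatory connection counts once, and any active inhibitory connection blocks the neuron.

    # A minimal sketch of a McCulloch-Pitts style binary neuron (illustrative only).
    # Assumption: the neuron fires when at least `threshold` excitatory inputs are
    # active, unless any inhibitory input is active.

    def mcp_neuron(excitatory, inhibitory=(), threshold=2):
        """Binary neuron: returns 1 (fires) or 0 (stays off)."""
        if any(inhibitory):          # an active inhibitory connection blocks the neuron
            return 0
        return 1 if sum(excitatory) >= threshold else 0

    def identity(a):                 # C = A (A sends two connections to C)
        return mcp_neuron([a, a])

    def logical_and(a, b):           # C = A AND B (one connection from each input)
        return mcp_neuron([a, b])

    def logical_or(a, b):            # C = A OR B (each input sends two connections)
        return mcp_neuron([a, a, b, b])

    def a_and_not_b(a, b):           # C = A AND (NOT B); B is an inhibitory input
        return mcp_neuron([a, a], inhibitory=[b])

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, logical_and(a, b), logical_or(a, b), a_and_not_b(a, b))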
Perceptron:-

The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a threshold logic unit (TLU), sometimes called a linear threshold unit (LTU): the inputs and output are numbers (instead of binary on/off values), and each input connection is associated with a weight. The TLU computes a weighted sum of its inputs (z = w1 x1 + w2 x2 + ... + wn xn = xᵀw), then applies a step function to that sum and outputs the result: h_w(x) = step(z), where z = xᵀw.

[Figure: Threshold Logic Unit. Inputs x1, ..., xn with weights w1, ..., wn; weighted sum z = xᵀw; step function step(z); output h_w(x) = step(xᵀw)]

The most common step function used in Perceptrons is the Heaviside step function; sometimes the sign function is used instead:

    heaviside(z) = 0 if z < 0,  1 if z ≥ 0
    sgn(z) = -1 if z < 0,  0 if z = 0,  +1 if z > 0

A single TLU can be used for simple linear binary classification. It computes a linear combination of the inputs, and if the result exceeds a threshold, it outputs the positive class. Otherwise it outputs the negative class (just like a Logistic Regression or linear SVM classifier).

A Perceptron is simply composed of a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer (i.e., its input neurons), the layer is called a fully connected layer, or a dense layer. The inputs of the Perceptron are fed to special passthrough neurons called input neurons: they output whatever input they are fed. All the input neurons form the input layer. Moreover, an extra bias feature is generally added (x0 = 1): it is typically represented using a special type of neuron called a bias neuron, which outputs 1 all the time. A Perceptron with two inputs and three outputs is represented in the figure below. This Perceptron can classify instances simultaneously into three different binary classes, which makes it a multioutput classifier.

[Figure: Perceptron with two input neurons (passthrough), one bias neuron (always outputs 1), and three TLUs in the output layer]

Computing the outputs of a fully connected layer:

    h_{W,b}(X) = φ(XW + b)

In this equation:
+ As always, X represents the matrix of input features. It has one row per instance and one column per feature.
+ The weight matrix W contains all the connection weights except for the ones from the bias neuron. It has one row per input neuron and one column per artificial neuron in the layer.
+ The bias vector b contains all the connection weights between the bias neuron and the artificial neurons. It has one bias term per artificial neuron.
+ The function φ is called the activation function: when the artificial neurons are TLUs, it is a step function.

Perceptron learning rule (weight update):

    w_{i,j}^(next step) = w_{i,j} + η (y_j - ŷ_j) x_i

In this equation:
+ w_{i,j} is the connection weight between the ith input neuron and the jth output neuron.
+ x_i is the ith input value of the current training instance.
+ ŷ_j is the output of the jth output neuron for the current training instance.
+ y_j is the target output of the jth output neuron for the current training instance.
+ η is the learning rate.

Note that Perceptrons (for example, Scikit-Learn's Perceptron class) do not output a class probability; instead, they make predictions based on a hard threshold. A small code sketch of the dense-layer computation and the Perceptron learning rule follows.
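Below is a small illustrative sketch (not from the original notes) of the fully connected layer computation h_{W,b}(X) = φ(XW + b) and the Perceptron learning rule above, written with NumPy. The toy AND dataset, the learning rate of 0.1, the number of epochs, and all variable names are arbitrary choices made for the example.

    import numpy as np

    def heaviside(z):
        """Step activation: 1 where z >= 0, else 0."""
        return (z >= 0).astype(int)

    def dense_layer_output(X, W, b, activation=heaviside):
        """h_{W,b}(X) = phi(XW + b): one row per instance, one column per TLU."""
        return activation(X @ W + b)

    # Toy example: train a single-output Perceptron on a linearly separable
    # problem (logical AND). Hypothetical setup, purely for illustration.
    rng = np.random.default_rng(42)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([[0], [0], [0], [1]])           # AND targets

    W = rng.normal(scale=0.1, size=(2, 1))       # one column per output neuron
    b = np.zeros((1,))
    eta = 0.1                                    # learning rate

    for epoch in range(20):
        for x_i, y_i in zip(X, y):
            y_hat = dense_layer_output(x_i.reshape(1, -1), W, b)
            # Perceptron rule: w_{i,j} <- w_{i,j} + eta * (y_j - y_hat_j) * x_i
            error = y_i - y_hat.ravel()
            W += eta * np.outer(x_i, error)
            b += eta * error

    print(dense_layer_output(X, W, b).ravel())   # expected: [0 0 0 1]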
In their 1969 monograph Perceptrons, Marvin Minsky and Seymour Papert highlighted a number of serious weaknesses of Perceptrons, in particular the fact that they are incapable of solving some trivial problems (e.g., the Exclusive OR (XOR) classification problem; see the left side of the figure). This is true of any other linear classification model (such as Logistic Regression classifiers), but researchers had expected much more from Perceptrons, and some were so disappointed that they dropped neural networks altogether in favor of higher-level problems such as logic, problem solving, and search.

It turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a Multilayer Perceptron (MLP). An MLP can solve the XOR problem, as you can verify by computing the output of the MLP represented on the right side of the figure: with inputs (0, 0) or (1, 1), the network outputs 0, and with inputs (0, 1) or (1, 0) it outputs 1.

The Multilayer Perceptron and Backpropagation:-

An MLP is composed of one (passthrough) input layer, one or more layers of TLUs called hidden layers, and one final layer of TLUs called the output layer (see the figure below). The layers close to the input layer are usually called the lower layers, and the ones close to the outputs are usually called the upper layers. Every layer except the output layer includes a bias neuron and is fully connected to the next layer.

[Figure: Architecture of an MLP with two inputs, one hidden layer, and three output neurons]

Note:- The signal flows only in one direction (from the inputs to the outputs), so this architecture is an example of a feedforward neural network (FNN).

When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN). The field of Deep Learning studies DNNs, and more generally models containing deep stacks of computations. Even so, many people talk about Deep Learning whenever neural networks are involved (even shallow ones). A hand-wired MLP that solves XOR is sketched below.
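As an illustrative aside (not in the original notes), here is a tiny hand-wired MLP of step-function TLUs that computes XOR. The particular weights and biases are just one possible choice: one hidden unit detects OR(x1, x2), the other detects AND(x1, x2), and the output unit fires when OR is on but AND is off.

    import numpy as np

    def step(z):
        """Heaviside step activation."""
        return (z >= 0).astype(int)

    # Hand-picked weights and biases (one possible choice) for an XOR network:
    # hidden unit 1 fires for OR(x1, x2), hidden unit 2 fires for AND(x1, x2).
    W_hidden = np.array([[1.0, 1.0],
                         [1.0, 1.0]])           # shape (n_inputs, n_hidden)
    b_hidden = np.array([-0.5, -1.5])           # thresholds for OR and AND
    W_out = np.array([[1.0], [-1.0]])           # OR contributes +1, AND contributes -1
    b_out = np.array([-0.5])

    def mlp_xor(X):
        hidden = step(X @ W_hidden + b_hidden)  # forward pass through the hidden layer
        return step(hidden @ W_out + b_out)     # forward pass through the output layer

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    print(mlp_xor(X).ravel())                   # expected: [0 1 1 0]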
MLPs are trained with the backpropagation algorithm. In short, it is Gradient Descent using an efficient technique for computing the gradients automatically: in just two passes through the network (one forward, one backward), the backpropagation algorithm is able to compute the gradient of the network's error with regard to every single model parameter. In other words, it can find out how each connection weight and each bias term should be tweaked in order to reduce the error. Once it has these gradients, it just performs a regular Gradient Descent step, and the whole process is repeated until the network converges to the solution.

Note:- Automatically computing gradients is called automatic differentiation, or autodiff. There are various autodiff techniques, with different pros and cons. The one used by backpropagation is called reverse-mode autodiff.

Let's run through this algorithm in a bit more detail (a compact code sketch of one training step appears at the end of this section):
+ It handles one mini-batch at a time (for example, containing 32 instances each), and it goes through the full training set multiple times. Each pass is called an epoch.
+ Each mini-batch is passed to the network's input layer, which sends it to the first hidden layer. The algorithm then computes the output of all the neurons in this layer (for every instance in the mini-batch). The result is passed on to the next layer, its output is computed and passed to the next layer, and so on until we get the output of the last layer, the output layer. This is the forward pass: it is exactly like making predictions, except all intermediate results are preserved since they are needed for the backward pass.
+ Next, the algorithm measures the network's output error (i.e., it uses a loss function that compares the desired output and the actual output of the network, and returns some measure of the error).
+ Then it computes how much each output connection contributed to the error. This is done analytically by applying the chain rule (perhaps the most fundamental rule in calculus), which makes this step fast and precise.
+ The algorithm then measures how much of these error contributions came from each connection in the layer below, again using the chain rule, working backward until the algorithm reaches the input layer. As explained earlier, this reverse pass efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward through the network (hence the name of the algorithm).
+ Finally, the algorithm performs a Gradient Descent step to tweak all the connection weights in the network, using the error gradients it just computed.

Summary:- For each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally tweaks the connection weights to reduce the error (Gradient Descent step).

Note:- It is important to initialize all the hidden layers' connection weights randomly, or else training will fail. For example, if you initialize all weights and biases to zero, then all neurons in a given layer will be perfectly identical, and thus backpropagation will affect them in exactly the same way, so they will remain identical. In other words, despite having hundreds of neurons per layer, your model will act as if it had only one neuron per layer: it won't be too smart. If instead you randomly initialize the weights, you break the symmetry and allow backpropagation to train a diverse team of neurons.

In order for this algorithm to work, the step function is replaced with the logistic (sigmoid) function, σ(z) = 1 / (1 + exp(-z)). This is essential because the step function contains only flat segments, so there is no gradient to work with (Gradient Descent cannot move on a flat surface), while the logistic function has a well-defined nonzero derivative everywhere, allowing Gradient Descent to make some progress at every step. Backpropagation also works with other activation functions, such as the ReLU function, ReLU(z) = max(0, z), which is continuous but not differentiable at z = 0, and whose derivative is 0 for z < 0.
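To tie the steps together, here is a compact, illustrative sketch (not from the original notes) of backpropagation for a tiny one-hidden-layer MLP with sigmoid activations. The layer sizes, the squared error loss, the learning rate, the number of epochs, and the XOR toy task are all assumptions made for this example.

    import numpy as np

    def sigmoid(z):
        # Logistic activation: sigma(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_deriv(a):
        # Derivative of the sigmoid, expressed in terms of its output a = sigma(z)
        return a * (1.0 - a)

    rng = np.random.default_rng(0)

    # Tiny MLP: 2 inputs -> 3 hidden units -> 1 output (sizes chosen arbitrarily).
    W1, b1 = rng.normal(scale=0.5, size=(2, 3)), np.zeros(3)
    W2, b2 = rng.normal(scale=0.5, size=(3, 1)), np.zeros(1)
    eta = 0.5                                        # learning rate (arbitrary)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets, as a toy task

    for epoch in range(5000):
        # Forward pass: keep the intermediate activations for the backward pass.
        h = sigmoid(X @ W1 + b1)                     # hidden layer activations
        out = sigmoid(h @ W2 + b2)                   # network outputs

        # Backward pass: apply the chain rule layer by layer, output to input,
        # for a squared error loss 0.5 * sum((out - y)**2).
        d_out = (out - y) * sigmoid_deriv(out)       # error signal at the output layer
        d_h = (d_out @ W2.T) * sigmoid_deriv(h)      # error signal at the hidden layer

        # Gradient Descent step on every weight and bias.
        W2 -= eta * h.T @ d_out
        b2 -= eta * d_out.sum(axis=0)
        W1 -= eta * X.T @ d_h
        b1 -= eta * d_h.sum(axis=0)

        if epoch % 1000 == 0:
            loss = 0.5 * np.sum((out - y) ** 2)
            print(f"epoch {epoch}: loss {loss:.4f}")  # the loss should decrease

    # Outputs should move toward [0, 1, 1, 0]; exact values depend on the random init.
    print(out.round(2).ravel())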
