
W4995 Applied Machine Learning

Neural Networks
13-12-2022

Guido van Capelleveen

(Prepared by: Stevan Rudinac)


Slide Credit
● Andreas Müller, lecturer at the Data Science Institute
at Columbia University
● Author of the book we will be using for this course

“Introduction to Machine Learning with Python”

● Great materials available at:


● https://github.com/amueller/applied_ml_spring_2017/
● https://amueller.github.io/applied_ml_spring_2017/
History
● Much of what we talk about today already existed around 1990
● What changed?
– More data
– Faster computers (GPUs)
– Some algorithmic improvements:
● ReLU activation functions
● Dropout
● The Adam optimizer
● Batch normalization
Supervised Neural Networks
● Non-linear models for classification and regression
● Work well for very large datasets
● Non-convex optimization
● Notoriously slow to train – hence the need for GPUs
● Use dot products etc. → require preprocessing, similar to SVMs or linear models, unlike trees (see the sketch after this list)
● MANY variants (convolutional nets, gated recurrent neural networks, long short-term memory, recursive neural networks, variational autoencoders, generative adversarial networks, Neural Turing Machines, ...)
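
In scikit-learn terms, that preprocessing requirement usually means scaling the features before fitting the network. A minimal illustrative sketch; the dataset and settings below are placeholders, not part of the original slides:

from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset; any numeric feature matrix works the same way
X, y = load_breast_cancer(return_X_y=True)

# Scale features to zero mean / unit variance before the network,
# just as you would for SVMs or linear models
model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
model.fit(X, y)
print(model.score(X, y))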
Remember This?

ŷ = w[0] * x[0] + w[1] * x[1] + ... + w[p] * x[p] + b


Logistic regression drawn as a neural net
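
To make the connection concrete, here is a minimal NumPy sketch that computes this weighted sum and passes it through a sigmoid, which is logistic regression viewed as a single-unit network (the weight and input values are made up for illustration):

import numpy as np

# Made-up example values: 3 input features, weights w and bias b
x = np.array([0.5, -1.2, 3.0])   # input features x[0..p]
w = np.array([0.1, 0.4, -0.2])   # one weight per feature
b = 0.3                          # bias (intercept)

# The weighted sum from the slide: ŷ = w[0]*x[0] + ... + w[p]*x[p] + b
decision = np.dot(w, x) + b

# Logistic regression passes the score through a sigmoid to get a probability
probability = 1.0 / (1.0 + np.exp(-decision))
print(decision, probability)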
Basic Architecture
(for making predictions)

x → h(x) = f(Wx + b) → o(x) = g(W’h(x) + b’)


Multilayer Perceptron With Single Hidden Layer
Many More Coefficients to Learn
● One between every input and every hidden unit (which make up the hidden layer)
● One between every unit in the hidden layer and the output
● But computing a series of weighted sums is the same as computing just one weighted sum (see the sketch after this list)
● How do we make this model truly more powerful than a linear model?
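
A quick numerical check of that point: without a nonlinearity in between, two stacked linear layers collapse into a single linear layer. A minimal NumPy sketch; all sizes and values are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes: 4 input features, 3 hidden units, 1 output (no biases, no nonlinearity)
W1 = rng.normal(size=(3, 4))   # input -> hidden weights
W2 = rng.normal(size=(1, 3))   # hidden -> output weights
x = rng.normal(size=4)

# Two weighted sums applied in sequence...
two_layers = W2 @ (W1 @ x)

# ...are identical to one weighted sum with combined weights W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))   # True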
Applying Nonlinear Function
● After computing a weighted sum for each hidden unit, a nonlinear function is applied to the result
● Common choices:
– Rectifying nonlinearity (aka rectified linear unit or ReLU): cuts off values below zero
– Hyperbolic tangent (tanh): saturates to –1 for low input values and +1 for high input values
● The result of this function is then used in the weighted sum that computes the output, ŷ
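
Both activations are one-liners in NumPy. A small illustrative sketch (the input values are made up):

import numpy as np

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # made-up weighted sums

relu = np.maximum(z, 0)   # cuts off values below zero
tanh = np.tanh(z)         # saturates towards -1 / +1 for large negative / positive inputs

print(relu)   # [0.  0.  0.  0.5 3. ]
print(tanh)   # approximately [-0.995 -0.462  0.  0.462  0.995]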
(Figure: the relu and tanh nonlinear activation functions)
Formula for MLP with tanh Nonlinearity

h[0] = tanh(w[0, 0] * x[0] + w[1, 0] * x[1] + w[2, 0] * x[2] + w[3, 0] * x[3])
h[1] = tanh(w[0, 1] * x[0] + w[1, 1] * x[1] + w[2, 1] * x[2] + w[3, 1] * x[3])
h[2] = tanh(w[0, 2] * x[0] + w[1, 2] * x[1] + w[2, 2] * x[2] + w[3, 2] * x[3])
ŷ = v[0] * h[0] + v[1] * h[1] + v[2] * h[2]

● w - weights between the input x and the hidden layer h


● v - weights between the hidden layer h and the output ŷ

● weights v and w - learned from data

● x - input features

● ŷ - computed output

● h - intermediate computations
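
The same computation in vectorized form. A minimal NumPy sketch where all weight and input values are made up; biases are omitted to match the formulas above:

import numpy as np

rng = np.random.default_rng(42)

# Made-up example: 4 input features, 3 hidden units, 1 output
x = rng.normal(size=4)          # input features x[0..3]
w = rng.normal(size=(4, 3))     # w[i, j]: weight from input i to hidden unit j
v = rng.normal(size=3)          # v[j]: weight from hidden unit j to the output

h = np.tanh(x @ w)              # h[j] = tanh(sum_i w[i, j] * x[i])
y_hat = v @ h                   # ŷ = v[0]*h[0] + v[1]*h[1] + v[2]*h[2]

print(h, y_hat)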
Can have arbitrarily many layers
Estimating complexity
● Count the weights (a worked example follows below):
● Weights from the inputs to the first hidden layer = number of features * number of hidden units in that layer
+
● Weights from each hidden layer to the next hidden layer = number of hidden units in the previous layer * number of hidden units in that layer
+
● Weights from the last hidden layer to y = number of hidden units in the last hidden layer
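
As a worked example (the sizes are made up for illustration): with 4 input features, two hidden layers of 100 units each, and a single output y, the count is 4*100 + 100*100 + 100 = 10,500 weights, ignoring bias terms. The same arithmetic in Python:

# Made-up architecture: 4 features, hidden layers of 100 and 100 units, 1 output
n_features = 4
hidden_layer_sizes = [100, 100]
layer_sizes = [n_features] + hidden_layer_sizes + [1]   # 1 output unit (y)

# One weight between every unit in a layer and every unit in the next layer
n_weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
print(n_weights)   # 4*100 + 100*100 + 100*1 = 10500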


Parameter tuning
● Create a network large enough to overfit

● Then shrink the network or add regularization (alpha) to improve generalization (a sketch follows below)
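
A minimal scikit-learn sketch of that workflow; the dataset, layer sizes, and alpha value are made-up placeholders rather than recommendations:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy data so the sketch is self-contained
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: a network large enough to overfit the training data
big = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=2000, random_state=0)
big.fit(X_train, y_train)
print("large net   train/test:", big.score(X_train, y_train), big.score(X_test, y_test))

# Step 2: shrink the network and/or increase the regularization strength alpha
small = MLPClassifier(hidden_layer_sizes=(10,), alpha=1.0, max_iter=2000, random_state=0)
small.fit(X_train, y_train)
print("regularized train/test:", small.score(X_train, y_train), small.score(X_test, y_test))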
https://generatieveai.pleio.nl/
The AI act
