Machine Learning For Everyone
vas3k.com
I decided to write a post I’ve been wishing existed for a long time. A
simple introduction for those who always wanted to understand
machine learning. Only real-world problems, practical solutions,
simple language, and no high-level theorems. One post for everyone,
whether you are a programmer or a manager.
Let's roll.
Big tech companies are huge fans of neural networks. Obviously. For
them, 2% accuracy is an additional 2 billion in revenue. But when you
are small, it doesn't make sense. I've heard stories of teams spending
a year on a new recommendation algorithm for their e-commerce
website, only to discover that 99% of their traffic came from search
engines. Their algorithms were useless. Most users didn't even open
the main page.
Clearly, the machine will learn faster with a teacher, so it's more
commonly used in real-life tasks. There are two types of such tasks:
classification – predicting an object's category, and
regression – predicting a specific point on a numeric axis.
Classification
"Splits objects based at one of the attributes known beforehand.
Separate socks by based on color, documents based on language, music
by genre"
From here onward you can comment with additional information for
these sections. Feel free to add your own examples of tasks. Everything
here is based on my own subjective experience.
Everyone who works with finance and analysis loves regression. It's
even built into Excel. And it's super smooth inside — the machine
simply tries to draw a line that indicates average correlation. Though,
unlike a person with a pen and a whiteboard, the machine does so with
mathematical accuracy, calculating the average distance to every dot.
When the line is straight — it's linear regression; when it's curved –
it's polynomial. These are the two major types of regression. The other
ones are more exotic. Logistic regression is the black sheep in the
flock. Don't let it trick you, as it's a classification method, not regression.
If you want to get deeper into this, check out this series: Machine
Learning for Humans. I really love and recommend it!
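To make the idea concrete, here is a minimal sketch of drawing that line with scikit-learn. The apartment-size and price numbers are made up purely for illustration; swapping in PolynomialFeatures would give you the curved, polynomial variant.

```python
# A minimal linear regression sketch (the data is made up for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[30], [45], [60], [80], [100]])   # apartment size in m², one feature per row
prices = np.array([90, 130, 175, 230, 290])         # price in thousands

model = LinearRegression()
model.fit(sizes, prices)                            # draw the "average" line through the dots

print(model.coef_, model.intercept_)                # slope and offset of that line
print(model.predict([[70]]))                        # price guess for a 70 m² flat
```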
Labeled data is a luxury. But what if I want to create, let's say, a bus
classifier? Should I manually take photos of a million fucking buses on
the streets and label each of them? No way, that would take a lifetime,
and I still have so many games not played on my Steam account.
Clustering
"Divides objects based on unknown features. Machine chooses the best
way"
Apple Photos and Google Photos use more complex clustering. They
look for faces in photos to create albums of your friends. The app
doesn't know how many friends you have or what they look like, but it
tries to find common facial features. Typical clustering.
Clustering also helps compress images: replace every pixel with the
nearest of a handful of "typical" colors. However, you may have
problems with ambiguous colors like cyan. Is it green or blue? Here
comes the K-Means algorithm.
All done. Clusters defined, stable, and there are exactly 32 of them.
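Here is a rough sketch of that 32-color trick with scikit-learn's KMeans. The "photo" below is just a random array standing in for a real image; any H x W x 3 RGB array would do.

```python
# Rough sketch: squeeze an image down to 32 colors with K-Means.
import numpy as np
from sklearn.cluster import KMeans

img = np.random.randint(0, 256, size=(64, 64, 3))        # stand-in for a real photo
pixels = img.reshape(-1, 3)                              # every pixel is a point in (R, G, B) space

kmeans = KMeans(n_clusters=32, n_init=10).fit(pixels)    # find 32 "centroid" colors
palette = kmeans.cluster_centers_.astype(np.uint8)       # the 32 colors themselves
quantized = palette[kmeans.labels_].reshape(img.shape)   # repaint each pixel with its centroid
```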
Searching for the centroids is convenient. Though, in real life clusters
are not always circles. Let's imagine you're a geologist who needs to
find similar minerals on a map. In that case, the clusters can be
weirdly shaped and even nested. Also, you don't even know how many
of them to expect. 10? 100?
K-means does not fit here, but DBSCAN can be helpful. Let's say, our
dots are people at the town square. Find any three people standing
close to each other and ask them to hold hands. Then, tell them to
start grabbing hands of those neighbors they can reach. And so on,
and so on until no one else can take anyone's hand. That's our first
cluster. Repeat the process until everyone is clustered. Done.
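A minimal DBSCAN sketch of that hand-holding game, with made-up 2D "people": eps plays the role of arm's length, and min_samples is the initial trio.

```python
# DBSCAN sketch: "grab the hands of anyone you can reach".
import numpy as np
from sklearn.cluster import DBSCAN

people = np.random.rand(200, 2)                      # 200 people scattered on the town square
clusters = DBSCAN(eps=0.05, min_samples=3).fit(people)

print(set(clusters.labels_))                         # cluster ids; -1 means "left standing alone"
```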
Dimensionality Reduction
(Generalization)
In the real world, every big retailer builds their own proprietary
solution, so nooo revolutions here for you. The highest level of tech
here — recommender systems. Though, I may not be aware of a
breakthrough in the area. Let me know in the comments if you have
something to share.
Knowledge of all the road rules in the world will not teach the
autopilot how to drive on the roads. Regardless of how much data we
collect, we still can't foresee all the possible situations. This is why its
goal is to minimize error, not to predict all the moves.
A more effective way is to build a virtual city and let the self-driving
car learn all its tricks there first. That's exactly how we train
autopilots right now: create a virtual city based on a real map,
populate it with pedestrians and let the car learn to kill as few people
as possible. When the robot is reasonably confident in this artificial
GTA, it's set free to test on real streets. Fun!
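The article stays at the story level here, but the core loop of "try an action, get rewarded or punished, adjust" can be shown with a toy tabular Q-learning sketch. The tiny "road", the rewards and all the numbers below are invented for illustration only.

```python
# Toy reinforcement learning sketch: tabular Q-learning on a tiny made-up "road".
import numpy as np

n_states, n_actions = 5, 2            # 5 road positions; actions: 0 = stay, 1 = move forward
Q = np.zeros((n_states, n_actions))   # the agent's "experience" table
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = 0
    while state < n_states - 1:
        # Explore sometimes, otherwise pick the best-known action.
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(Q[state].argmax())
        next_state = state + action
        reward = 1.0 if next_state == n_states - 1 else -0.01   # reach the goal, don't dawdle
        # Learn from the outcome: nudge Q toward what actually happened.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)   # after training, "move forward" should dominate every state
```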
Remember the news about AI beating a top player at the game of Go?
Even though, shortly before, it had been proved that the number of
combinations in this game is greater than the number of atoms in the
universe.
This means the machine could not remember all the combinations and
thereby win at Go (as it did at chess). At each turn, it simply chose the
best move for the current situation, and it did well enough to outplay a
human meatbag.
However, neural networks get all the hype today, while words like
"boosting" or "bagging" are rare guests on TechCrunch. Despite all
their effectiveness, the idea behind these is ridiculously simple. If
you take a bunch of inefficient algorithms and force them to correct
each other's mistakes, the overall quality of the system will be higher
than that of even the best individual algorithms.
You'll get even better results if you take the most unstable algorithms,
ones that predict completely different results from small noise in the
input data, like regression and decision trees. These algorithms are
so sensitive to even a single outlier in the input data that models can
go mad. The simplest ensemble recipe is bagging: train the same
algorithm on several random subsets of the data.
Data in random subsets may repeat. For example, from a set like
"1-2-3" we can get subsets like "2-2-3", "1-2-2", "3-1-2" and so on. We
use these new datasets to teach the same algorithm several times and
then predict the final answer via simple majority voting.
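A hand-rolled sketch of exactly that: bootstrap resampling with repetition, the same algorithm trained several times, then a majority vote. The iris dataset is only a convenient stand-in here.

```python
# Bagging by hand: resample with repetition, train the same model several times, vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
trees = []

for _ in range(10):
    # Bootstrap: a random subset of the same size, where rows may repeat ("1-2-3" -> "2-2-3").
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Majority voting: each tree answers, the most popular answer wins.
votes = np.array([t.predict(X[:5]) for t in trees])
majority = [np.bincount(col).argmax() for col in votes.T]
print(majority)
```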
The most famous example of bagging is the Random Forest algorithm,
which is simply bagging on decision trees (which were illustrated
above). When you open your phone's camera app and see it drawing
boxes around people's faces — it's probably the result of a Random
Forest at work. Neural networks would be too slow to run in real time,
but bagging is ideal here, given it can calculate trees on all the shaders
of a video card or on these new fancy ML processors.
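For the record, scikit-learn packages that same recipe up as a single class; a sketch, again on stand-in data:

```python
# The packaged version: bagging over decision trees in one class.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100).fit(X, y)
print(forest.predict(X[:5]))          # each of the 100 trees votes behind the scenes
```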
Same as in bagging, we use subsets of our data but this time they are
not randomly generated. Now, in each subsample we take a part of
the data the previous algorithm failed to process. Thus, we make a
new algorithm learn to fix the errors of the previous one.
Nowadays there are three popular tools for boosting; you can read a
comparative report in CatBoost vs. LightGBM vs. XGBoost.
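Those libraries are heavily optimized, but the core "fix the previous one's errors" idea can be sketched by hand in a few lines. The sine-wave data and the 0.3 learning rate below are arbitrary choices for illustration.

```python
# The core of boosting by hand: each new tree is trained on the errors of the previous ones.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel()                       # made-up target to fit

prediction = np.zeros_like(y)
trees, learning_rate = [], 0.3

for _ in range(50):
    residual = y - prediction               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print(np.mean((y - prediction) ** 2))       # the error shrinks as trees pile up
```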
A neuron here is just a function with a bunch of weighted inputs and
one output. These weights tell the neuron to respond more to one
input and less to another. Weights are adjusted during training —
that's how the network learns. Basically, that's all there is to it.
To prevent the network from falling into anarchy, the neurons are
linked by layers, not randomly. Within a layer neurons are not
connected, but they are connected to neurons of the next and previous
layers. Data in the network goes strictly in one direction — from the
inputs of the first layer to the outputs of the last.
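A tiny numpy sketch of that one-way flow: random weights, two layers, data going strictly from inputs to outputs. The sizes and numbers mean nothing; they just show the mechanics.

```python
# A two-layer network forward pass in plain numpy: data flows strictly input -> output.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                    # 4 input values

W1 = rng.random((8, 4))              # weights: how much each neuron "listens" to each input
W2 = rng.random((3, 8))

hidden = np.maximum(0, W1 @ x)       # layer 1: weighted sums, then a ReLU activation
output = W2 @ hidden                 # layer 2: the answer

print(output)
# Training would nudge W1 and W2 so the output gets closer to what we want.
```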
A well-trained neural network can fake the work of any of the
algorithms described in this chapter (and frequently works more
precisely). This universality is what made them widely popular.
"Finally, we have the architecture of the human brain," they said, "we
just need to assemble lots of layers and teach them on any possible
data," they hoped. Then the first AI winter started, then it thawed,
and then another wave of disappointment hit.
Convolutional neural networks are all the rage right now. They are
used to search for objects in photos and videos, for face recognition,
style transfer, generating and enhancing images, creating effects like
slow-mo, and improving image quality. Nowadays CNNs are used in
all cases that involve pictures and videos. Even on your iPhone,
several of these networks are going through your nudes to detect
objects in them. If there is something to detect, heh.
First of all, if the cat had its ears down or was turned away from the
camera, you are in trouble: the neural network won't see a thing.
So instead of picking features by hand, we divide the whole image
into 8x8 pixel blocks and assign each one a dominant line type:
horizontal, vertical, or one of the diagonals.
The output would be several tables of such sticks that are, in fact, the
simplest features representing object edges in the image. They are
images on their own, but built out of sticks. So we can once again take
a block of 8x8 and see how they match together. And again, and again…
This operation is called convolution, which gave the method its name.
Convolution can be represented as a layer of a neural network,
because each neuron can act as any function.
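Here is a bare-bones sketch of one convolution: a tiny hand-made kernel slid over a fake 8x8 "image" (scipy is assumed just to do the sliding).

```python
# Convolution sketch: slide a small kernel over the image and record how strongly it matches.
import numpy as np
from scipy.signal import convolve2d

image = np.zeros((8, 8))
image[:, 4:] = 1.0                          # a fake picture: dark left half, bright right half

vertical_edge = np.array([[ 1, 0, -1],
                          [ 1, 0, -1],
                          [ 1, 0, -1]])     # a "stick" detector for vertical edges

feature_map = convolve2d(image, vertical_edge, mode='valid')
print(feature_map)                          # non-zero responses appear exactly along the edge
```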
When we feed our neural network with lots of photos of cats, it
automatically assigns bigger weights to the combinations of sticks it
saw most frequently. It doesn't care whether it was the straight line of
a cat's back or a geometrically complicated object like a cat's face;
something will activate strongly.
The beauty of this idea is that we have a neural net that searches for
the most distinctive features of the objects on its own. We don't need
to pick them manually. We can feed it any number of images of any
object just by googling billions of pictures of it, and our net will
create feature maps from those sticks and learn to differentiate any
object on its own.
All because modern voice assistants are trained to speak not letter by
letter, but on whole phrases at once. We can take a bunch of voiced
texts and train a neural network to generate an audio-sequence
closest to the original speech.
In other words, we use text as input and its audio as the desired
output. We ask a neural network to generate some audio for the given
text, then compare it with the original, correct errors and try to get
as close as possible to ideal.
Here we'll be helped by the fact that text, speech and music are
sequences. They consist of consecutive units like syllables. Each one
sounds unique but depends on the previous ones. Lose this connection
and you get dubstep.
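A simplified recurrent-cell sketch of that idea: every new unit is mixed with a running memory of everything that came before. The weights and "syllables" below are random placeholders, not real speech features.

```python
# A minimal recurrent step: each new output depends on the current input AND what came before.
import numpy as np

rng = np.random.default_rng(0)
Wx, Wh = rng.random((16, 8)), rng.random((16, 16))   # input weights and "memory" weights
hidden = np.zeros(16)                                # the network's memory of the sequence so far

sequence = rng.random((5, 8))                        # 5 made-up "syllables", 8 features each
for syllable in sequence:
    hidden = np.tanh(Wx @ syllable + Wh @ hidden)    # mix the new unit with the remembered context

print(hidden[:4])                                    # this state now depends on the whole sequence order
```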
This approach had one huge problem: when all neurons remembered
their past results, the number of connections in the network became
so huge that it was technically impossible to adjust all the weights.
When a neural network can't forget, it can't learn new things (people
have the same flaw).
The first solution was simple: limit the neuron memory. Let's say, to
memorizing no more than 5 recent results. But it broke the whole idea.
You can take speech samples from anywhere. BuzzFeed, for example,
took Obama's speeches and trained a neural network to imitate his
voice. As you see, audio synthesis is already a simple task. Video still
has issues, but it's a question of time.
There are many more network architectures in the wild. I recommend
a good article called Neural Network Zoo, where almost all types of
neural networks are collected and briefly explained.
That's wrong.
If this were the case, every human would beat animals at everything,
but that's not true. The average squirrel can remember a thousand
hidden places with nuts — I can't even remember where my keys are.
Ok, multiply 1680 by 950 right now in your mind. I know you won't
even try, lazy bastards. But give you a calculator and you'll do it in
two seconds. Does this mean that the calculator just expanded the
capabilities of your brain?
If yes, can I continue to expand them with other machines? Like, use
notes in my phone to avoid remembering a shitload of data? Oh, it
seems like I'm doing it right now. I'm expanding the capabilities of my
brain with machines.