Introduction To AI
Deep Learning
A look at the types of data
• Data is raw information. Data might be facts, statistics, opinions, or any kind of content that is recorded in some
format. This could include voices, photos, names, and even dance moves!
• Data can be organized into the following three types.
• Structured data is typically categorized as quantitative data and is highly organized. Structured data is information
that can be organized in rows and columns. Perhaps you've seen structured data in a spreadsheet, like Google
Sheets or Microsoft Excel. Examples of structured data include names, dates, addresses, credit card numbers, and stock
information.
• Unstructured data, also known as dark data, is typically categorized as qualitative data. It cannot be processed and
analyzed by conventional data tools and methods. Unstructured data lacks any built-in organization, or structure.
Examples of unstructured data include images, texts, customer comments, medical records, and even song lyrics.
• Semi-structured data is the “bridge” between structured and unstructured data. It doesn't have a predefined data
model. It combines features of both structured data and unstructured data. It's more complex than structured data,
yet easier to store than unstructured data. Semi-structured data uses metadata to identify specific data
characteristics and to organize data into records and preset fields. Metadata ultimately enables semi-structured data to be
better cataloged, searched, and analyzed than unstructured data. An example of semi-structured data is a video on
a social media site. The video by itself is unstructured data, but it typically carries accompanying text, such as a hashtag
identifying a location, that makes the content easy to categorize and search.
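To make the distinction concrete, here is a minimal Python sketch of how a social-media video post might be represented. The field names and values are invented for illustration; the video file itself is the unstructured part, while the metadata gives the record its semi-structured shape.

```python
import json

# Hypothetical example: the video file itself is unstructured data,
# but the surrounding metadata gives the post a semi-structured form.
video_post = {
    "video_file": "beach_sunset.mp4",      # unstructured payload
    "metadata": {                           # semi-structured description
        "hashtags": ["#sunset", "#California"],
        "location": "Santa Monica",
        "posted_at": "2024-06-01T18:45:00Z",
        "duration_seconds": 42,
    },
}

# The metadata can be cataloged and searched like structured data.
print(json.dumps(video_post["metadata"], indent=2))
```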
How does machine learning
approach a problem?
• Two ways to solve dark data problems
• If AI doesn’t rely on programming instructions to work with unstructured data, how does AI do it? Machine learning can analyze
dark data far more quickly than a programmable computer can. To see why, consider the problem of finding a route through big
city traffic using a navigation system. It’s a dark data problem because solving it requires working not only with a complicated
street map but also with changing variables like weather, traffic jams, and accidents. Let’s look at how two different systems
might try to solve this problem.
• The machine learning process works entirely differently from a conventional, pre-programmed system
• The machine learning process has advantages:
• It doesn’t need a database of all the possible routes from one place to another. It just needs to know where places are on the
map.
• It can respond to traffic problems quickly because it doesn’t need to store alternative routes for every possible traffic situation.
It notes where slowdowns are and finds a way around them through trial and error.
• It can work very quickly. While trying single turns one at a time, it can work through millions of tiny calculations.
• But machine learning has two more advantages that programmable computers lack:
• Machine learning can predict. You know this already. A machine can determine, “Based on traffic right now, this route is likely
to be faster than that one.” It knows this because it compared routes as it built them.
• Machine learning learns! It can notice that your car was delayed by a temporary detour and adjust its recommendations to
help other drivers.
Machine learning uses
probabilistic calculation
• There is another way to contrast classical computing and machine learning systems: one is deterministic and the other
is probabilistic.
• Let’s dig in and see what these two words mean.
• For a deterministic system, there must be an enormous, predetermined structure of routes—a gigantic database of
possibilities from which the machine can make its choice. If a certain route leads to the destination, then the machine
flags it as “YES”. If not, it flags it as “NO”. This is basically binary thinking: on or off, yes or no. This is the essence of a
computer program. The answer is either true or false, not a confidence value.
• Machine learning is probabilistic. It never says “YES” or “NO”. Machine learning is analog (like waves gradually going
up and down) rather than binary (like arrows pointing upward and downward). Machine learning constructs every
possible route to a destination and compares them in real time, including all the variables such as changing traffic. So,
a machine learning system doesn’t say, “This is the fastest route.” It says something like, “I am 84% confident that this
route will get you there in the shortest time.” You might have seen this yourself if you’ve traveled in a car with an up-
to-date GPS navigation system that offers you two or three choices with estimated times.
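As an illustration of that contrast, here is a small Python sketch. The routes, travel times, and confidence values are invented; the point is only that the deterministic view returns a hard YES or NO, while the probabilistic view returns a confidence score that can be compared across options.

```python
# Hypothetical route-comparison sketch (all routes and times are made up).

# Deterministic view: a route either reaches the destination or it doesn't.
route_reaches_destination = {"Route A": True, "Route B": True, "Route C": False}
print(route_reaches_destination["Route A"])   # -> True (a binary YES/NO)

# Probabilistic view: each candidate route gets a confidence score
# based on current conditions, and the scores are compared in real time.
estimated_minutes = {"Route A": 24, "Route B": 31}
confidence_fastest = {"Route A": 0.84, "Route B": 0.16}

best = max(confidence_fastest, key=confidence_fastest.get)
print(f"I am {confidence_fastest[best]:.0%} confident {best} "
      f"(~{estimated_minutes[best]} min) is the fastest route.")
```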
• If machine learning offers only probabilities, who makes the final decision?
• This can literally be a life-and-death question. Suppose you have a serious disease and your doctor offers you a
choice. Do you want your doctor to prescribe your treatment, or do you want the treatment that a machine learning
system determines is most likely to succeed?
• Machine learning solves problems in three ways:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Let's explore each one!
Supervised learning
• Supervised learning is about providing AI with enough examples to make
accurate predictions.
• All supervised learning algorithms need labeled data. Labeled data is data that is
grouped into samples that are tagged with one or more labels. In other words,
applying supervised learning requires you to tell your model:
• What the key characteristics of a thing are, also called features
• What the thing actually is
• For example, the information might be drawings and photos of animals, some of
which are dogs and are labeled “dog”. The machine will learn by identifying a
pattern for “dog”. When the machine sees a new dog photo and is asked, “What
is this?”, it will respond, “dog”, with high accuracy. This is known as
a classification problem.
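A minimal supervised-learning sketch might look like the following. It assumes the scikit-learn library and uses invented measurements as features; the labeled examples teach the model a pattern, and it then classifies a new, unlabeled sample.

```python
# Minimal supervised-learning sketch using scikit-learn (an assumed library;
# the features and labels below are invented for illustration).
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: each sample has features and a label telling the model what it is.
features = [  # [height_cm, weight_kg]  (hypothetical measurements)
    [25, 4], [30, 6], [28, 5],       # cats
    [60, 25], [55, 22], [70, 30],    # dogs
]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(features, labels)            # learn the pattern from labeled examples

print(model.predict([[58, 24]]))       # -> ['dog']  (a classification problem)
```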
Unsupervised learning
• In unsupervised learning, a person feeds a machine a large amount of information, asks a
question, and then the machine is left to figure out how to answer the question by itself.
• For example, the machine might be fed many photos and articles about dogs. It will
classify and cluster information about all of them. When shown a new photo of a dog,
the machine can identify the photo as a dog, with reasonable accuracy.
• Unsupervised learning is helpful when you don't know how to classify data. For example,
imagine you work for a banking institution and you have a large set of customer financial
data. You don't know what groups or categories to organize the data into. Here, an
unsupervised learning algorithm could find natural groupings of similar customers in a
database, and then you could describe and label them.
• This type of learning has the ability to discover similarities and differences in information,
which makes it an ideal solution for exploratory data analysis, cross-selling strategies,
customer segmentation, and image recognition.
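Here is a small clustering sketch along those lines, assuming the scikit-learn library and invented customer figures. No labels are provided; the algorithm finds natural groupings on its own, which you could then describe and label.

```python
# Minimal unsupervised-learning sketch with scikit-learn's KMeans
# (library choice and customer figures are assumptions for illustration).
from sklearn.cluster import KMeans

# Unlabeled customer data: [annual_income_k, average_monthly_spend_k]
customers = [
    [35, 1.2], [38, 1.0], [40, 1.5],      # one natural grouping
    [120, 6.5], [110, 7.0], [130, 8.0],   # another natural grouping
]

# We don't tell the algorithm what the groups are; it finds them itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(customers)

print(cluster_ids)   # e.g. [0 0 0 1 1 1] -- groups we can now describe and label
```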
Reinforcement learning
• Reinforcement learning is a machine learning model similar to supervised learning, but the
algorithm isn’t trained using sample data. This model learns as it goes by using trial and error. A
sequence of successful outcomes is reinforced to develop the best recommendation for a given
problem. The foundation of reinforcement learning is rewarding the “right” behavior and punishing
the “wrong” behavior.
• You might be wondering, what does it mean to "reward" a machine? Good question! Rewarding a
machine means that you give your agent positive reinforcement for performing the "right" thing and
negative reinforcement for performing the "wrong" things.
• As a machine learns through trial and error, it tries a prediction, then compares it with data in
its corpus.
• Each time the comparison is positive, the machine receives positive numerical feedback, or a reward.
• Each time the comparison is negative, the machine receives negative numerical feedback, or
a penalty.
• Over time, a machine’s predictions will grow to be more accurate. It accomplishes
this automatically based on feedback, rather than through human intervention.
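A toy version of this reward-and-penalty loop, written in plain Python with invented success rates, might look like the following. The agent tries actions at random, receives positive or negative numerical feedback, and gradually favors the action that the feedback supports.

```python
import random

# Toy reward/penalty sketch (all numbers invented for illustration).
# The agent tries actions, gets +1 for "right" outcomes and -1 for "wrong" ones,
# and gradually prefers the action with the best accumulated feedback.
true_success_rate = {"action_A": 0.2, "action_B": 0.8}   # hidden from the agent
score = {"action_A": 0.0, "action_B": 0.0}

random.seed(0)
for _ in range(1000):
    action = random.choice(list(score))                  # trial and error
    reward = 1 if random.random() < true_success_rate[action] else -1
    score[action] += reward                              # reinforce or penalize

print(max(score, key=score.get))   # -> 'action_B', learned from feedback alone
```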
• Machines can learn. They might begin performing tasks poorly but, with the help of artificial intelligence
(AI), they can improve their work, becoming experts at something that might take a human many years
to learn. In fact, even the AI in your smartphone can use machine learning to reason through subjects
that no human has ever been able to master.
• How is this possible? To find out, you'll explore three categories of AI:
• The term artificial intelligence describes computer systems that can apply reasoning to subjects that
previously required human intelligence.
• Machine learning can enable systems to predict and classify given data in response to ever-changing
data, somewhat like the way you learn from experience.
• Deep learning is a group of extremely powerful types of machine learning, many of which are inspired
by the operation of neural networks in the human brain.
• These AI systems do not have common sense or world knowledge, nor do they have a sense of self (at
least, not yet). But they do acquire knowledge and understanding through experience—a process
called cognition—accomplishing results that resemble human thinking. They do this primarily by using
complex rules called algorithms that help them analyze data.
Classical machine learning
• Classical machine learning began in the 1950s. AI systems learned by ingesting data and getting
better at recognizing patterns. The AI systems could predict things like the distance between
points or the intensity of values.
• Like all machine learning, the classical form depends on algorithms. Recall that algorithms are
mathematical expressions that output a result. Classical machine learning uses a small number
of algorithms in a relatively simple arrangement. Sometimes machine learning algorithms
are binary, which means that they output one of only two values. Typical binary results might be 1 or 0, YES or NO, or TRUE or FALSE.
• Other classical learning algorithms are more complicated. For example, their result might be
represented as a position on a multidimensional graph rather than “this point” or “that point”.
Here are three typical algorithms used in classical machine learning:
• Decision tree
• Linear regression
• Logistic regression
Decision tree
• A decision tree is a supervised learning algorithm. It operates like a
flowchart, and you can picture it as an upside-down tree: it has a root
node at the top (where the flowchart begins), branches that connect to
internal nodes, and more branches that connect to leaf nodes.
• Explore these features in the following diagram, which depicts a
decision tree about whether it is a good time to surf in the ocean.
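The same surf decision can be sketched as nested conditions in Python. The wave, wind, and crowd thresholds below are hypothetical; each return statement plays the role of a leaf node.

```python
# A hand-written decision tree for the "is it a good time to surf?" example.
# The conditions and thresholds below are hypothetical.
def good_time_to_surf(wave_height_ft: float, wind_mph: float, crowded: bool) -> str:
    # Root node
    if wave_height_ft < 2:
        return "No - waves too small"                    # leaf node
    # Internal node
    if wind_mph > 20:
        return "No - too windy"                          # leaf node
    # Internal node
    if crowded:
        return "Maybe - good conditions but crowded"     # leaf node
    return "Yes - go surf"                               # leaf node

print(good_time_to_surf(wave_height_ft=4, wind_mph=10, crowded=False))  # -> Yes - go surf
```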
Linear regression
• Linear regression is another type of algorithm. It relates to data that might be graphed as a
straight line. For example, a business might believe that more advertising spending leads to
better sales. This could be graphed as a series of dots that form a rising straight line, as
depicted here.
• As suggested in the chart, as advertising increases, so do sales. There are many possible
outcomes (different amounts of advertising lead to different amounts of sales), but the change
rises on the graph in a straight line.
• The situation is more complicated if a company’s actual sales show different data for different
products, at different locations, on different dates, and so on. With a large number of variables
and instances, the graph becomes a mass of dots that don’t arrange into a straight line at all.
Without adjustment, the resulting graph is too general to help a business make a good decision.
That’s where linear regression can help. Linear regression can learn all the variables, then
calculate a reasonably accurate prediction of how advertising will impact sales at some time
and location in the future. In effect, linear regression resolves the mass of dots into a “most
likely” line that can be used for simple prediction.
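A minimal linear-regression sketch, assuming the scikit-learn library and invented advertising and sales figures, might look like this. The model resolves the points into a best-fit line and uses it to predict sales for a budget that hasn't been tried yet.

```python
# Minimal linear-regression sketch with scikit-learn (advertising and sales
# figures are invented; a real model would learn from actual business data).
from sklearn.linear_model import LinearRegression

ad_spend = [[10], [20], [30], [40], [50]]        # advertising (thousands)
sales = [105, 195, 310, 405, 490]                 # sales (thousands), roughly linear

model = LinearRegression()
model.fit(ad_spend, sales)                        # find the "most likely" line

# Predict sales for an advertising budget the business hasn't tried yet.
print(model.predict([[60]]))                      # -> about 595
print(model.coef_, model.intercept_)              # slope and intercept of the line
```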
Logistic regression
• In some situations, a relationship does not fall in a straight line. Sometimes a system uses values that require a specific,
limited kind of outcome, such as something between 0 and 1 (or NO and YES). In this situation, a graph can form what’s
called a sigmoid function, or an S-shaped curve, as shown in the accompanying example. For any set of variables, the
outcome (which is a point on the S-curve) falls between 0 and 1.
• Here’s a real-world example. Refer to the previous graph. Let’s say you want to know how many hours you should study in
order to pass an exam. You have the number of study hours and passing or failing status for 10 other students. “Hours of
studying” is a varying amount, in this case, between 1 and 5 hours. Passing the exam is a matter of NO or YES (either FAIL
or PASS).
• If you plot these two factors together as a logistic regression, you get an S curve in which 0 hours of study results in a
very low chance of passing, while 5.5 hours results in a very high chance. As shown in the chart, the variable “Hours of
studying” is along the x-axis. The values along the y-axis represent the values for the variable “Probability of passing
exam”.
• Here’s another way to understand the graph: it predicts that studying at least 4 hours gives you a very good chance of
passing the course.
• Comparing linear and logistic regressions
• Linear and logistic regressions are useful in the following ways:
• A linear regression answers a question such as “If this increases by X, how much will Y increase?”
• A logistic regression answers a question such as “If this increases by X, will the value of Y be closer to 0 or 1?”
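The study-hours example can be sketched as follows, assuming the scikit-learn library; the ten students' hours and pass/fail outcomes are invented to mirror the description above. The model outputs a probability between 0 and 1 rather than a hard YES or NO.

```python
# Minimal logistic-regression sketch with scikit-learn; the 10 students'
# study hours and pass/fail outcomes are invented for illustration.
from sklearn.linear_model import LogisticRegression

hours = [[0.5], [1], [1.5], [2], [2.5], [3], [3.5], [4], [4.5], [5]]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]          # 0 = FAIL, 1 = PASS

model = LogisticRegression()
model.fit(hours, passed)                         # fits the S-shaped (sigmoid) curve

# The output is a probability between 0 and 1, not a hard YES/NO.
for h in (1, 4, 5.5):
    prob = model.predict_proba([[h]])[0][1]
    print(f"{h} hours of study -> {prob:.0%} chance of passing")
```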
Inspired by the human brain
• Today, machine learning has evolved into a collection of powerful applications called
the deep learning ecosystem. The foundation for many applications is called a neural
network. A neural network uses electronic circuitry inspired by the way neurons
communicate in the human brain.
• In the brain, cells called neurons have a cell body at one end, where the nucleus resides,
and a long axon leading to a set of branching terminals at the other end. Neurons
communicate with each other by receiving signals through their dendrites, altering those signals,
and then transmitting them along the axon and out through the terminals to other neurons. Researchers
estimate that a human brain has about 100 billion neurons, each one connected to up to
10,000 other neurons.
• In a neural network, a building block, called a perceptron, acts as the equivalent of a
single neuron. A perceptron has an input layer, one or more hidden layers, and
an output layer. A signal enters the input layer and the hidden layers run algorithms on
the signal. Then, the result is passed to the output layer.
• The hidden layers in a neural network resemble, as a
group, the long cell body that connects dendrites to
axons within a human brain cell. Those hidden layers
contain nodes. Each node runs an algorithm and bits of
additional code to test and adjust its result. When the
value reaches a certain threshold, the node “fires”.
• Note: A node often uses a sigmoid function to determine
whether or not to “fire”. As explained previously, a
sigmoid function produces a value between 0 and 1;
comparing that value with the threshold yields a binary
answer, such as YES or NO, which tells the node whether or
not to fire. You can think of the threshold as a hurdle the
signal must clear to give a result of YES (see the sketch at
the end of this section).
• If there are other nodes connected to the node, they are
activated when the signal reaches them. If the other
nodes reach their own thresholds, then they fire. The
signal cascades down through the hidden layers in a way
that’s somewhat similar to how a signal passes down the
body of a human brain cell.
• Keep in mind that these resemblances are only
similarities. Neural networks are inspired by the human
brain, but the activities inside neural networks are quite
different.
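Here is a small Python sketch of a single node of the kind described above. The inputs, weights, and threshold are invented; the node sums its weighted inputs, squashes the result with a sigmoid function, and “fires” only if the value clears the threshold.

```python
import math

# Sketch of a single artificial "node": weighted inputs, a sigmoid squashing
# function, and a firing threshold. All weights and inputs are invented.
def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def node_fires(inputs, weights, bias, threshold=0.5) -> bool:
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    activation = sigmoid(weighted_sum)       # a value between 0 and 1
    return activation >= threshold           # the "hurdle" the signal must clear

inputs = [0.9, 0.2, 0.7]          # signals arriving from the previous layer
weights = [0.8, -0.5, 0.3]        # how strongly the node weighs each signal
print(node_fires(inputs, weights, bias=0.1))   # -> True: the node "fires"
```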
Machine learning is often trial
and error
Once a neural network has ingested or already learned a certain amount of data, it stores the data in its “body of information”, called its corpus. In
order to learn, the neural network constantly tests new data or the results of its calculation against its corpus. If the network determines that the
new data or results don’t match the patterns it has already established, it modifies those patterns for a better fit. Sometimes, to improve a single
match, the network tests hundreds or thousands of modifications very rapidly and makes adjustments. Then, the network tests to determine if the
match is improving. So, step by step, the machine learns.
To visualize this, imagine that someone blindfolds you. They tell you that your assignment is to climb a hill and reach its exact top using the fewest
number of steps possible. Then, they send you walking up the hill.
• Machine learning makes many guesses
• Machine learning uses its tremendous calculation speed to make many guesses that bring it closer and closer to an answer. It randomly makes its
first guess, sets that guess as a variable, then tests how accurately the guess fits with both old and new data. Next, it makes an adjustment to the
variable and tries again. Using mathematical processes to help it choose right-size adjustments, the system keeps on trying, getting closer and
closer to perfection but never quite reaching it.
• For this reason, many AI systems output a confidence value along with an answer or prediction. For example, a system predicting effective
treatments for a cancer patient might output two or three suggested approaches, along with a measure of how confident it is that each
treatment might work. This reflects how the system reaches those decisions. The system also leaves the final decision to the doctor who knows
the patient.
• Any computer can perform at least a crude kind of machine learning based on cycles of estimation. Classical machine learning can do it, too. But
depending on the complexity of a problem, a conventional computer or even a classical system might take days (or centuries!) to reach a
conclusion.
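The guess-test-adjust cycle described above can be sketched as a toy Python loop with invented data. The “machine” starts from a guess of zero, measures how far its predictions are from the observed values, and nudges the guess by a small step each cycle, getting steadily closer to a good answer without ever being exactly perfect.

```python
# Toy guess-test-adjust loop (all numbers invented). The relationship hidden
# in the data is roughly: output = 3 * input.
data = [(1, 3.0), (2, 6.1), (3, 8.9), (4, 12.2)]   # (input, observed output)

guess = 0.0    # first guess for the multiplier relating input to output
step = 0.01
for _ in range(1000):
    # Test: how far off are the predictions, on average (signed error)?
    error = sum(guess * x - y for x, y in data) / len(data)
    guess -= step * error    # adjust in proportion to the error, then try again

print(round(guess, 2))   # converges toward roughly 3.0
```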
• In many modern applications of AI, the unstructured data involved is complex enough to overwhelm even a simple perceptron, such as the one
used in a previous lesson to decide whether to order pizza. So, a perceptron requires more brainpower in the form of deep learning. Deep learning
relies on multiple layers of nodes (even multiple groups of perceptrons with multiple layers of nodes!) to finish the work in a reasonable time.
From perceptrons to deep
learning
• Deep neural network (DNN) layers can be arranged in groups or elaborate blocks of groups
for greater power. DNNs can even be doubled into competing teams
that judge and learn from each other’s mistakes, without human
intervention. This creates a powerful form of reinforcement learning.
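To connect this back to the earlier single-node sketch, here is a minimal Python example of “multiple layers of nodes” stacked into a tiny feed-forward network, using the NumPy library. The weights are random rather than learned, so the output is illustrative only; a real deep network adjusts these weights through training.

```python
import numpy as np

# Sketch of "multiple layers of nodes": the single-node idea from earlier,
# stacked into a tiny feed-forward network. All weights here are random,
# so the output is illustrative only -- a real deep network learns its weights.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

layer_sizes = [3, 5, 4, 1]          # input layer, two hidden layers, output layer
weights = [rng.normal(size=(a, b)) for a, b in zip(layer_sizes, layer_sizes[1:])]

signal = np.array([0.9, 0.2, 0.7])  # enters the input layer
for w in weights:
    signal = sigmoid(signal @ w)    # each layer transforms the signal and passes it onward

print(signal)                        # the value that reaches the output layer
```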