Summary: Semi-supervised learning uses both labeled and unlabeled data for training machine learning models. It aims to produce better models by leveraging a large amount of unlabeled data along with a typically smaller amount of labeled data. Active learning is a type of semi-supervised learning where the model selects the most informative unlabeled examples to be labeled by an oracle (e.g., a human) to iteratively improve the model with less labeled data. While promising, active learning has not been widely adopted in industry due to practical challenges around infrastructure, software support, and the long training times of modern deep learning models.

Semi-supervised Learning

1
[Figure: overview of machine learning, also known as data mining or pattern extraction]
Brunton, Steven L., et al. "Data-Driven Aerospace Engineering: Reframing the
Industry with Machine Learning." arXiv preprint arXiv:2008.10740 (2020).

▪ If the labels are discrete, such as a categorical description of an image (e.g., dog vs. cat), then the supervised learning task is a classification. In other words, any idea what our results should look like?

▪ If the labels are continuous, such as the lift profile for a particular airfoil shape, then the task is a regression.

2
Semi-supervised learning
▪ In semi-supervised learning, the dataset contains both labelled and unlabelled examples.

▪ Usually, the quantity of unlabeled examples is much higher than the number of labelled
examples.

▪ The goal of a semi-supervised learning algorithm is the same as the goal of the supervised
learning algorithm.

▪ The hope here is that, by using many unlabelled examples, a learning algorithm can find (we
might say “produce” or “compute”) a better model.

https://www.dropbox.com/s/wxybbtbiv64yf0j/Chapter1.pdf?dl=0
3
Semi-supervised learning

The semi-supervised learning technique typically involves the following steps:

● First, train the model with a small amount of labelled data (similar to what is done in supervised learning) until it gives good results.
● Next, use the model to predict outputs for the unlabelled data, producing pseudo-labels.
● Link the labels from the labelled training data with the pseudo-labels, and the inputs from the labelled training data with the inputs in the unlabelled data.
● Finally, train the model on the combined data in the same way as one would with a fully labelled dataset (a minimal sketch is given below).

https://analyticsindiamag.com/self-supervised-learni
ng-vs-semi-supervised-learning-how-they-differ/
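To make the steps concrete, here is a minimal pseudo-labelling sketch in Python with scikit-learn; the toy data, the 0.95 confidence threshold, and the logistic-regression model are illustrative assumptions, not part of the slides.

```python
# A minimal pseudo-labelling sketch: train on labelled data, pseudo-label
# confident unlabelled points, then retrain on the combined set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labelled = rng.normal(size=(50, 5))
y_labelled = (X_labelled[:, 0] > 0).astype(int)
X_unlabelled = rng.normal(size=(1000, 5))

# Step 1: train on the small labelled set
model = LogisticRegression().fit(X_labelled, y_labelled)

# Step 2: predict on the unlabelled pool to obtain pseudo-labels
probs = model.predict_proba(X_unlabelled)
pseudo_labels = probs.argmax(axis=1)

# Step 3: keep only confident pseudo-labels and link them with the real labels
confident = probs.max(axis=1) > 0.95          # the threshold is an assumption
X_combined = np.vstack([X_labelled, X_unlabelled[confident]])
y_combined = np.concatenate([y_labelled, pseudo_labels[confident]])

# Step 4: retrain exactly as in the fully labelled case
model = LogisticRegression().fit(X_combined, y_combined)
```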
How to deal with the lack of labels

5
How it is done in practice: crowdsourcing vs. curated crowds (how do companies label
the data)

6
Weak supervision
▪ If hand labelling is so problematic, what if we do not use hand labels at all? One approach that has gained popularity is weak supervision, with tools such as Snorkel.

▪ The core idea of Snorkel is the labelling function: a function that encodes subject-matter expertise.

▪ People often rely on heuristics to label data. For example, a doctor might use the following heuristic to decide whether a patient's case should be prioritized as emergent:

▪ If the nurse's note mentions serious conditions like pneumonia, the patient's case should be given priority consideration.
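As a hedged illustration of this heuristic as a labelling function, here is a minimal Snorkel-style sketch; the field name `note`, the label values, and the toy data are assumptions, not from the slides.

```python
# A minimal Snorkel labelling function for the triage heuristic above.
from snorkel.labeling import labeling_function, PandasLFApplier
import pandas as pd

ABSTAIN, NOT_EMERGENT, EMERGENT = -1, 0, 1

@labeling_function()
def lf_mentions_pneumonia(x):
    # Heuristic: a note mentioning pneumonia suggests an emergent case.
    return EMERGENT if "pneumonia" in x.note.lower() else ABSTAIN

df = pd.DataFrame({"note": ["Suspected pneumonia, fever 39C", "Sprained ankle"]})
applier = PandasLFApplier(lfs=[lf_mentions_pneumonia])
L_train = applier.apply(df)   # label matrix: one column per labelling function
print(L_train)                # [[1], [-1]]
```

In practice, many such labelling functions are combined (e.g., with Snorkel's label model) to produce probabilistic training labels without hand annotation.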

7
What is Snorkel?
▪ The Snorkel project started at Stanford in 2016 with a simple technical bet: that it would
increasingly be the training data, not the models, algorithms, or infrastructure, that decided
whether a machine learning project succeeded or failed.

▪ Training data (i.e., the labels) matter more than the algorithm.

▪ Given this premise, we set out to explore the radical idea that you could bring mathematical
and systems structure to the messy and often entirely manual process of training data
creation and management, starting by empowering users to programmatically label, build,
and manage training data.

https://www.snorkel.org/ 8
Active Learning: #1
▪ Active learning (also called “query learning,” or sometimes “optimal experimental design” in
the statistics literature) is a subfield of machine learning and, more generally, artificial
intelligence.

▪ The key hypothesis is that, if the learning algorithm is allowed to choose the data from
which it learns—to be “curious,” if you will—it will perform better with less training.

▪ In active learning, you first provide a small number of labelled examples. The model is
trained on this "seed" dataset. Then, the model "asks questions" by selecting the unlabeled
data points it is unsure about, so the human can "answer" the questions by providing labels
for those points. The model updates again and the process is repeated until the
performance is good enough. By having the human iteratively teach the model, it's
possible to make a better model, in less time, with much less labelled data.

https://humanloop.com/blog/why-you-should-be-using-active-learning/ 9
Active Learning: #2

▪ For any supervised learning system to perform well, it must often be trained on hundreds
(even thousands) of labeled instances.

▪ Sometimes these labels come at little or no cost, such as the "spam" flag you mark on
unwanted email messages, or the five-star rating you might give to films on a social
networking website.

▪ In these cases you provide such labels for free, but for many other more sophisticated
supervised learning tasks, labelled instances are very difficult, time-consuming, or
expensive to obtain. An example of this is speech recognition.

10
Active Learning: #3

▪ Active learning systems attempt to overcome the labeling bottleneck by asking queries in
the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator).

▪ In this way, the active learner aims to achieve high accuracy using as few labeled instances
as possible, thereby minimizing the cost of obtaining labeled data.

▪ Active learning is well-motivated in many modern machine learning problems where data
may be abundant but labels are scarce or expensive to obtain.

11
Active Learning: #4

12
Active learning: #5

▪ Active learning still requires hand labels, but instead of randomly labelling a subset of data, you label the samples that are most helpful to your model.

▪ Does the last bit sound like Bayesian optimisation?

13
If active learning is so great, then why doesn't
everyone use it?
▪ You need a lot of infrastructure to connect the different teams of people responsible for data labelling vs. model training.

▪ Most of the software libraries being used assume all of your data is labelled before you train a model, so to use active learning you have to write a tonne of boilerplate.

▪ Modern deep learning models are very slow to update, so frequently retraining them from scratch is painful. Nobody wants to label a hundred examples and then wait 24 hours for a model to be fully re-trained before labelling the next 100.

▪ Academic papers use small datasets, which are not representative of what we use in industry.

https://humanloop.com/blog/why-you-should-be-using-active-learning/ 14
Semi-supervised vs. self-supervised
• Self-supervised learning obtains a supervisory signal from the data by leveraging its underlying structure. The general method for self-supervised learning is to predict unobserved or hidden parts of the input.

• Because self-supervised learning uses the structure of the data to learn, it can use various supervisory signals across large datasets without relying on labels.

• A self-supervised learning system aims at creating a data-efficient artificial intelligence system. It is generally referred to as an extension of, or even an improvement over, unsupervised learning methods.

• However, as opposed to unsupervised learning, self-supervised learning does not focus on clustering and grouping. A minimal sketch of the "predict the hidden part of the input" idea is given below.

https://analyticsindiamag.com/self-supervised-learning-vs-se
mi-supervised-learning-how-they-differ/
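To illustrate the "predict the hidden part of the input" idea, here is a minimal sketch; the toy data, the choice of which feature to mask, and the linear model are all assumptions, not from the slides.

```python
# A toy self-supervised pretext task: mask one feature of each vector and
# train a regressor to reconstruct it from the visible features.
# No labels are needed; the supervisory signal comes from the data itself.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
X[:, 0] = X[:, 1:4].sum(axis=1)          # structure the model can exploit

visible, hidden = X[:, 1:], X[:, 0]      # "mask" the first feature
pretext_model = LinearRegression().fit(visible, hidden)
print(pretext_model.score(visible, hidden))  # high R^2: structure was learned
```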
Self-supervised
There are three significant advantages to self-supervised learning:

● Scalability: Supervised learning techniques need labelled data to predict the outcome for unknown
data, and may need a large dataset to build models that make accurate predictions. Manual
data labelling is time-consuming and often not practical.
● Improved capabilities: Self-supervised learning has significant applications in computer vision for
performing tasks such as colourisation, 3D rotation, depth completion, and context filling. Speech
recognition is another area where self-supervised learning thrives.
● Human intervention: Self-supervised learning automatically generates labels without human
intervention.

https://analyticsindiamag.com/self-supervised-learning-vs-se
mi-supervised-learning-how-they-differ/
Self-supervised: issue
• Despite its various advantages, self-supervised learning suffers from
uncertainty.
• In cases such as Google’s BERT model, where variables are discrete,
this technique works well.
• However, in the case of variables with continuous distribution
(variables obtained only by measuring), this technique has failed to
generate successful results.

https://analyticsindiamag.com/self-supervised-learning-vs-se
mi-supervised-learning-how-they-differ/
Curriculum learning

18
Curriculum learning
• Curriculum learning studies how you can improve model performance by first teaching simple concepts before teaching complex ones.
• Active learning naturally enforces a curriculum on your models and helps them achieve better overall performance.
• Curriculum learning can provide performance improvements over the standard training approach based on random data shuffling. However, the necessity of finding a way to rank the samples from easy to hard, as well as the right pacing function for introducing more difficult data, seems to have limited the usage of curriculum approaches. A minimal sketch of such a ranking-plus-pacing scheme is given below.
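Here is a minimal curriculum-learning sketch; the difficulty proxy (margin of a weak scoring model), the linear pacing schedule, and the data are all illustrative assumptions, not from the slides.

```python
# Rank samples from easy to hard with a proxy difficulty score, then grow the
# training subset with a pacing function.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

# Difficulty proxy: distance to the decision boundary of a weak scoring model
scorer = LogisticRegression().fit(X, y)
margin = np.abs(scorer.decision_function(X))
order = np.argsort(-margin)              # easiest (largest margin) first

model = LogisticRegression(warm_start=True, max_iter=50)
for step, frac in enumerate([0.25, 0.5, 0.75, 1.0]):   # linear pacing function
    subset = order[: int(frac * len(X))]
    model.fit(X[subset], y[subset])      # train on progressively harder data
    print(f"step {step}: trained on {len(subset)} samples")
```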

19
How many flavours of curriculum learning are there?

• A model M is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. Thus, you can apply CL to these 4 options:
• Applied to the experience E,
• Applied to the model M,
• Applied to the class of tasks T,
• Applied to the performance measure P.
• Linking curriculum learning to continuation methods allows us to see these 4 options in the same light: namely, as smoothing the loss function.
• They are in fact equivalent, because these are all different instances of the original formulation, which can be viewed as a continuation method. The continuation method is a well-known approach in non-convex optimization: it starts from a simple (smoother) objective function that is easy to optimize, and the objective function is then gradually transformed into less smooth versions until it reaches the original (non-convex) objective function. A sketch of this family of objectives is given below.
• Note that less convex does not mean less smooth: a very non-convex function can be very smooth, not necessarily non-smooth!
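As a hedged sketch of the continuation idea (following the standard formulation in the curriculum-learning literature, not an equation from the slide), the curriculum can be written as a one-parameter family of objectives:

```latex
% C_0 is a heavily smoothed, easy-to-optimize objective; C_1 is the original
% non-convex objective. Training tracks a minimizer as lambda grows to 1.
\[
  \{\, C_\lambda(\theta) \,\}_{\lambda \in [0,1]}, \qquad
  C_0 = \text{smoothed objective}, \qquad
  C_1 = \text{original objective},
\]
\[
  \theta_{t+1} = \arg\min_{\theta} C_{\lambda_{t+1}}(\theta)
  \quad \text{initialized at } \theta_t, \text{ with } \lambda_t \text{ increasing toward } 1.
\]
```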

Soviany, Petru, et al. "Curriculum learning: A survey." arXiv preprint arXiv:2101.10382 (2021).
20
Curriculum learning
• These forms of curriculum are somewhat equivalent; each bears its own advantages and disadvantages.

• However, performing curriculum learning by gradually increasing the model's capacity (for instance, by adding extra layers) does not suffer from the problem of having to rank the samples from easy to hard.

• For a list of options to rank the data for images, audio, and text, see the reference below.

21
Soviany, Petru, et al. "Curriculum learning: A survey." arXiv preprint arXiv:2101.10382 (2021).
Human in the loop
Active learning
Reinforcement

22
Human in the loop (aka HITL)

• Human-in-the-loop (HITL) is a branch of artificial intelligence that leverages both human and machine intelligence to create machine learning models. In a traditional human-in-the-loop approach, people are involved in a virtuous circle where they train, tune, and test a particular algorithm.
• When should you use human-in-the-loop machine learning?

• For training: As we discussed above, humans can be used to provide labelled data for model training. This is probably the most common place you'll see data scientists use a HITL approach.
• For tuning or testing: Humans can also help tune a model for higher accuracy. Say your model is unconfident about a certain set of decisions, such as whether a certain image is in fact a cat. Human annotators can score those decisions, effectively telling the model, "yes, this is a cat" or "nope, it's a lamppost," thus tuning it so it's more accurate in the future.
• For example, if you look for specific information in a language that is only spoken by a few thousand people, the machine learning algorithm may not find any examples to learn from, so a HITL approach helps to ensure the accuracy of the results.

https://appen.com/blog/human-in-the-loop/
23
What’s the difference between human-in-the-loop and active learning?

• Active learning generally refers to humans handling low-confidence units and feeding those back into the model.
• Human-in-the-loop is broader, encompassing active learning approaches as well as the creation of datasets through human labelling. Additionally, HITL can sometimes (though rarely) refer to people simply validating (or invalidating) an output without feeding those judgments back to the model. HITL AI enables human verification and corrections. In practice, it provides a workflow and UI for humans (referred to as labelers in HITL) to review, validate, and correct the data extracted from documents by Human in the Loop processors. It is used across Financial Services, Health, Manufacturing, Government, and other industries.
• Take-home message: two completely different kinds of intelligence (humans + computers) are being leveraged simultaneously.

https://cloud.google.com/document-ai/hitl
24
Active vs. reinforcement learning

• Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or
some other information source) to label new data points with the desired outputs.
• Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take
actions in an environment in order to maximize the notion of cumulative reward.
• These definitions still create some overlapping area between the two. Here is a better explanation of the two:
• Active learning is a technique that is applied to Supervised Learning settings. In the supervised learning paradigm, you train a system by providing inputs and expected outputs (labels). The system learns to mimic the training data, ideally generalizing to unseen but extrapolable cases. Active learning is normally applied in cases where obtaining labels is expensive, so we obtain new labels dynamically, defining an algorithmic strategy to maximize the usefulness of the new data points.
• Reinforcement learning is a different paradigm, where we don't have labels, and therefore cannot use
supervised learning. Instead of labels, we have a "reinforcement signal" that tells us "how good" the current
outputs of the system being trained are. Therefore, in reinforcement learning the system (ideally) learns a
strategy to obtain as good rewards as possible.

https://datascience.stackexchange.com/questions/85358/what-is-the-difference-between-active-lea
rning-and-reinforcement-learning
25
A Different Approach: Designing with a Human in the Loop

• What if, instead of thinking of automation as the removal of human involvement from a task, we imagined it as the selective inclusion of human participation? The result would be a process that harnesses the efficiency of intelligent automation while remaining amenable to human feedback, all while retaining a greater sense of meaning.

https://hai.stanford.edu/news/humans-loop-design-interactive-ai-systems
26
Where can you not (or where is it less useful to) implement the human-in-the-loop approach?

• Human-in-the-loop is not a concept you can implement in every machine learning project. The HITL approach is mainly used when there is not much data available yet; at this stage, people can initially make much better judgments than machines are capable of.

• Human-in-the-loop deep learning is used when humans and machine learning processes interact to solve one or more of the following scenarios:

• Algorithms are not understanding the input.
• When data input is interpreted incorrectly.
• Algorithms don't know how to perform the task.
• To make humans more efficient and accurate.
• To make the machine learning model more accurate.
• When the cost of errors is too high in ML development.
• When the data you're looking for is rare or not available.

27
The business case for the HITL approach: #1

• Businesses that don't invest in AI now will risk being disrupted by others that do.
• A properly built AI application can get us to a reasonable starting point – predictive
accuracy of 80% or more. If we use that 80% while working on the remaining 20%, we
can get closer to the 100% mark.
• This is where humans come in. By combining an AI deployment with a managed service
layer, an organization can manage the 20% inaccuracy and learn from exceptions.
• As a human processes these exceptions, the AI application learns and improves its
algorithms to increase accuracy. This is why keeping humans in the loop is essential
when fine-tuning AI applications.
• In fact, when it comes to selling, people will tend to catch you out with exception-based instances! So you'd learn from them. Keep in mind that the key here is: don't sell automation, but sell augmentation.

https://www.genpact.com/insight/article/why-ai-still-needs-humans-in-the-loop?gclid=CjwKCAiAlrSPBhBaEiwAuLSDUOVcNwxfghAil2qKHf 28
MmNEIyXahgRnqsjn3UPz-ntd_93zVzb4yKEhoChuwQAvD_BwE
The business case for the HITL approach: #2

• To dive a bit deeper, AI is augmentative because it solves only one part of a three-part problem: prediction first, then judgment and, finally, action. Imagine a driver on a racetrack. If AI predicts that another driver is going to overtake them from the left, they must judge whether to stay the course, block, or accelerate. The driver then acts by executing the move safely and swiftly.

• Similarly, magnetic resonance imaging (MRI) uses a variety of signals bouncing off internal body parts to predict what lies under the skin. But a doctor has to apply judgment to decide the best way to treat an injury revealed by an MRI scan.

https://www.genpact.com/insight/article/why-ai-still-needs-humans-in-the-loop?gclid=CjwKCAiAlrS
PBhBaEiwAuLSDUOVcNwxfghAil2qKHfMmNEIyXahgRnqsjn3UPz-ntd_93zVzb4yKEhoChuwQAv
D_BwE 29
HITL clarifications

• The training algorithm is no different than before.

• For example, you can wait for 100 wrong predictions and then feed those into your classifier via fine-tuning at some small learning rate (a sketch is given below). This will marginally improve its accuracy on that category in the future. You could in theory update it after every single prediction, but this could conceivably result in poor performance due to the small batch size.
• The issue with this logic is that it's not enough to suddenly change your mind on the given sample; the question is whether you'll be able to spot similar samples in the future. If the raccoon was really difficult to spot in the first place, the marginal gain of having a single example pointed out to you might be quite small.
• So is the prospect of learning really that massive? If NNs learn to generalise, does our action of singling out the exceptions go against the whole idea of AI?
• We're told so often that more data is better that it's easy to forget that the quality of the data matters just as much as the quantity.
https://stats.stackexchange.com/questions/361707/active-learning-with-human-in-the-loop
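As a minimal sketch of the fine-tuning step described above (the model architecture, learning rate, number of passes, and the buffer of 100 corrected examples are all illustrative assumptions):

```python
# Fine-tune an existing classifier on a small buffer of corrected predictions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Pretend these are the 100 inputs the model got wrong, with corrected labels
x_buf = torch.randn(100, 20)
y_buf = torch.randint(0, 2, (100,))

# Small learning rate so the update only nudges the existing model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for _ in range(3):                       # a few passes over the correction buffer
    optimizer.zero_grad()
    loss = loss_fn(model(x_buf), y_buf)
    loss.backward()
    optimizer.step()
```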

30
How it is done in practice (active learning)
• In active learning (a special case of semi-supervised learning), you
first provide a small number of labelled examples. The model is
trained on this "seed" dataset. Then, the model "asks questions" by
selecting the unlabeled data points it is unsure about, so the human
can "answer" the questions by providing labels for those points. The
model updates again and the process is repeated until the
performance is good enough. By having the human iteratively
teach the model, it's possible to make a better model, in less
time, with much less labelled data.

• So how can the model find the next data point that needs labelling? Some commonly used methods are:

• select the data points where the model's predictive distribution has the highest entropy;
• pick the data points where the model's favoured prediction is least confident;
• train an ensemble of models and focus on the data points where they disagree;
• custom methods that leverage Bayesian deep learning to get better uncertainty estimates.

• Depending on the number of labelled data points, the red line in the figure becomes smaller or bigger. Our goal as engineers is to find the point where the improvement is the greatest. A minimal sketch of a loop using the first two strategies is given below.
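To make the loop concrete, here is a minimal active-learning sketch in Python using entropy and least-confidence sampling; the data, the oracle, the model, the query budget, and the round count are illustrative assumptions, not from the slides.

```python
# A minimal active-learning loop: train on a seed set, score the pool by
# uncertainty, query the most uncertain points, and repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 10))
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # the hidden "oracle"

labelled = list(rng.choice(len(X_pool), size=20, replace=False))  # seed set

for round_ in range(5):
    model = LogisticRegression().fit(X_pool[labelled], y_pool[labelled])
    probs = model.predict_proba(X_pool)

    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # highest entropy
    least_conf = 1.0 - probs.max(axis=1)                    # least confidence
    score = entropy + least_conf                            # combine strategies
    score[labelled] = -np.inf                               # don't re-query

    query = np.argsort(score)[-10:]      # "ask" for the 10 most uncertain points
    labelled.extend(query.tolist())      # the oracle "answers" with labels
    print(f"round {round_}: {len(labelled)} labelled examples")
```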
31
If active learning is so great, then why doesn't everyone use it?

• Most of our tools are not designed with Active Learning in mind. There are often different teams of people
responsible for data labelling vs model training but active learning requires these processes to be coupled.
If you do get these teams to work together, you still need a lot of infrastructure to connect model training to
annotation interfaces.
• Most of the software libraries being used assume all of your data is labelled before you train a model, so to use active learning you have to write a tonne of boilerplate. You also need to figure out how best to host your model, have the model communicate with a team of annotators, and update itself as it gets data asynchronously from different annotators.
• On top of this, modern deep learning models are very slow to update, so frequently retraining them from
scratch is painful. Nobody wants to label a hundred examples then wait 24 hours for a model to be fully
re-trained before labelling the next 100. Deep learning models also tend to have millions or billions of
parameters and getting good uncertainty estimates from these models is an open research problem.

If you read academic papers on active learning, you might think active learning will give you a small saving
in labels but for a huge amount of work. The papers are misleading though because they operate on
academic datasets that tend to be balanced/clean. They almost always label one example at a time and
they forget that not every data-point is equally easy to label. On more realistic problems with large class
imbalances, noisy data and variable labelling costs, the benefits can be much bigger than the literature
suggests. In some cases there can be a 10x reduction in labelling costs.

https://humanloop.com/blog/why-you-should-be-using-active-learning
32
How to use Active Learning today

• ModAL: built on top of scikit-learn (a minimal example is given below).

• Prodigy: an annotation interface built by the makers of spaCy.
• Labelbox: provides interfaces for a variety of image annotations and has recently added support for text as well. Unlike Prodigy, Labelbox has been designed with teams of annotators in mind and has more tools for ensuring that your labels are correct.
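As a hedged illustration, a minimal modAL query-and-teach loop might look like this; the estimator choice and toy data are assumptions (for brevity, queried points are not removed from the pool):

```python
# Minimal modAL example: query the most uncertain point, then teach the learner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from modAL.models import ActiveLearner

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 4))
y_pool = (X_pool[:, 0] > 0).astype(int)

seed = rng.choice(len(X_pool), size=10, replace=False)
learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    X_training=X_pool[seed], y_training=y_pool[seed],
)

for _ in range(20):
    query_idx, _ = learner.query(X_pool)          # default: uncertainty sampling
    learner.teach(X_pool[query_idx], y_pool[query_idx])  # oracle supplies labels

print(learner.score(X_pool, y_pool))
```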

33
How to select key samples?

• The key technology for human-in-the-loop is obtaining key samples and labelling them with human intervention. At present, researchers mostly use confidence-based methods to obtain key samples.
• This method plays an irreplaceable role in classification tasks. However, for other tasks (for example, semantic segmentation, regression, and target detection), it is not as easy.
• Active learning aims to train an accurate prediction model at the least cost by labelling the examples that provide the most information.

Wu, Xingjiao, et al. "A Survey of Human-in-the-loop for Machine Learning." arXiv preprint
34
arXiv:2108.00941 (2021).
A concrete example

This process could be any labelling process: adding the topic to news stories, classifying sports photos according to the sport being played, identifying the sentiment of a social media comment, rating a video on how explicit the content is, and so on. In all cases, you could use machine learning to automate some of the process of labelling or to speed up the human process.

Monarch, Robert Munro. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI. Simon and Schuster, 2021.

35
A word from experience

• But far more academic papers focus on how to adapt algorithms to new domains without new training data than on how to annotate the right new training data efficiently.

• By contrast with academic machine learning, it is more common in industry to improve model performance by annotating more training data.

• Especially when the nature of the data is changing over time (which is also common), using a handful of new annotations can be far more effective than trying to adapt an existing model to a new domain of data.

Monarch, Robert Munro. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI. Simon and Schuster, 2021.

36
Active learning sampling strategies: uncertainty, diversity, and random sampling #1

▪ Uncertainty sampling is the set of strategies for identifying unlabeled items that are near a decision boundary in your current machine learning model.

▪ Diversity sampling is the set of strategies for identifying unlabeled items that are underrepresented or unknown to the machine learning model in its current state.

▪ As they are not perfect, the strategies are often used together to find a selection of unlabeled items that will maximize both uncertainty and diversity. A minimal sketch of such a combination is given below.

Monarch, Robert Munro. Human-in-the-Loop Machine Learning: Active learning and annotation for
human-centered AI. Simon and Schuster, 2021.
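Here is a minimal sketch combining the two strategies; the clustering-based diversity step, the data, and the seed labels are illustrative assumptions (one possible combination, not the book's specific method):

```python
# Score uncertainty by distance to the decision boundary, then pick the most
# uncertain item in each k-means cluster so queries also cover the input space.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 6))
y_seed = (X_pool[:50, 0] > 0).astype(int)          # assumed seed labels

model = LogisticRegression().fit(X_pool[:50], y_seed)
uncertainty = -np.abs(model.decision_function(X_pool))   # near boundary = high

clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_pool)
queries = [int(np.argmax(np.where(clusters == c, uncertainty, -np.inf)))
           for c in range(10)]                     # one query per cluster
print(queries)   # 10 diverse, uncertain items to send for labelling
```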
37
Active learning sampling strategies: uncertainty, diversity, and random sampling #2

[Figure: three panels.]
▪ Uncertainty sampling: this active learning strategy is effective for selecting unlabeled items near the decision boundary.
▪ One possible result of diversity sampling: this active learning strategy is effective for selecting unlabeled items in different parts of the problem space.
▪ One possible result from combining uncertainty sampling and diversity sampling: when the strategies are combined, items are selected that are near diverse sections of the decision boundary. Therefore, we are optimizing the chance of finding items that are likely to result in a changed decision boundary.

Monarch, Robert Munro. Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI. Simon and Schuster, 2021.
38
When to use active learning
• You should use active learning when you can annotate only a small
fraction of your data and when random sampling will not cover the diversity
of data.

• This means you should use it when it is feasible to do so; things get out of control very easily, especially for video annotation.

39
Software based on human in the loop

• http://www.wekinator.org/

40
