Semi-Supervised Learning
also known as data mining or pattern extraction
Brunton, Steven L., et al. "Data-Driven Aerospace Engineering: Reframing the
Industry with Machine Learning." arXiv preprint arXiv:2008.10740 (2020).
Semi-supervised learning
▪ In semi-supervised learning, the dataset contains both labelled and unlabelled examples.
▪ Usually, the quantity of unlabelled examples is much higher than the number of labelled examples.
▪ The goal of a semi-supervised learning algorithm is the same as the goal of a supervised learning algorithm.
▪ The hope here is that, by using many unlabelled examples, a learning algorithm can find (we might say “produce” or “compute”) a better model.
https://www.dropbox.com/s/wxybbtbiv64yf0j/Chapter1.pdf?dl=0
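A minimal sketch of one common semi-supervised approach, self-training with pseudo-labels, using scikit-learn's SelfTrainingClassifier; the synthetic dataset and the 95%-unlabelled split are illustrative assumptions, not a prescribed setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# Hide most labels (-1 marks "unlabelled" for scikit-learn): as the slide
# notes, the unlabelled portion is usually much larger than the labelled one.
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.choice(len(y), size=950, replace=False)] = -1

# The base classifier is iteratively retrained on its own confident
# pseudo-labels for the unlabelled points.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)
print(model.score(X, y))
```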
Semi-supervised learning
https://analyticsindiamag.com/self-supervised-learning-vs-semi-supervised-learning-how-they-differ/
How to deal with the lack of labels
How it is done in practice: crowdsourcing vs. curated crowds (how do companies label the data)
Weak supervision
▪ If hand labelling is so problematic, what if we don’t use hand labels at all? One approach that has gained popularity is weak supervision with tools such as Snorkel.
▪ The core idea of Snorkel is the labelling function: a function that encodes subject matter expertise (see the sketch below).
▪ People often rely on heuristics to label data. For example, a doctor might use this heuristic to decide whether a patient’s case should be prioritized as emergent:
▪ If the nurse’s note mentions serious conditions like pneumonia, the patient’s case should be given priority consideration.
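A minimal sketch of the nurse's-note heuristic written as Snorkel labelling functions; the toy dataframe, the keyword rules, and the label constants are illustrative assumptions:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, ROUTINE, PRIORITY = -1, 0, 1

@labeling_function()
def lf_serious_condition(note):
    # Encodes the heuristic: notes mentioning serious conditions get priority.
    return PRIORITY if "pneumonia" in note.text.lower() else ABSTAIN

@labeling_function()
def lf_routine_checkup(note):
    return ROUTINE if "routine check" in note.text.lower() else ABSTAIN

df_notes = pd.DataFrame({"text": [
    "Patient presents with pneumonia and high fever.",
    "Routine check, no complaints.",
]})

# Apply all labelling functions to the unlabelled notes ...
applier = PandasLFApplier(lfs=[lf_serious_condition, lf_routine_checkup])
L_train = applier.apply(df=df_notes)

# ... and let Snorkel's LabelModel combine their noisy, overlapping votes
# into probabilistic training labels.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100)
probs = label_model.predict_proba(L_train)
```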
What is Snorkel?
▪ The Snorkel project started at Stanford in 2016 with a simple technical bet: that it would
increasingly be the training data, not the models, algorithms, or infrastructure, that decided
whether a machine learning project succeeded or failed.
▪ Given this premise, we set out to explore the radical idea that you could bring mathematical
and systems structure to the messy and often entirely manual process of training data
creation and management, starting by empowering users to programmatically label, build,
and manage training data.
https://www.snorkel.org/
Active Learning: #1
▪ Active learning (also called “query learning,” or sometimes “optimal experimental design” in
the statistics literature) is a subfield of machine learning and, more generally, artificial
intelligence.
▪ The key hypothesis is that, if the learning algorithm is allowed to choose the data from
which it learns—to be “curious,” if you will—it will perform better with less training.
▪ In active learning, you first provide a small number of labelled examples. The model is trained on this "seed" dataset. Then, the model "asks questions" by selecting the unlabelled data points it is unsure about, so the human can "answer" the questions by providing labels for those points. The model updates again and the process is repeated until the performance is good enough. By having the human iteratively teach the model, it's possible to make a better model, in less time, with much less labelled data (see the sketch below).
https://humanloop.com/blog/why-you-should-be-using-active-learning/
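A minimal sketch of this seed-query-label loop with scikit-learn and least-confidence sampling; the synthetic data is an assumption, and the "oracle" is simulated by the held-back true labels where in practice it would be a human annotator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
labelled = list(range(10))                       # small labelled "seed" set
pool = [i for i in range(len(X)) if i not in labelled]

model = LogisticRegression(max_iter=1000)
for round_ in range(20):
    model.fit(X[labelled], y[labelled])
    # The model "asks questions": pick the pool points it is least sure about.
    probs = model.predict_proba(X[pool])
    confidence = probs.max(axis=1)
    query = [pool[i] for i in np.argsort(confidence)[:10]]
    # The human "answers" by providing labels (simulated here by y).
    labelled.extend(query)
    pool = [i for i in pool if i not in query]
```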
Active Learning: #2
▪ For any supervised learning system to perform well, it must often be trained on hundreds
(even thousands) of labeled instances.
▪ Sometimes these labels come at little or no cost, such as the “spam” flag you mark on unwanted email messages, or the five-star rating you might give to films on a social networking website.
▪ In these cases you provide such labels for free, but for many other more sophisticated
supervised learning tasks, labelled instances are very difficult, time-consuming, or
expensive to obtain. An example of this is speech recognition.
Active Learning: #3
▪ Active learning systems attempt to overcome the labeling bottleneck by asking queries in
the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator).
▪ In this way, the active learner aims to achieve high accuracy using as few labeled instances
as possible, thereby minimizing the cost of obtaining labeled data.
▪ Active learning is well-motivated in many modern machine learning problems where data
may be abundant but labels are scarce or expensive to obtain.
Active Learning: #4
Active Learning: #5
If active learning is so great, then why doesn't everyone use it?
▪ You need to have a lot of infrastructure to connect different teams of people responsible for
data labelling vs model training.
▪ Most of the software libraries being used assume all of your data is labelled before you train a model, so to use active learning you have to write a tonne of boilerplate.
▪ Modern deep learning models are very slow to update, so frequently retraining them from
scratch is painful. Nobody wants to label a hundred examples then wait 24 hours for a
model to be fully re-trained before labelling the next 100.
▪ Academic papers use small datasets, which are not representative of what we use in industry.
https://humanloop.com/blog/why-you-should-be-using-active-learning/
Semi-supervised vs. self-supervised
• Self-supervised learning obtains a supervisory signal from the data by leveraging its underlying structure. The general method for self-supervised learning is to predict unobserved or hidden parts of the input.
• Because self-supervised learning uses the structure of the data itself to learn, it can derive various supervisory signals from large datasets without relying on labels.
https://analyticsindiamag.com/self-supervised-learning-vs-semi-supervised-learning-how-they-differ/
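A minimal sketch of how a self-supervised signal is created: the "label" is simply a hidden part of the input itself (here, the next character in a sequence), so no human annotation is needed. The toy corpus and window size are assumptions:

```python
corpus = "the quick brown fox jumps over the lazy dog"

window = 5
examples = []
for i in range(len(corpus) - window):
    context = corpus[i:i + window]   # observed part of the input
    target = corpus[i + window]      # hidden part the model must predict
    examples.append((context, target))

# Each (context, target) pair can now train an ordinary supervised model,
# even though the raw text carried no labels at all.
print(examples[:3])
```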
Self-supervised
There are three significant advantages to self-supervised learning:
● Scalability: Supervised learning techniques need labelled data to predict the outcome for unknown data, and they may need a large dataset to build models that make accurate predictions. Manual data labelling is time-consuming and often not practical.
● Improved capabilities: Self-supervised learning has significant applications in computer vision for
performing tasks such as colourisation, 3D rotation, depth completion, and context filling. Speech
recognition is another area where self-supervised learning thrives.
● Human intervention: Self-supervised learning automatically generates labels without human
intervention.
https://analyticsindiamag.com/self-supervised-learning-vs-semi-supervised-learning-how-they-differ/
Self-supervised: issue
• Despite its various advantages, self-supervised learning suffers from
uncertainty.
• In cases such as Google’s BERT model, where variables are discrete,
this technique works well.
• However, in the case of variables with a continuous distribution (e.g., variables obtained by measurement), this technique has so far failed to generate equally successful results.
https://analyticsindiamag.com/self-supervised-learning-vs-semi-supervised-learning-how-they-differ/
Curriculum learning
Curriculum learning
• Curriculum learning studies how you can improve model performance by first teaching simple concepts before teaching complex ones.
• Active learning naturally enforces a curriculum on your models and helps them achieve better overall performance.
• Curriculum learning can provide performance improvements over the standard training approach based on random data shuffling. However, the need to find a way to rank the samples from easy to hard, as well as the right pacing function for introducing more difficult data, seems to have limited the usage of curriculum approaches (see the sketch below).
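A minimal sketch of the two ingredients just mentioned: an easy-to-hard ranking and a pacing function that gradually admits harder samples. The linear pacing function and the random difficulty score are illustrative assumptions, not a fixed recipe:

```python
import numpy as np

def pacing_fraction(epoch, total_epochs, start=0.2):
    # Fraction of the (easy-first) data made available at this epoch.
    return min(1.0, start + (1.0 - start) * epoch / total_epochs)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
difficulty = rng.random(1000)        # stand-in for an easy-to-hard score
order = np.argsort(difficulty)       # easiest samples first

total_epochs = 10
for epoch in range(total_epochs):
    n = int(pacing_fraction(epoch, total_epochs) * len(X))
    batch_pool = order[:n]           # train only on the easiest n samples
    print(epoch, len(batch_pool))    # ... ordinary training on X[batch_pool] ...
```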
How many flavours of curriculum learning are there?
• A model M is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. You can then apply CL to these 4 options:
• Applied to the experience E
• Applied to the model M
• Applied to the class of tasks T
• Applied to the performance measure P
• Linking curriculum learning to continuation methods allows us to see these 4 options in the same light: namely, as smoothing the loss function.
• They are in fact equivalent because these are all different instances of the original formulation, which can be viewed as a continuation method. The continuation method is a well-known approach in non-convex optimization, which starts from a simple (smoother) objective function that is easy to optimize. Then, the objective function is gradually transformed into less smooth versions until it reaches the original (non-convex) objective function.
• Note that less convex does not mean less smooth: a very non-convex function can still be very smooth, not necessarily non-smooth!
Soviany, Petru, et al. "Curriculum learning: A survey." arXiv preprint arXiv:2101.10382 (2021).
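One common way to write the continuation idea down, sketched here in LaTeX; the single-parameter family C_lambda is an illustrative formulation, not the survey's exact notation:

```latex
% Illustrative continuation-method formulation (assumed notation):
% a family of objectives indexed by \lambda, starting smooth and
% ending at the true, non-convex training loss.
\[
  \{\, C_\lambda(\theta) \,\}_{\lambda \in [0,1]}, \qquad
  C_0 = \text{heavily smoothed, easy-to-optimize objective}, \qquad
  C_1 = \text{original (non-convex) objective.}
\]
% Optimize C_{\lambda_1}, C_{\lambda_2}, \dots for an increasing sequence
% 0 = \lambda_1 < \dots < \lambda_K = 1, warm-starting each stage at the
% minimizer of the previous one.
```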
Curriculum learning
• These forms of curriculum are somewhat equivalent; each bears its own advantages and disadvantages.
• For a list of options to rank the data for images, audio, and text, see the reference below.
Soviany, Petru, et al. "Curriculum learning: A survey." arXiv preprint arXiv:2101.10382 (2021).
Human in the loop
Active learning
Reinforcement learning
Human in the loop (aka HITL)
• Human-in-the-loop (HITL) is a branch of artificial intelligence that leverages both human and machine intelligence to create machine learning models. In a traditional human-in-the-loop approach, people are involved in a virtuous circle where they train, tune, and test a particular algorithm.
• When should you use human-in-the-loop machine learning?
• For training: As we discussed above, humans can be used to provide labelled data for model training. This is probably the most common place you’ll see data scientists use a HITL approach.
• For tuning or testing: Humans can also help tune a model for higher accuracy. Say your model is unconfident about a certain set of decisions, like whether a certain image is in fact a cat. Human annotators can score those decisions, effectively telling the model, “yes, this is a cat” or “nope, it’s a lamppost,” thus tuning it so it’s more accurate in the future.
• For example, if you look for specific information in a language that is only spoken by a few thousand people, the machine learning algorithm may not find many examples to learn from. So, a HITL approach helps to ensure the quality of the output where data alone is not enough.
https://appen.com/blog/human-in-the-loop/
What’s the difference between human-in-the-loop and active learning?
• Active learning generally refers to humans handling low-confidence units and feeding those back into the model.
• Human-in-the-loop is broader, encompassing active learning approaches as well as the creation of datasets through human labelling. Additionally, HITL can sometimes (though rarely) refer to people simply validating (or invalidating) an output without feeding those judgments back to the model. HITL AI enables human verification and corrections. In practice it provides a workflow and UI for humans (referred to as labelers in HITL) to review, validate, and correct the data extracted from documents by Human in the Loop processors. It is used across Financial Services, Health, Manufacturing, Government, and other industries.
• Take-home message: two completely different kinds of intelligence (humans + computers) are being leveraged simultaneously.
https://cloud.google.com/document-ai/hitl
Active vs. reinforcement learning
• Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or
some other information source) to label new data points with the desired outputs.
• Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take
actions in an environment in order to maximize the notion of cumulative reward.
• These definitions still create some overlapping area between the two. Here is a better explanation of the two:
• Active learning is a technique that is applied to Supervised Learning settings. In the supervised learning
paradigm, you train a system by providing inputs and expected outputs (labels). The system learns to mimic
the training data, ideally generalizing to unseen but extrapolable cases. Active learning is normally applied in cases where obtaining labels is expensive, so we obtain new labels dynamically, defining an algorithmic strategy to maximize the usefulness of the new data points.
• Reinforcement learning is a different paradigm, where we don't have labels, and therefore cannot use
supervised learning. Instead of labels, we have a "reinforcement signal" that tells us "how good" the current
outputs of the system being trained are. Therefore, in reinforcement learning the system (ideally) learns a strategy to obtain rewards as high as possible.
https://datascience.stackexchange.com/questions/85358/what-is-the-difference-between-active-learning-and-reinforcement-learning
A Different Approach: Designing with a Human in the Loop
https://hai.stanford.edu/news/humans-loop-design-interactive-ai-systems
Where can you not (or where is it less useful to) implement the human-in-the-loop approach?
• Human-in-the-loop is not a concept you can implement in every machine learning project. The HITL approach is mainly used when not much data is available yet; at this stage, people can initially make much better judgments than machines are capable of.
• Human-in-the-loop deep learning is used when humans and machine learning processes interact to solve scenarios such as the algorithm not understanding the input.
The business case for the HITL approach: #1
• Businesses that don't invest in AI now will risk being disrupted by others that do.
• A properly built AI application can get us to a reasonable starting point – predictive
accuracy of 80% or more. If we use that 80% while working on the remaining 20%, we
can get closer to the 100% mark.
• This is where humans come in. By combining an AI deployment with a managed service
layer, an organization can manage the 20% inaccuracy and learn from exceptions.
• As a human processes these exceptions, the AI application learns and improves its
algorithms to increase accuracy. This is why keeping humans in the loop is essential
when fine-tuning AI applications.
• In fact, when it comes to selling, people will tend to catch you out on exception-based instances, so you’d learn from them. Keep in mind that the key here is: don’t just sell automation.
https://www.genpact.com/insight/article/why-ai-still-needs-humans-in-the-loop
The business case for the HITL approach: #2
https://www.genpact.com/insight/article/why-ai-still-needs-humans-in-the-loop
HITL clarifications
How it is done in practice (active learning)
• In active learning (a special case of semi-supervised learning), you first provide a small number of labelled examples. The model is trained on this "seed" dataset. Then, the model "asks questions" by selecting the unlabelled data points it is unsure about, so the human can "answer" the questions by providing labels for those points. The model updates again and the process is repeated until the performance is good enough. By having the human iteratively teach the model, it's possible to make a better model, in less time, with much less labelled data.
• So how can the model find the next data point that needs labelling? Some commonly used methods (sketched in code below) are:
• select the data points where the model's predictive distribution has the highest entropy
• pick data points where the model's favoured prediction is least confident
• train an ensemble of models and focus on the data points where they disagree
• custom methods that leverage Bayesian deep learning to get better uncertainty estimates
• Depending on the number of labelled data points, the gain from active learning (the red line in the original figure, not reproduced here) becomes smaller or bigger. Our goal as engineers is to find the point where the improvement is greatest.
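A minimal sketch of the first three acquisition scores from the list above, computed from a model's predicted class probabilities. The probability matrix and the two-member "committee" are toy assumptions; any classifier's predict_proba output has this shape:

```python
import numpy as np

probs = np.array([[0.50, 0.50],    # maximally uncertain point
                  [0.90, 0.10],    # confident point
                  [0.60, 0.40]])

# 1) Highest predictive entropy.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

# 2) Least confidence in the favoured prediction.
least_conf = 1.0 - probs.max(axis=1)

# 3) Ensemble disagreement: variance of predictions across committee members.
committee = np.stack([probs,
                      np.array([[0.45, 0.55], [0.85, 0.15], [0.70, 0.30]])])
disagreement = committee.var(axis=0).sum(axis=1)

# Query the point that scores highest under the chosen strategy.
print(np.argmax(entropy), np.argmax(least_conf), np.argmax(disagreement))
```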
If active learning is so great, then why doesn't everyone use it?
• Most of our tools are not designed with Active Learning in mind. There are often different teams of people
responsible for data labelling vs model training but active learning requires these processes to be coupled.
If you do get these teams to work together, you still need a lot of infrastructure to connect model training to
annotation interfaces.
• Most of the software libraries being used assume all of your data is labelled before you train a model, so to use active learning you have to write a tonne of boilerplate. You also need to figure out how best to host your model, have the model communicate with a team of annotators, and update itself as it gets data asynchronously from different annotators.
• On top of this, modern deep learning models are very slow to update, so frequently retraining them from
scratch is painful. Nobody wants to label a hundred examples then wait 24 hours for a model to be fully
re-trained before labelling the next 100. Deep learning models also tend to have millions or billions of
parameters and getting good uncertainty estimates from these models is an open research problem.
• If you read academic papers on active learning, you might think active learning will give you a small saving
in labels but for a huge amount of work. The papers are misleading though because they operate on
academic datasets that tend to be balanced/clean. They almost always label one example at a time and
they forget that not every data-point is equally easy to label. On more realistic problems with large class
imbalances, noisy data and variable labelling costs, the benefits can be much bigger than the literature
suggests. In some cases there can be a 10x reduction in labelling costs.
https://humanloop.com/blog/why-you-should-be-using-active-learning
How to use Active Learning today
How to select key samples?
Wu, Xingjiao, et al. "A Survey of Human-in-the-loop for Machine Learning." arXiv preprint arXiv:2108.00941 (2021).
A concrete example
A word from experience
• Far more academic papers focus on how to adapt algorithms to new domains without new training data than on how to annotate the right new training data efficiently.
• Especially when the nature of the data is changing over time (which is also
common), using a handful of new annotations can be far more effective than
trying to adapt an existing model to a new domain of data.
• As no single one is perfect, the sampling strategies are often used together to find a selection of unlabelled items that will maximize both uncertainty and diversity.
Monarch, Robert Munro. Human-in-the-Loop Machine Learning: Active learning and annotation for
human-centered AI. Simon and Schuster, 2021.
Active learning sampling strategies: uncertainty, diversity, and random sampling #2
• Uncertainty sampling: this active learning strategy is effective for selecting unlabelled items near the decision boundary.
• This means you should use it only when it is feasible; the annotation effort can get out of control very easily, especially for video annotation.
Software based on human in the loop
• http://www.wekinator.org/