Advances and Challenges in Meta-Learning: A Technical Review
Abstract—Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The article covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the article demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the article delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the article highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this article provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.

Index Terms—Deep neural networks, few-shot learning, meta-learning, representation learning, transfer learning.

I. INTRODUCTION

A. Context and Motivation

Deep representation learning has revolutionized the field of machine learning by enabling models to learn effective features from data. However, it often requires large amounts of data for solving a specific task, making it impractical in scenarios where data is scarce or costly to obtain. Most existing approaches rely on either supervised learning of a representation tailored to a single task, or unsupervised learning of a representation that captures general features that may not be well-suited to new tasks. Furthermore, learning from scratch for each task is often not feasible, especially in domains such as medicine, robotics, and rare language translation where data availability is limited.

To overcome these challenges, meta-learning has emerged as a promising approach. Meta-learning enables models to quickly adapt to new tasks, even with few examples, and generalize across them. While meta-learning shares similarities with transfer learning and multitask learning, it goes beyond these approaches by enabling a learning system to learn how to learn. This capability is particularly valuable in settings where data is scarce, costly to obtain, or where the environment is constantly changing. While humans can rapidly acquire new skills by leveraging prior experience and are therefore considered generalists, most deep learning models are still specialists and are limited to performing well on specific tasks. Meta-learning bridges this gap by enabling models to efficiently adapt to new tasks.

B. Contribution

This review article primarily discusses the use of meta-learning techniques in deep neural networks to learn reusable representations, with an emphasis on few-shot learning; it does not cover topics such as AutoML and Neural Architecture Search [1], which are out of scope. Similarly, even though meta-learning is often applied in the context of reinforcement learning [2], [3], it falls outside the scope of this article. Distinct from existing surveys on meta-learning, such as [4], [5], [6], [7], [8], this review article highlights several key differentiating factors:

• Inclusion of advanced meta-learning topics: In addition to covering fundamental aspects of meta-learning, this review article delves into advanced topics such as learning from multimodal task distributions, meta-learning without explicit task information, learning without data sharing among clients, adapting to distribution shifts, and continual learning from a stream of tasks. By including these advanced topics, our article provides a comprehensive understanding of the current state-of-the-art and highlights the challenges and opportunities in these areas.

• Detailed exploration of relationship with other topics:
We not only examine meta-learning techniques but also establish clear connections between meta-learning and related areas, including transfer learning, multitask learning, self-supervised learning, personalized federated learning, and continual learning. This exploration of the relationships and synergies between meta-learning and these important topics provides valuable insights into how meta-learning can be efficiently integrated into broader machine learning frameworks.

• Clear and concise exposition: Recognizing the complexity of meta-learning, this review article provides a clear and concise explanation of the concepts, techniques, and applications of meta-learning. It is written with the intention of being accessible to a wide range of readers, including both researchers and practitioners. Through intuitive explanations, illustrative examples, and references to seminal works, we facilitate readers' understanding of the foundation of meta-learning and its practical implications.

• Consolidation of key information: As a fast-growing field, meta-learning has information scattered across various sources. This review article consolidates the most important and relevant information about meta-learning, presenting a comprehensive overview in a single resource. By synthesizing the latest research developments, this survey becomes an indispensable guide to researchers and practitioners seeking a thorough understanding of meta-learning and its potential impact on various machine learning applications.

By highlighting these contributions, this article complements existing surveys and offers unique insights into the current state and future directions of meta-learning.

C. Organization

In this article, we provide the foundations of modern deep learning methods for learning across tasks. To do so, we first define the key concepts and introduce relevant notations used throughout the article in Section II. Then, we cover the basics of multitask learning and transfer learning and their relation to meta-learning in Section III. In Section IV, we present an overview of the current state of meta-learning methods and provide a unified view that allows us to categorize them into three types: black-box meta-learning methods, optimization-based meta-learning methods, and meta-learning methods that are based on distance metric learning [9]. In Section V, we delve into advanced meta-learning topics, explaining the relationship between meta-learning and other important machine learning topics, and addressing issues such as learning from multimodal task distributions, performing meta-learning without provided tasks, learning without sharing data across clients, learning to adapt to distribution shifts, and continual learning from a stream of tasks. Finally, the article explores the application of meta-learning to real-world problems and provides an overview of the landscape of promising frontiers and yet-to-be-conquered challenges that lie ahead. Section VI focuses on these challenges, shedding light on the most pressing questions and future research opportunities.

II. BASIC NOTATIONS AND DEFINITIONS

In this section, we introduce some simple notations which will be used throughout the article and provide a formal definition of the term "task" within the scope of this article.

We use θ (and sometimes also φ) to represent the set of parameters (weights) of a deep neural network model. D = {(x_j, y_j)}_{j=1}^n denotes a dataset, where inputs x_j are sampled from the distribution p(x) and outputs y_j are sampled from p(y|x). The function L(·, ·) denotes a loss function; for example, L(θ, D) represents the loss achieved by the model's parameters θ on the dataset D. The symbol T refers to a task, which is primarily defined by the data-generating distributions p(x) and p(y|x) that define the problem.

In a standard supervised learning scenario, the objective is to optimize the parameters θ by minimizing the loss L(θ, D), where the dataset D is derived from a single task T, and the loss function L depends on that task. Formally, in this setting, a task T_i is a triplet T_i = {p_i(x), p_i(y|x), L_i} that includes task-specific data-generating distributions p_i(x) and p_i(y|x), as well as a task-specific loss function L_i. The goal is to learn a model that performs well on data sampled from task T_i. In a more challenging setting, we consider learning from multiple tasks {T_i}_{i=1}^T, which involves (a dataset of) multiple datasets {D_i}_{i=1}^T. In this scenario, a set of training tasks is used to learn a model that performs well on test tasks. Depending on the specific setting, a test task can either be sampled from the training tasks or be completely new, never encountered during the training phase.

In general, tasks can differ in various ways depending on the application. For example, in image recognition, different tasks can involve recognizing handwritten digits or alphabets from different languages [2], [10], while in natural language processing, tasks can include sentiment analysis [11], [12], machine translation [13], and chatbot response generation [14], [15], [16]. Tasks in robotics can involve training robots to achieve different goals [17], while in automated feedback generation, tasks can include providing feedback to students on different exams [18]. It is worth noting that tasks can share structures, even if they appear unrelated. For example, the laws of physics underlying real data, the language rules underlying text data, and the intentions of people all share common structures that enable models to transfer knowledge across seemingly unrelated tasks.
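To make this notation concrete, the following minimal Python sketch (our own illustration, not taken from the article; the sinusoid family and all constants are arbitrary choices) instantiates a toy regression task T_i as a triplet of p_i(x), p_i(y|x), and L_i, and samples a dataset D_i from it.

import numpy as np

def make_task(amplitude, phase):
    """A toy task T_i = {p_i(x), p_i(y|x), L_i}: sinusoid regression."""
    sample_x = lambda n: np.random.uniform(-5.0, 5.0, size=(n, 1))           # p_i(x)
    sample_y = lambda x: amplitude * np.sin(x + phase)                        # p_i(y|x), noiseless for simplicity
    loss = lambda y_pred, y_true: float(np.mean((y_pred - y_true) ** 2))      # L_i (mean squared error)
    return sample_x, sample_y, loss

# Sample a dataset D_i = {(x_j, y_j)}_{j=1}^n from one task and evaluate a trivial predictor.
sample_x, sample_y, loss = make_task(amplitude=2.0, phase=0.5)
x = sample_x(n=20)
y = sample_y(x)
print(loss(np.zeros_like(y), y))

Different tasks in this toy family share structure (they are all sinusoids) while differing in their data-generating distributions, which is exactly the kind of shared structure that meta-learning exploits.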
III. FROM MULTITASK AND TRANSFER TO META-LEARNING

Meta-learning, multitask learning, and transfer learning encompass different approaches aimed at learning across multiple tasks. Multitask learning aims to improve performance on a set of tasks by learning them simultaneously. Transfer learning fine-tunes a pre-trained model on a new task with limited data. In contrast, meta-learning acquires useful knowledge from past tasks and leverages it to learn new tasks more efficiently. In this section, we transition from discussing "multitask learning" and "transfer learning" to introducing the topic of "meta-learning".
D_b from the target task T_b using gradient descent or any other optimizer for several optimization steps. An example of the fine-tuning process for one gradient descent step is expressed as follows:

    φ ← θ − α ∇_θ L(θ, D_b),

where φ denotes the parameters fine-tuned for task T_b, and α is the learning rate.

Models with pre-trained parameters θ are often available online, including models pre-trained on large datasets such as ImageNet for image classification [32] and language models like BERT [33], PaLM [34], LLaMA [35], and GPT-4 [36], trained on large text corpora. Models pre-trained on other large and diverse datasets or using unsupervised learning techniques, as discussed in Section V-C, can also be used as a starting point for fine-tuning.

However, as discussed in [37], it is crucial to avoid destroying initialized features when fine-tuning. Some design choices, such as using a smaller learning rate for earlier layers, freezing earlier layers and gradually unfreezing, or re-initializing the last layer, can help to prevent this issue. Recent studies such as [38] show that fine-tuning the first or middle layers can sometimes work better than fine-tuning the last layers, while others recommend a two-step process of training the last layer first and then fine-tuning the entire network [37]. More advanced approaches, such as STILTs [39], propose an intermediate step of further training the model on a labeled task with abundant data to mitigate the potential degradation of pre-trained features.
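As an illustration of these design choices, the following PyTorch sketch (our own hedged example; the backbone, the choice of frozen layers, the class count, and the learning rates are hypothetical) freezes the earliest layers, re-initializes the last layer for the target task T_b, and assigns a smaller learning rate to the remaining backbone than to the new head.

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")       # pre-trained parameters θ

# Freeze the earliest layers to preserve generic low-level features.
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = False

# Re-initialize the last layer for the new target task T_b (here, 10 classes).
model.fc = nn.Linear(model.fc.in_features, 10)

# Smaller learning rate for the remaining backbone, larger one for the new head.
optimizer = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters()
                if p.requires_grad and not n.startswith("fc")], "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-2},
], momentum=0.9)

# One fine-tuning step on a batch (x_b, y_b) from D_b would then implement φ ← θ − α ∇_θ L(θ, D_b):
#   loss = nn.functional.cross_entropy(model(x_b), y_b); loss.backward(); optimizer.step()

Gradual unfreezing, as mentioned above, simply amounts to flipping requires_grad back to True for deeper blocks after a few epochs.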
In [40], it was demonstrated that transfer learning via fine-tuning may not always be effective, particularly when the target task dataset is very small or very different from the source tasks. To investigate this, the authors fine-tuned a pre-trained universal language model on specific text corpora corresponding to new tasks using varying numbers of training examples. Their results showed that starting with a pre-trained model outperformed training from scratch on the new task. However, when the size of the new task dataset was very small, fine-tuning on such a limited number of examples led to poor generalization performance. To address this issue, meta-learning can be used to learn a model that can effectively adapt to new tasks with limited data by leveraging prior knowledge from other tasks. In fact, meta-learning is particularly useful for learning new tasks from very few examples, and we will discuss it in more detail in the remainder of this article.

C. Meta-Learning Problem

Meta-learning (or learning to learn) is a field that aims to surpass the limitations of traditional transfer learning by adopting a more sophisticated approach that explicitly optimizes for transferability. As discussed in Section III-B, traditional transfer learning involves pre-training a model on source tasks and fine-tuning it for a new task. In contrast, meta-learning trains a network to efficiently learn or adapt to new tasks with only a few examples. Fig. 1(c) illustrates this approach, where at meta-training time we learn to learn tasks, and at meta-test time we learn a new task efficiently.

During the meta-training phase, prior knowledge enabling efficient learning of new tasks is extracted from a set of training tasks {T_i}_{i=1}^T. This is achieved by using a meta-dataset consisting of multiple datasets {D_i}_{i=1}^T, each corresponding to a different training task. At meta-test time, a small training dataset D_new is observed from a completely new task T_new and used in conjunction with the prior knowledge to infer the most likely posterior parameters. As in transfer learning, accessing prior tasks at meta-test time is impractical. Although the datasets {D_i}_i come from different data distributions (since they come from different tasks {T_i}_i), it is assumed that the tasks themselves (both for training and testing) are drawn i.i.d. from an underlying task distribution p(T), implying some similarities in the task structure. This assumption ensures the effectiveness of meta-learning frameworks even when faced with limited labeled data. Moreover, the more tasks that are available for meta-training, the better the model can learn to adapt to new tasks, just as having more data improves performance in traditional machine learning.

In the next section, we provide a more formal definition of meta-learning and various approaches to it.

IV. META-LEARNING METHODS

To gain a unified understanding of the meta-learning problem, we can draw an analogy to the standard supervised learning setting. In the latter, the goal is to learn a set of parameters φ for a base model h_φ (e.g., a neural network parametrized by φ), which maps input data x ∈ X to the corresponding output y ∈ Y as follows:

    h_φ : X → Y,  x ↦ y = h_φ(x).    (1)

To accomplish this, a typically large training dataset D = {(x_j, y_j)}_{j=1}^n specific to a particular task T is used to learn φ.

In the meta-learning setting, the objective is to learn prior knowledge, which consists of a set of meta-parameters θ, for a procedure F_θ(D_i^tr, x^ts). This procedure uses θ to efficiently learn from (or adapt to) a small training dataset D_i^tr = {(x_k, y_k)}_{k=1}^K from a task T_i, and then make accurate predictions on unlabeled test data x^ts from the same task T_i. As we will see in the following sections, F_θ is typically composed of two functions: (1) a meta-learner f_θ(·) that produces task-specific parameters φ_i ∈ Φ from D_i^tr ∈ X^K, and (2) a base model h_{φ_i}(·) that predicts outputs corresponding to the data in x^ts:

    f_θ : X^K → Φ,  D_i^tr ↦ φ_i = f_θ(D_i^tr);
    h_{φ_i} : X → Y,  x ↦ y = h_{φ_i}(x).    (2)

Note that the process of obtaining task-specific parameters φ_i = f_θ(D_i^tr) is often referred to as "adaptation" in the literature, as it adapts to the task T_i using a small amount of data while leveraging the prior knowledge summarized in θ. The objective of meta-training is to learn the set of meta-parameters θ. This is accomplished by using a meta-dataset {D_i}_{i=1}^T, which consists of a dataset of datasets, where each dataset D_i = {(x_j, y_j)}_{j=1}^n is specific to a task T_i.
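The following PyTorch sketch (our simplified illustration of the notation above, not a specific method from the literature; a single gradient step is chosen arbitrarily as the meta-learner f_θ, in the spirit of the optimization-based methods discussed later) makes the two components of F_θ explicit: adaptation produces task-specific parameters φ_i from D_i^tr, and the base model h_{φ_i} then predicts on the test inputs x^ts.

import torch
import torch.nn as nn
from torch.func import functional_call

base_model = nn.Sequential(nn.Linear(1, 40), nn.ReLU(), nn.Linear(40, 1))
meta_params = dict(base_model.named_parameters())      # meta-parameters θ
alpha = 0.01                                            # inner (adaptation) learning rate

def adapt(theta, x_tr, y_tr):
    """Meta-learner f_θ: one gradient step on D_i^tr yields φ_i."""
    preds = functional_call(base_model, theta, (x_tr,))
    loss = nn.functional.mse_loss(preds, y_tr)
    grads = torch.autograd.grad(loss, list(theta.values()), create_graph=True)
    return {name: p - alpha * g for (name, p), g in zip(theta.items(), grads)}

def predict(phi_i, x_ts):
    """Base model h_{φ_i}: predictions on the unlabeled test inputs x^ts."""
    return functional_call(base_model, phi_i, (x_ts,))

# One episode of F_θ(D_i^tr, x^ts): adapt on a small D_i^tr (K = 5), then predict on x^ts.
x_tr, y_tr = torch.randn(5, 1), torch.randn(5, 1)
phi_i = adapt(meta_params, x_tr, y_tr)
y_ts = predict(phi_i, torch.randn(10, 1))

Meta-training would wrap this episode in an outer loop that updates θ so that the predictions on x^ts (whose labels are available during meta-training) become accurate across many tasks; create_graph=True keeps the adaptation step differentiable for exactly that purpose.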
performance of most meta-learning approaches tends to deteriorate as the dissimilarity among tasks increases [84], [85], [86], [87], indicating that a globally shared set of meta-parameters θ may not adequately capture the heterogeneity among tasks and enable fast adaptation.

To address this challenge, MMAML [88] builds upon the standard MAML approach by estimating the mode of tasks sampled from a multimodal task distribution p(T) and adjusting the initial model parameters accordingly. Another approach proposed in [89] involves learning a meta-regularization conditioned on additional task-specific information. However, obtaining such additional task information may not always be feasible. Alternatively, some methods propose learning multiple model initializations θ_1, θ_2, ..., θ_M and selecting the most suitable one for each task, leveraging clustering techniques applied in either the task-space or parameter-space [90], [91], [92], [93], or relying on the output of an additional network, as in MUSE [94]. CAVIA [65] partitions the initial model parameters into shared parameters across all tasks and task-specific context parameters, while LGM-Net [95] directly generates classifier weights based on an encoded task representation.

A series of related works (but outside of the meta-learning field) aim to build a "universal representation" that encompasses a robust set of features capable of achieving strong performance across multiple datasets (or modes) [44], [96], [97], [98], [99], [100]. This representation is subsequently adapted to individual tasks in various ways. However, these approaches are currently limited to classification problems and do not leverage meta-learning techniques to efficiently adapt to new tasks.

A more recent line of research focuses on cross-domain meta-learning, where knowledge needs to be transferred from tasks sampled from a potentially multimodal distribution p(T) to target tasks sampled from a different distribution. One notable study, BOIL [63], reveals that the success of meta-learning methods, such as MAML, can be attributed to large changes in the representation during task learning. The authors emphasize the importance of updating only the body (feature extractor) of the model and freezing the head (classifier) during the adaptation phase for effective cross-domain adaptation. Building on this insight, DAML [101] introduces tasks from both seen and pseudo-unseen domains during meta-training to obtain domain-agnostic initial parameters capable of adapting to novel classes in unseen domains. In [102], the authors propose a transferable meta-learning algorithm with a meta task adaptation to minimize the domain divergence and thus facilitate knowledge transfer across domains. To further improve the transferability of cross-domain knowledge, [103] and [104] propose to incorporate semi-supervised techniques into the meta-learning framework. Specifically, [103] combines the representation power of large pre-trained language models (e.g., BERT [33]) with the generalization capability of prototypical networks enhanced by SMLMT [105] to achieve effective generalization and adaptation to tasks from new domains. In contrast, [104] promotes the idea of task-level self-supervision by leveraging multiple views or augmentations of tasks.

B. Meta-Learning & Personalized Federated Learning

Federated learning (FL) is a distributed learning paradigm where multiple clients collaborate to train a shared model while preserving data privacy by keeping their data locally stored. FedAvg [106] is a pioneering method that combines local stochastic gradient descent on each client with model averaging on a central server. This approach performs well when local data across clients is independent and identically distributed (IID). However, in scenarios with heterogeneous (non-IID) data distributions, regularization techniques [107], [108], [109] have been proposed to improve local learning.

Personalized federated learning (PFL) is an alternative approach that aims to develop customized models for individual clients while leveraging the collaborative nature of FL. Popular PFL methods include L2GD [110], which combines local and global models, as well as multi-task learning methods like pFedMe [111], Ditto [112], and FedPAC [113]. Clustered or group-based FL approaches [114], [115], [116], [117] learn multiple group-based global models. In contrast, meta-learning-based methods interpret PFL as a meta-learning algorithm, where personalization to a client aligns with adaptation to a task [118]. Notably, various combinations of MAML-type methods with FL architectures have been explored in [118], [119], [120] to find an initial shared point that performs well after personalization to each client's local dataset. Additionally, the authors of [121] proposed ARUBA, a meta-learning algorithm inspired by online convex optimization, which enhances the performance of FedAvg.

To summarize, there is a growing focus on addressing FL challenges in non-IID data settings. The integration of meta-learning has shown promising outcomes, leading to enhanced personalization and performance in PFL methods.

C. Unsupervised Meta-Learning With Tasks Construction

In meta-training, constructing tasks typically relies on labeled data. However, real-world scenarios often involve mostly, or only, unlabeled data, requiring techniques that leverage unlabeled data to learn feature representations that can transfer to downstream tasks with limited labeled data. One alternative to address this is through "self-supervised learning" (also known as "unsupervised pre-training") [122], [123], [124]. This involves training a model on a large unlabeled dataset, as depicted in Fig. 7, to capture informative features. Contrastive learning [122], [125] is commonly used in this context, aiming to learn features by bringing similar examples closer together while pushing differing examples apart. The learned features can then be fine-tuned on a target task T_new with limited labeled data D_new^tr, leading to improved performance compared to training from scratch. Another promising alternative is "unsupervised meta-learning," which aims to automatically construct diverse and structured training tasks from unlabeled data. These tasks can then be used with any meta-learning algorithm, such as MAML [2] and ProtoNet [76]. In this section, we explore methods for meta-training without predefined tasks and investigate strategies for automatically constructing tasks for meta-learning.
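As one concrete (and deliberately simplified) illustration of automatic task construction, the sketch below follows the clustering-based idea used in approaches such as [126]: unlabeled examples are embedded, clustered into pseudo-classes, and N-way K-shot episodes are sampled from the pseudo-labels. The embedding source, cluster count, and episode sizes are arbitrary placeholders.

import numpy as np
from sklearn.cluster import KMeans

def build_tasks(embeddings, n_way=5, k_shot=1, n_query=5, n_clusters=50, n_tasks=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pseudo-labels obtained by clustering the unlabeled embeddings.
    pseudo_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embeddings)
    tasks = []
    for _ in range(n_tasks):
        classes = rng.choice(n_clusters, size=n_way, replace=False)        # pseudo-classes of this episode
        support, query = [], []
        for episode_label, c in enumerate(classes):
            idx = rng.permutation(np.where(pseudo_labels == c)[0])[: k_shot + n_query]
            support += [(int(i), episode_label) for i in idx[:k_shot]]     # D_i^tr of the constructed task
            query += [(int(i), episode_label) for i in idx[k_shot:]]       # held-out examples of the task
        tasks.append((support, query))
    return tasks

# Example usage with embeddings from any unsupervised encoder (random stand-in here).
tasks = build_tasks(np.random.randn(5000, 64))

The resulting episodes can be fed to MAML [2], ProtoNet [76], or any other meta-learning algorithm exactly as if they had been built from human-annotated labels.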
as in [145], [146] or via translation between domains using CycleGAN [147] as in [148], [149], [150]. Other approaches focus on aligning the feature distribution of multiple source domains with the target domain [151], or they address the multi-target domain adaptation scenario [152], [153], [154] with models capable of adapting to multiple target domains. However, these methods face limitations when dealing with insufficient labeled data in the source domain or when quick adaptation to new target domains is required. Additionally, they assume the input-output relationship (i.e., p(y|x)) to be the same across domains. To solve these problems, some methods [153], [155], [156], [157] combine meta-learning with domain adaptation. In particular, ARM [155] leverages contextual information extracted from batches of unlabeled data to learn a model capable of adapting to distribution shifts.
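A minimal sketch in the spirit of this idea (our simplified illustration, not the authors' exact ARM method; the architecture sizes are placeholders) is shown below: a context network summarizes a batch of unlabeled inputs from the current distribution, the predictor conditions on that summary, and both are trained end-to-end over episodes so that the summary carries the information needed to adapt to the shift.

import torch
import torch.nn as nn

class ContextualModel(nn.Module):
    def __init__(self, d_in=16, d_ctx=8, n_classes=5):
        super().__init__()
        self.context_net = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, d_ctx))
        self.predictor = nn.Sequential(nn.Linear(d_in + d_ctx, 32), nn.ReLU(), nn.Linear(32, n_classes))

    def forward(self, x):                           # x: a batch drawn from a single domain
        ctx = self.context_net(x).mean(dim=0)       # summary of the unlabeled batch (no labels used)
        ctx = ctx.expand(x.size(0), -1)             # share the same summary across the batch
        return self.predictor(torch.cat([x, ctx], dim=1))

# One meta-training episode: a labeled batch from one training domain.
model = ContextualModel()
x, y = torch.randn(20, 16), torch.randint(0, 5, (20,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # at test time, only the unlabeled batch statistics are needed for adaptation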
Effective domain generalization via meta-learning: Domain generalization enables models to perform well on new and unseen domains without requiring access to their data, as illustrated in Fig. 8. This is particularly useful in scenarios where access to data is restricted due to real-time deployment requirements or privacy policies. For instance, an object detection model for self-driving cars trained on three types of roads may need to be deployed to a new road without any data from that domain. In contrast to domain adaptation, which requires access to (unlabeled) data from a specific target domain during training to specialize the model, domain generalization belongs to the inductive setting. Most domain generalization methods aim to train neural networks to learn domain-invariant representations that are consistent across domains. For instance, domain adversarial training [158] trains the network to make predictions based on features that cannot be distinguished between domains. Another approach is to directly align the representations between domains using similarity metrics, such as in [159]. Data augmentation techniques are also used to enhance the diversity of the training data and improve generalization across domains [160], [161], [162]. Another way to improve generalization to various domains is to use meta-learning and apply the episodic training paradigm typical of MAML [58], as in [163], [164], [165], [166], [167], [168], [169]. For instance, MLDG [166] optimizes a model by simulating the train-test domain shift during the meta-training phase. MetaReg [167] proposes to meta-learn a regularization function that improves domain generalization. DADG [169] contains a discriminative adversarial learning component to learn a set of general features

E. Meta-Learning & Continual Learning

This section explores the application of meta-learning to continual learning, where learners continually accumulate experience over time to more rapidly acquire new knowledge or skills. Continual learning scenarios can be divided into task-incremental learning, domain-incremental learning, and class-incremental learning, depending on whether task identity is provided at test time or must be inferred by the algorithm [170]. In this section, we focus on approaches that specifically address task/class-incremental learning.

Traditionally, meta-learning has primarily focused on scenarios where a batch of training tasks is available. However, real-world situations often involve tasks presented sequentially, allowing for progressive leveraging of past experience. This is illustrated in Fig. 9, and examples include tasks that progressively increase in difficulty or build upon previous knowledge, or robots learning diverse skills in changing environments.

Standard online learning involves observing tasks in a sequential manner, without any task-specific adaptation or use of past experience to accelerate adaptation. To tackle this issue, researchers have proposed various approaches, including memory-based methods [171], [172], [173], regularization-based methods [174], [175], [176], and dynamic architectural methods [177], [178], [179]. However, each of these methods has its own limitations, such as scalability, memory inefficiency, time complexity, or the need for task-specific parameters. Meta-learning has emerged as a promising approach for addressing continual learning. In [180], the authors introduced ANML, a framework that meta-learns an activation-gating function that enables context-dependent selective activation within a deep neural network. This selective activation allows the model to focus on relevant knowledge and avoid catastrophic forgetting. Other approaches such as MER [181], OML [182], and LA-MAML [183] use gradient-based meta-learning algorithms to optimize various objectives such as gradient alignment, inner representations, or task-specific learning rates, and learn update rules that avoid negative transfer. These algorithms enable faster learning over time and enhanced proficiency in each new task.
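To give a flavor of how meta-learning can shape a model that keeps learning from a stream, the sketch below uses a Reptile-style [68] interpolation as the meta-update (our own highly simplified illustration; it is not the MER, OML, or LA-MAML procedure, and the data stream and hyperparameters are placeholders): each incoming task is learned starting from the current meta-parameters, which are then nudged toward the adapted solution so that later tasks start from an initialization shaped by the whole stream.

import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))   # meta-parameters θ
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

def task_stream(n_tasks=100):
    """Stand-in for a sequence of tasks; yields one small labeled dataset per task."""
    for _ in range(n_tasks):
        yield torch.randn(32, 10), torch.randint(0, 2, (32,))

for x, y in task_stream():
    adapted = copy.deepcopy(model)                      # adaptation starts from the current θ
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                        # task-specific adaptation
        opt.zero_grad()
        nn.functional.cross_entropy(adapted(x), y).backward()
        opt.step()
    with torch.no_grad():                               # meta-update: move θ toward the adapted weights
        for p, q in zip(model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (q - p))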
VI. OPEN CHALLENGES & OPPORTUNITIES

Meta-learning has been a promising area of research that has shown impressive results in various machine learning domains. However, there are still open challenges that need to be addressed
in order to further advance the field. In this section, we discuss some of these challenges and categorize them into three groups. Addressing these challenges can lead to significant advances in meta-learning, which could potentially lead to more generalizable and robust machine learning models.

A. Addressing Fundamental Problem Assumptions

The first category of challenges pertains to the fundamental assumptions made in meta-learning problems.

One such challenge is related to generalization to out-of-distribution tasks and long-tailed task distributions. Indeed, adaptation becomes difficult when the few-shot tasks observed at meta-test time are from a different task distribution than the ones seen during meta-training. While there have been some attempts to address this challenge, such as in [102], [184], it still remains unclear how to address it. Ideas from the domain generalization and robustness literature could provide some hints and potentially be combined with meta-learning to tackle these long-tailed task distributions and out-of-distribution tasks. For example, possible directions are to define subtle regularization techniques to prevent the meta-parameters from being very specific to the distribution of the training tasks, or to use subtle task augmentation techniques to generate synthetic tasks that cover a wider range of task variations.

Another challenge in this category involves dealing with the multimodality of data. While the focus has been on meta-training over tasks from a single modality, the reality is that we may have multiple modalities of data to work with. Human beings have the advantage of being able to draw upon multiple modalities, such as visual imagery, tactile feedback, language, and social cues, to create a rich repository of knowledge and make more informed decisions. For instance, we often use language cues to aid our visual decision-making processes. Rather than developing a prior that only works for a single modality, exploring the concept of learning priors across multiple modalities of data is a fascinating area to pursue. Different modalities have different dimensionalities or units, but they can provide complementary forms of information. While some initial works in this direction have been reported, including [185], [186], [187], there is still a long way to go in terms of capturing all of this rich prior information when learning new tasks.

B. Providing Benchmarks and Real-World Problems

The second category of challenges is related to providing/improving benchmarks to better reflect real-world problems and challenges.

Meta-learning has shown promise in a diverse set of applications, including few-shot land cover classification [188], few-shot dermatological disease diagnosis [184], automatically providing feedback on student code [18], one-shot imitation learning [189], drug discovery [190], motion prediction [191], and language generation [16], to mention but a few. However, the lack of benchmark datasets that accurately reflect real-world problems with appropriate levels of difficulty and ease of use is a significant challenge for the field. Several efforts have been made towards creating useful benchmark datasets, including Meta-Dataset [82], Meta-Album Dataset [192], NEVIS'22 [193], Meta-World Benchmark [194], Visual Task Adaptation Benchmark [195], Taskonomy Dataset [196], VALUE Benchmark [197], and BIG-Bench [198]. However, further work is needed to ensure that the datasets are comprehensive and representative of the diversity of real-world problems that meta-learning aims to address.

Some ways in which existing benchmarks can be improved to better reflect real-world problems and challenges in meta-learning are: 1) to increase the diversity and complexity of tasks that are included; 2) to consider more realistic task distributions that can change over time; and 3) to include real-world data that is representative of the challenges faced in real-world applications of meta-learning. For example, including medical data, financial data, time-series data, or other challenging types of data (besides images and text) can help improve the realism and relevance of benchmarks.

Furthermore, developing benchmarks that reflect these more realistic scenarios can help improve the generalization and robustness of algorithms. This ensures that algorithms are tested on a range of scenarios and that they are robust and generalizable across a wide range of tasks. Better benchmarks are essential for progress in machine learning and AI, as they challenge current algorithms to find common structures, reflect real-world problems, and have a significant impact in the real world.

C. Improving Core Algorithms

The last category of challenges in meta-learning is centered around improving the core algorithms.

A major obstacle is the large-scale bi-level optimization problem encountered in popular meta-learning methods such as MAML. The computational and memory costs of such approaches can be significant, and there is a need to make them more practical, particularly for very large-scale problems, like learning effective optimizers [199].

In addition, a deeper theoretical understanding of various meta-learning methods and their performance is critical to driving progress and pushing the boundaries of the field. Such insights can inform and inspire further advancements in the field and lead to more effective and efficient algorithms. To achieve these goals, several fundamental questions can be explored, including:

1) Can we develop theoretical guarantees on the sample complexity and generalization performance of meta-learning algorithms? Understanding these aspects can help us design more efficient and effective meta-learning algorithms that require less data or fewer tasks. While recent investigations [200], [201], [202] have made notable strides in this domain, they represent just the initial steps toward a more extensive theoretical comprehension. Further research is imperative to completely harness the potential of meta-learning.

2) Can we gain a better understanding of the optimization landscape of meta-learning algorithms? For instance, can we identify the properties of the objective function that
make it easier or harder to optimize? Can we design optimization algorithms that are better suited to the bi-level optimization problem inherent in various meta-learning approaches?

3) Can we design meta-learning algorithms that can better incorporate task-specific or domain-specific expert knowledge, in a principled way, to learn more effective meta-parameters?

Addressing such questions could enhance the design and performance of meta-learning algorithms, and help us tackle increasingly complex and challenging learning problems.

VII. CONCLUSION

In conclusion, the field of artificial intelligence (AI) has witnessed significant advancements in developing specialized systems for specific tasks. However, the pursuit of generality and adaptability in AI across multiple tasks remains a fundamental challenge.

Meta-learning emerges as a promising research area that seeks to bridge this gap by enabling algorithms to learn how to learn. Meta-learning algorithms offer the ability to learn from limited data, transfer knowledge across tasks and domains, and rapidly adapt to new environments. This review article has explored various meta-learning approaches that have demonstrated promising results in applications with scarce data. Nonetheless, numerous challenges and unanswered questions persist, calling for further investigation.

A key area of focus lies in unifying various fields such as meta-learning, self-supervised learning, domain generalization, and continual learning. Integrating and collaborating across these domains can generate synergistic advancements and foster a more comprehensive approach to developing AI systems. By leveraging insights and techniques from these different areas, we can construct more versatile and adaptive algorithms capable of learning from multiple tasks, generalizing across domains, and continuously accumulating knowledge.

This review article serves as a starting point for encouraging research in this direction. By examining the current state of meta-learning and illuminating the challenges and opportunities, we aim to inspire researchers to explore interdisciplinary connections and contribute to the progress of meta-learning while integrating it with other AI research fields. Through collective efforts and collaboration, we can surmount existing challenges and unlock the full potential of meta-learning to address a broad spectrum of complex problems.

REFERENCES

[1] S. K. Karmaker, M. M. Hassan, M. J. Smith, L. Xu, C. Zhai, and K. Veeramachaneni, "AutoML to date and beyond: Challenges and opportunities," ACM Comput. Surv., vol. 54, no. 8, pp. 1–36, 2021.
[2] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
[3] J. Beck et al., "A survey of meta-reinforcement learning," 2023, arXiv:2301.08028.
[4] J. Vanschoren, Meta-Learning: A Survey. Berlin, Germany: Springer, 2019, pp. 35–61.
[5] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, "Meta-learning in neural networks: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5149–5169, Sep. 2022.
[6] R. Vilalta and Y. Drissi, "A perspective view and survey of meta-learning," Artif. Intell. Rev., vol. 18, pp. 77–95, 2002.
[7] M. Huisman, J. N. Van Rijn, and A. Plaat, "A survey of deep meta-learning," Artif. Intell. Rev., vol. 54, no. 6, pp. 4483–4541, 2021.
[8] L. Zou, "Chapter 1 - Meta-learning basics and background," in Meta-Learning, L. Zou, Ed. Cambridge, MA, USA: Academic Press, 2023, pp. 1–22.
[9] O. Vinyals, "Talk: Model vs optimization meta learning," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017. [Online]. Available: https://evolution.ml/pdf/vinyals.pdf
[10] J. Snell, K. Swersky, and R. Zemel, "Prototypical networks for few-shot learning," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4080–4090.
[11] R. Geng, B. Li, Y. Li, X. Zhu, P. Jian, and J. Sun, "Induction networks for few-shot text classification," in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process., 2019, pp. 3904–3913.
[12] B. Liang et al., "Few-shot aspect category sentiment analysis via meta-learning," ACM Trans. Inf. Syst., vol. 41, no. 1, pp. 1–31, 2023.
[13] J. Gu, Y. Wang, Y. Chen, K. Cho, and V. O. Li, "Meta-learning for low-resource neural machine translation," in Proc. Conf. Empirical Methods Natural Lang. Process., 2020, pp. 3622–3631.
[14] A. Madotto, Z. Lin, C.-S. Wu, and P. Fung, "Personalizing dialogue agents via meta-learning," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 5454–5459.
[15] K. Qian and Z. Yu, "Domain adaptive dialog generation via meta learning," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 2639–2649.
[16] F. Mi, M. Huang, J. Zhang, and B. Faltings, "Meta-learning for low-resource natural language generation in task-oriented dialogue systems," in Proc. 28th Int. Joint Conf. Artif. Intell., 2019, pp. 3151–3157.
[17] C. Finn, T. Yu, T. Zhang, P. Abbeel, and S. Levine, "One-shot visual imitation learning via meta-learning," in Proc. Conf. Robot Learn., 2017, pp. 357–368.
[18] M. Wu, N. Goodman, C. Piech, and C. Finn, "ProtoTransformer: A meta-learning approach to providing student feedback," 2021, arXiv:2107.14035.
[19] O. Sener and V. Koltun, "Multi-task learning as multi-objective optimization," in Proc. Adv. Neural Inf. Process. Syst., 2018.
[20] Z. Chen, V. Badrinarayanan, C.-Y. Lee, and A. Rabinovich, "GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks," in Proc. Int. Conf. Mach. Learn., 2018, pp. 794–803.
[21] A. Kendall, Y. Gal, and R. Cipolla, "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7482–7491.
[22] I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, "Cross-stitch networks for multi-task learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 3994–4003.
[23] S. Ruder, J. Bingel, I. Augenstein, and A. Søgaard, "Latent multi-task architecture learning," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 4822–4829.
[24] Y. Gao, J. Ma, M. Zhao, W. Liu, and A. L. Yuille, "NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 3205–3214.
[25] V. Dumoulin et al., "Feature-wise transformations," Distill, 2018. [Online]. Available: https://distill.pub/2018/feature-wise-transformations
[26] S. Liu, E. Johns, and A. J. Davison, "End-to-end multi-task learning with attention," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 1871–1880.
[27] M. Long, Z. Cao, J. Wang, and P. S. Yu, "Learning multiple tasks with multilinear relationship networks," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1593–1602.
[28] A. Jaegle et al., "Perceiver IO: A general architecture for structured inputs & outputs," in Proc. Int. Conf. Learn. Representations, 2022, pp. 1–16.
[29] C. Fifty, E. Amid, Z. Zhao, T. Yu, R. Anil, and C. Finn, "Efficiently identifying task groupings for multi-task learning," in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 27503–27516.
[30] Y. Zhang and Q. Yang, "A survey on multi-task learning," IEEE Trans. Knowl. Data Eng., vol. 34, no. 12, pp. 5586–5609, Dec. 2022.
[31] M. Crawshaw, "Multi-task learning with deep neural networks: A survey," 2020, arXiv:2009.09796.
[32] M. Huh, P. Agrawal, and A. A. Efros, "What makes ImageNet good for transfer learning?," 2016, arXiv:1608.08614.
[33] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics - Hum. Lang. Technol., 2019, pp. 4171–4186.
[34] A. Chowdhery et al., "PaLM: Scaling language modeling with pathways," 2022, arXiv:2204.02311.
[35] H. Touvron et al., "LLaMA: Open and efficient foundation language models," 2023, arXiv:2302.13971.
[36] OpenAI, "GPT-4 technical report," 2023, arXiv:2303.08774.
[37] A. Kumar, A. Raghunathan, R. M. Jones, T. Ma, and P. Liang, "Fine-tuning can distort pretrained features and underperform out-of-distribution," in Proc. Int. Conf. Learn. Representations, 2022, pp. 1–15.
[38] Y. Lee et al., "Surgical fine-tuning improves adaptation to distribution shifts," in Proc. Workshop Distrib. Shifts: Connecting Methods Appl., 2023, pp. 1–14.
[39] J. Phang, T. Févry, and S. R. Bowman, "Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks," 2018, arXiv:1811.01088.
[40] J. Howard and S. Ruder, "Universal language model fine-tuning for text classification," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 328–339.
[41] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, "Meta-learning with memory-augmented neural networks," in Proc. Int. Conf. Mach. Learn., 2016, pp. 1842–1850.
[42] N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel, "A simple neural attentive meta-learner," in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–12.
[43] T. Munkhdalai and H. Yu, "Meta networks," in Proc. Int. Conf. Mach. Learn., 2017, pp. 2554–2563.
[44] M. Garnelo et al., "Conditional neural processes," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1704–1713.
[45] T. Brown et al., "Language models are few-shot learners," in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 1877–1901.
[46] S. Garg, D. Tsipras, P. S. Liang, and G. Valiant, "What can transformers learn in-context? A case study of simple function classes," in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 30583–30598.
[47] L. Kirsch, J. Harrison, J. Sohl-Dickstein, and L. Metz, "General-purpose in-context learning by meta-learning transformers," in Proc. Workshop Distrib. Shifts: Connecting Methods Appl., 2022, pp. 1–14.
[48] E. Akyürek, D. Schuurmans, J. Andreas, T. Ma, and D. Zhou, "What learning algorithm is in-context learning? Investigations with linear models," in Proc. 11th Int. Conf. Learn. Representations, 2022, pp. 1–12.
[49] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6000–6010.
[50] S. Ravi and H. Larochelle, "Optimization as a model for few-shot learning," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1–11.
[51] M. Andrychowicz et al., "Learning to learn by gradient descent by gradient descent," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 3988–3996.
[52] K. Li and J. Malik, "Learning to optimize," in Proc. Int. Conf. Learn. Representations, 2017.
[53] O. Wichrowska et al., "Learned optimizers that scale and generalize," in Proc. Int. Conf. Mach. Learn., 2017, pp. 3751–3760.
[54] A. Shaw, W. Wei, W. Liu, L. Song, and B. Dai, "Meta architecture search," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 11227–11237.
[55] D. Lian et al., "Towards fast adaptation of neural architectures with meta learning," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–13.
[56] L. Franceschi, P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil, "Bilevel programming for hyperparameter optimization and meta-learning," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1568–1577.
[57] T. Kim, J. Yoon, O. Dia, S. Kim, Y. Bengio, and S. Ahn, "Bayesian model-agnostic meta-learning," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 1–11.
[58] C. Finn and S. Levine, "Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm," in Proc. Int. Conf. Learn. Representations, 2018.
[59] Z. Li, F. Zhou, F. Chen, and H. Li, "Meta-SGD: Learning to learn quickly for few-shot learning," 2017, arXiv:1707.09835.
[60] H. S. Behl, A. G. Baydin, and P. H. Torr, "Alpha MAML: Adaptive model-agnostic meta-learning," in Proc. 6th ICML Workshop Automated Mach. Learn., 2019, pp. 1–10.
[61] F. Zhou, B. Wu, and Z. Li, "Deep meta-learning: Learning to learn in the concept space," 2018, arXiv:1802.03596.
[62] A. Raghu, M. Raghu, S. Bengio, and O. Vinyals, "Rapid learning or feature reuse? Towards understanding the effectiveness of MAML," in Proc. Int. Conf. Learn. Representations, 2023, pp. 1–12.
[63] J. Oh, H. Yoo, C. Kim, and S.-Y. Yun, "BOIL: Towards representation change for few-shot learning," in Proc. Int. Conf. Learn. Representations, 2021, pp. 1–12.
[64] A. Antoniou, H. Edwards, and A. Storkey, "How to train your MAML," in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–10.
[65] L. Zintgraf, K. Shiarli, V. Kurin, K. Hofmann, and S. Whiteson, "Fast context adaptation via meta-learning," in Proc. Int. Conf. Mach. Learn., 2019, pp. 7693–7702.
[66] M. Hiller, M. Harandi, and T. Drummond, "On enforcing better conditioned meta-learning for rapid few-shot adaptation," in Proc. Adv. Neural Inf. Process. Syst., 2022, pp. 4059–4071.
[67] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[68] A. Nichol, J. Achiam, and J. Schulman, "On first-order meta-learning algorithms," 2018, arXiv:1803.02999.
[69] L. Bertinetto, J. F. Henriques, P. H. Torr, and A. Vedaldi, "Meta-learning with differentiable closed-form solvers," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–13.
[70] K. Lee, S. Maji, A. Ravichandran, and S. Soatto, "Meta-learning with differentiable convex optimization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10657–10665.
[71] A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, "Meta-learning with implicit gradients," in Proc. Adv. Neural Inf. Process. Syst., 2019.
[72] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967.
[73] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 586–595.
[74] G. Koch et al., "Siamese neural networks for one-shot image recognition," in Proc. ICML Deep Learn. Workshop, 2015, pp. 1–8.
[75] O. Vinyals et al., "Matching networks for one shot learning," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 3637–3645.
[76] S. Laenen and L. Bertinetto, "On episodes, prototypical networks, and few-shot learning," in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 24581–24592.
[77] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, "Learning to compare: Relation network for few-shot learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1199–1208.
[78] V. Garcia and J. Bruna, "Few-shot learning with graph neural networks," in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–12.
[79] K. Allen, E. Shelhamer, H. Shin, and J. Tenenbaum, "Infinite mixture prototypes for few-shot learning," in Proc. Int. Conf. Mach. Learn., 2019, pp. 232–241.
[80] X. Jiang, M. Havaei, F. Varno, G. Chartrand, N. Chapados, and S. Matwin, "Learning to learn with conditional class dependencies," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–11.
[81] A. A. Rusu et al., "Meta-learning with latent embedding optimization," in Proc. Int. Conf. Learn. Representations, 2019.
[82] E. Triantafillou et al., "Meta-Dataset: A dataset of datasets for learning to learn from few examples," in Proc. Int. Conf. Learn. Representations, 2019.
[83] D. Wang, Y. Cheng, M. Yu, X. Guo, and T. Zhang, "A hybrid approach with optimization-based and metric-based meta-learner for few-shot learning," Neurocomputing, vol. 349, pp. 202–211, 2019.
[84] Y. Tian, Y. Wang, D. Krishnan, J. B. Tenenbaum, and P. Isola, "Rethinking few-shot image classification: A good embedding is all you need?," in Proc. 16th Eur. Conf. Comput. Vis., Glasgow, U.K., 2020, pp. 266–282.
[85] Y. Chen, Z. Liu, H. Xu, T. Darrell, and X. Wang, "Meta-baseline: Exploring simple meta-learning for few-shot learning," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9062–9071.
[86] Y. Guo et al., "A broader study of cross-domain few-shot learning," in Proc. 16th Eur. Conf. Comput. Vis., Glasgow, U.K., 2020, pp. 124–141.
[87] W.-Y. Chen, Y.-C. Liu, Z. Kira, Y.-C. F. Wang, and J.-B. Huang, "A closer look at few-shot classification," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–11.
[88] R. Vuorio, S.-H. Sun, H. Hu, and J. J. Lim, "Multimodal model-agnostic meta-learning via task-aware modulation," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 1–12.
[89] G. Denevi, M. Pontil, and C. Ciliberto, "The advantage of conditional meta-learning for biased regularization and fine tuning," in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 964–974.
[90] H. Yao, Y. Wei, J. Huang, and Z. Li, "Hierarchically structured meta-learning," in Proc. Int. Conf. Mach. Learn., 2019, pp. 7045–7054.
[91] W. Jiang, J. Kwok, and Y. Zhang, "Subspace learning for effective meta-learning," in Proc. Int. Conf. Mach. Learn., 2022, pp. 10177–10194.
[92] G. Jerfel, E. Grant, T. Griffiths, and K. A. Heller, "Reconciling meta-learning and continual learning with online mixtures of tasks," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 9122–9133.
[93] P. Zhou, Y. Zou, X.-T. Yuan, J. Feng, C. Xiong, and S. Hoi, "Task similarity aware meta learning: Theory-inspired improvement on MAML," in Proc. 37th Conf. Uncertainty Artif. Intell., 2021, pp. 23–33.
[94] A. Vettoruzzo, M.-R. Bouguelia, and T. Rögnvaldsson, "Meta-learning from multimodal task distributions using multiple sets of meta-parameters," in Proc. Int. Joint Conf. Neural Netw., 2023, pp. 1–8.
[95] H. Li, W. Dong, X. Mei, C. Ma, F. Huang, and B.-G. Hu, "LGM-Net: Learning to generate matching networks for few-shot learning," in Proc. Int. Conf. Mach. Learn., 2019, pp. 3825–3834.
[96] L. Liu, W. L. Hamilton, G. Long, J. Jiang, and H. Larochelle, "A universal representation transformer layer for few-shot image classification," in Proc. Int. Conf. Learn. Representations, 2020, pp. 1–11.
[97] W.-H. Li, X. Liu, and H. Bilen, "Universal representation learning from multiple domains for few-shot classification," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9526–9535.
[98] J. Requeima, J. Gordon, J. Bronskill, S. Nowozin, and R. E. Turner, "Fast and flexible multi-task classification using conditional neural adaptive processes," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 7959–7970.
[99] N. Dvornik, C. Schmid, and J. Mairal, "Selecting relevant features from a multi-domain representation for few-shot classification," in Proc. 16th Eur. Conf. Comput. Vis., Glasgow, U.K., 2020, pp. 769–786.
[100] E. Triantafillou, H. Larochelle, R. Zemel, and V. Dumoulin, "Learning a universal template for few-shot dataset generalization," in Proc. Int. Conf. Mach. Learn., 2021, pp. 10424–10433.
[101] W.-Y. Lee, J.-Y. Wang, and Y.-C. F. Wang, "Domain-agnostic meta-learning for cross-domain few-shot classification," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2022, pp. 1715–1719.
[102] B. Kang and J. Feng, "Transferable meta learning across domains," in Proc. Conf. Uncertainty Artif. Intell., 2018, pp. 177–187.
[103] Y. Li and J. Zhang, "Semi-supervised meta-learning for cross-domain few-shot intent classification," in Proc. 1st Workshop Meta Learn. Its Appl. Natural Lang. Process., 2021, pp. 67–75.
[104] W. Yuan, Z. Zhang, C. Wang, H. Song, Y. Xie, and L. Ma, "Task-level self-supervision for cross-domain few-shot learning," in Proc. AAAI Conf. Artif. Intell., 2022, pp. 3215–3223.
[105] T. Bansal, R. Jha, T. Munkhdalai, and A. McCallum, "Self-supervised meta-learning for few-shot natural language classification tasks," in Proc. Conf. Empirical Methods Natural Lang. Process., 2020, pp. 522–534.
[106] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. Int. Conf. Artif. Intell. Statist., 2017, pp. 1273–1282.
[107] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, "Federated optimization in heterogeneous networks," in Proc. Mach. Learn. Syst., vol. 2, pp. 429–450, 2020.
[114] A. Ghosh, J. Chung, D. Yin, and K. Ramchandran, "An efficient framework for clustered federated learning," in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 19586–19597.
[115] M. Duan et al., "Flexible clustered federated learning for client-level data distribution shift," IEEE Trans. Parallel Distrib. Syst., vol. 33, no. 11, pp. 2661–2674, Nov. 2022.
[116] F. Sattler, K.-R. Müller, and W. Samek, "Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8, pp. 3710–3722, Aug. 2021.
[117] L. Yang, J. Huang, W. Lin, and J. Cao, "Personalized federated learning on non-IID data via group-based meta-learning," ACM Trans. Knowl. Discov. Data, vol. 17, no. 4, pp. 1–20, 2023.
[118] Y. Jiang, J. Konečný, K. Rush, and S. Kannan, "Improving federated learning personalization via model agnostic meta learning," 2019, arXiv:1909.12488.
[119] A. Fallah, A. Mokhtari, and A. Ozdaglar, "Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach," in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 3557–3568.
[120] F. Chen, M. Luo, Z. Dong, Z. Li, and X. He, "Federated meta-learning with fast convergence and efficient communication," 2018, arXiv:1802.07876.
[121] M. Khodak, M.-F. F. Balcan, and A. S. Talwalkar, "Adaptive gradient-based meta-learning methods," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 5917–5928.
[122] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in Proc. Int. Conf. Mach. Learn., 2020, pp. 1597–1607.
[123] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 9729–9738.
[124] J.-B. Grill et al., "Bootstrap your own latent - a new approach to self-supervised learning," in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 21271–21284.
[125] A. V. D. Oord, Y. Li, and O. Vinyals, "Representation learning with contrastive predictive coding," 2018, arXiv:1807.03748.
[126] K. Hsu, S. Levine, and C. Finn, "Unsupervised learning via meta-learning," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–14.
[127] J. Donahue, P. Krähenbühl, and T. Darrell, "Adversarial feature learning," in Proc. Int. Conf. Learn. Representations, 2017, pp. 1–12.
[128] J. Donahue and K. Simonyan, "Large scale adversarial representation learning," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 10542–10552.
[129] M. Caron, P. Bojanowski, A. Joulin, and M. Douze, "Deep clustering for unsupervised learning of visual features," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 132–149.
[130] S. Khodadadeh, L. Boloni, and M. Shah, "Unsupervised meta-learning for few-shot image classification," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 10132–10142.
[131] S. Khodadadeh, S. Zehtabian, S. Vahidian, W. Wang, B. Lin, and L. Bölöni, "Unsupervised meta-learning through latent-space interpolation in generative models," in Proc. Int. Conf. Learn. Representations, 2021.
[132] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, "AutoAugment: Learning augmentation strategies from data," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 113–123.
[133] S. Nair, A. Rajeswaran, V. Kumar, C. Finn, and A. Gupta, "R3M: A universal visual representation for robot manipulation," in Proc. Conf. Robot Learn., 2023, pp. 892–909.
[108] Q. Li, B. He, and D. Song, “Model-contrastive federated learn- [134] A. Tamkin, M. Wu, and N. Goodman, “Viewmaker networks: Learning
ing,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, views for unsupervised representation learning,” in Proc. Int. Conf. Learn.
pp. 10713–10722. Representations, 2020.
[109] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, [135] D. B. Lee, D. Min, S. Lee, and S. J. Hwang, “Meta-GMVAE: Mixture
“Scaffold: Stochastic controlled averaging for federated learning,” in of Gaussian VAE for unsupervised meta-learning,” in Proc. Int. Conf.
Proc. Int. Conf. Mach. Learn., 2020, pp. 5132–5143. Learn. Representations, 2021, pp. 1–13.
[110] F. Hanzely and P. Richtárik, “Federated learning of a mixture of global [136] D. Kong, B. Pang, and Y. N. Wu, “Unsupervised meta-learning via
and local models,” 2020, arXiv:2002.05516. latent space energy-based model of symbol vector coupling,” in Proc. 5th
[111] C. T. Dinh, N. Tran, and J. Nguyen, “Personalized federated learn- Workshop Meta- Learn. Conf. Neural Inf. Process. Syst., 2021, pp. 1–9.
ing with Moreau envelopes,” Adv. Neural Inf. Process. Syst., 2020, [137] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in
pp. 21394–21405. Proc. Int. Conf. Learn. Representations, 2014, pp. 1–9.
[112] T. Li, S. Hu, A. Beirami, and V. Smith, “Ditto: Fair and robust federated [138] Y. W. Teh, M. Welling, S. Osindero, and G. E. Hinton, “Energy-based
learning through personalization,” in Proc. Int. Conf. Mach. Learn., 2021, models for sparse overcomplete representations,” J. Mach. Learn. Res.,
pp. 6357–6368. vol. 4, no. Dec, pp. 1235–1260, 2003.
[113] J. Xu, X. Tong, and H. Shao-Lun, “Personalized federated learning with [139] R. Ni, M. Shu, H. Souri, M. Goldblum, and T. Goldstein, “The close
feature alignment and classifier collaboration,” in Proc. Int. Conf. Learn. relationship between contrastive learning and meta-learning,” in Proc.
Representations, 2023, pp. 1–14. Int. Conf. Learn. Representations, 2021, pp. 1–12.
4778 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 7, JULY 2024
[140] Z. Yang, J. Wang, and Y. Zhu, “Few-shot classification with con- [165] Y. Li, Y. Yang, W. Zhou, and T. Hospedales, “Feature-critic networks for
trastive learning,” in Proc. 17th Eur. Conf. Comput. Vis., 2022, heterogeneous domain generalization,” in Proc. Int. Conf. Mach. Learn.,
pp. 293–309. 2019, pp. 3915–3924.
[141] D. B. Lee et al., “Self-supervised set representation learning for unsuper- [166] D. Li, Y. Yang, Y.-Z. Song, and T. Hospedales, “Learning to generalize:
vised meta-learning,” in Proc. Int. Conf. Learn. Representations, 2023, Meta-learning for domain generalization,” in Proc. AAAI Conf. Artif.
pp. 1–13. Intell., 2018, pp. 3490–3497.
[142] H. Jang, H. Lee, and J. Shin, “Unsupervised meta-learning via few-shot [167] Y. Balaji, S. Sankaranarayanan, and R. Chellappa, “MetaReg: Towards
pseudo-supervised contrastive learning,” in Proc. Int. Conf. Learn. Rep- domain generalization using meta-regularization,” in Proc. Adv. Neural
resentations, 2023, pp. 1–13. Inf. Process. Syst., 2018, pp. 1006–1016.
[143] C. Wu, F. Wu, and Y. Huang, “One teacher is enough? Pre-trained [168] Y. Shu, Z. Cao, C. Wang, J. Wang, and M. Long, “Open domain gen-
language model distillation from multiple teachers,” inProc. Findings eralization with domain-augmented meta-learning,” in Proc. IEEE/CVF
Annu. Meeting Assoc. Comput. Linguistics, 2021, pp. 4408–4413. Conf. Comput. Vis. Pattern Recognit., 2021, pp. 9624–9633.
[144] T. Bansal, R. Jha, and A. McCallum, “Learning to few-shot learn across [169] K. Chen, D. Zhuang, and J. M. Chang, “Discriminative adversarial do-
diverse natural language classification tasks,” in Proc. Int. Conf. Comput. main generalization with meta-learning based cross-domain validation,”
Linguistics, 2020, pp. 5108–5123. Neurocomputing, vol. 467, pp. 418–426, 2022.
[145] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, “Deep domain [170] G. M. van de Ven, T. Tuytelaars, and A. S. Tolias, “Three types of
confusion: Maximizing for domain invariance,” 2014, arXiv:1412.3474. incremental learning,” Nature Mach. Intell., vol. 4, no. 12, pp. 1185–1197,
[146] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. 2022.
Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016. [171] D. Lopez-Paz and M. Ranzato, “Gradient episodic memory for continual
[147] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image learning,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6470–6479.
translation using cycle-consistent adversarial networks,” in Proc. IEEE [172] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny, “Efficient
Int. Conf. Comput. Vis., 2017, pp. 2223–2232. lifelong learning with A-GEM,” in Proc. Int. Conf. Learn. Representa-
[148] K. Rao, C. Harris, A. Irpan, S. Levine, J. Ibarz, and M. Khansari, tions, 2019, pp. 1–12.
“RL-CycleGAN: Reinforcement learning aware simulation-to-real,” [173] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “iCaRL:
in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, Incremental classifier and representation learning,” in Proc. IEEE Conf.
pp. 11 157–11 166. Comput. Vis. Pattern Recognit., 2017, pp. 2001–2010.
[149] L. Smith, N. Dhawan, M. Zhang, P. Abbeel, and S. Levine, “Avid: [174] R. Aljundi, M. Rohrbach, and T. Tuytelaars, “Selfless sequential learn-
Learning multi-stage tasks via pixel-level translation of human videos,” ing,” in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–12.
2019, arXiv:1912.04443. [175] J. Kirkpatrick et al., “Overcoming catastrophic forgetting in neural net-
[150] J. Hoffman et al., “Cycada: Cycle-consistent adversarial domain adapta- works,” Proc. Nat. Acad. Sci., vol. 114, no. 13, pp. 3521–3526, 2017.
tion,” in Proc. Int. Conf. Mach. Learn., 2018, pp. 1989–1998. [176] J. Serra, D. Suris, M. Miron, and A. Karatzoglou, “Overcoming catas-
[151] H. Zhao, S. Zhang, G. Wu, J. M. Moura, J. P. Costeira, and G. J. Gordon, trophic forgetting with hard attention to the task,” in Proc. Int. Conf.
“Adversarial multiple source domain adaptation,” in Proc. Adv. Neural Mach. Learn., 2018, pp. 4548–4557.
Inf. Process. Syst., 2018, pp. 8568–8579. [177] X. Li, Y. Zhou, T. Wu, R. Socher, and C. Xiong, “Learn to grow:
[152] L. T. Nguyen-Meidine, A. Belal, M. Kiran, J. Dolz, L.-A. Blais-Morin, A continual structure learning framework for overcoming catastrophic
and E. Granger, “Unsupervised multi-target domain adaptation through forgetting,” in Proc. Int. Conf. Mach. Learn., 2019, pp. 3925–3934.
knowledge distillation,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. [178] Q. Pham, C. Liu, D. Sahoo, and H. Steven, “Contextual transformation
Vis., 2021, pp. 1339–1347. networks for online continual learning,” in Proc. Int. Conf. Learn. Rep-
[153] Z. Chen, J. Zhuang, X. Liang, and L. Lin, “Blending-target domain resentations, 2021, pp. 1–13.
adaptation by adversarial meta-adaptation networks,” in Proc. IEEE/CVF [179] A. A. Rusu et al., “Progressive neural networks,” 2016,
Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2248–2257. arXiv:1606.04671.
[154] B. Gholami, P. Sahu, O. Rudovic, K. Bousmalis, and V. Pavlovic, [180] S. Beaulieu et al., “Learning to continually learn,” in Proc. 24th Eur.
“Unsupervised multi-target domain adaptation: An information theo- Conf. Artif. Intell., 2020, pp. 992–1001.
retic approach,” IEEE Trans. Image Process., vol. 29, pp. 3993–4002, [181] M. Riemer et al., “Learning to learn without forgetting by maximizing
2020. transfer and minimizing interference,” in Proc. Int. Conf. Learn. Repre-
[155] M. Zhang, H. Marklund, N. Dhawan, A. Gupta, S. Levine, and C. Finn, sentations, 2019, pp. 1–14.
“Adaptive risk minimization: Learning to adapt to domain shift,” in Proc. [182] K. Javed and M. White, “Meta-learning representations for continual
Adv. Neural Inf. Process. Syst., 2021, pp. 23 664–23 678. learning,” in Proc. Adv. Neural Inf. Process. Sys., 2019, pp. 1820–1830.
[156] W. Yang, C. Yang, S. Huang, L. Wang, and M. Yang, “Few-shot unsu- [183] G. Gupta, K. Yadav, and L. Paull, “Look-ahead meta learning for con-
pervised domain adaptation via meta learning,” in Proc. IEEE Int. Conf. tinual learning,” in Proc. Adv. Neural Inf. Process. Syst., pp. 11 588–11
Multimedia Expo, 2022, pp. 1–6. 598, 2020.
[157] Y. Feng et al., “Similarity-based meta-learning network with adversarial [184] V. Prabhu, A. Kannan, M. Ravuri, M. Chaplain, D. Sontag, and X.
domain adaptation for cross-domain fault identification,” Knowl.-Based Amatriain, “Few-shot learning for dermatological disease diagnosis,” in
Syst., vol. 217, 2021, Art. no. 106829. Proc. Mach. Learn. Healthcare Conf., 2019, pp. 532–552.
[158] A. Sicilia, X. Zhao, and S. J. Hwang, “Domain adversarial neural net- [185] P. P. Liang, P. Wu, L. Ziyin, L.-P. Morency, and R. Salakhutdinov,
works for domain generalization: When it works and how to improve,” “Cross-modal generalization: Learning in low resource modalities via
Mach. Learn., vol. 112, pp. 2685–2721, 2023. meta-alignment,” in Proc. 29th ACM Int. Conf. Multimedia, 2021,
[159] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep pp. 2680–2689.
domain adaptation,” in Proc. Eur. Conf. Comput. Vis., Amsterdam, The [186] J.-B. Alayrac et al., “Flamingo: A visual language model for few-
Netherlands, 2016, pp. 443–450. shot learning,” in Proc. Adv. Neural Inf. Process. Syst., 2022,
[160] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond pp. 23 716–23 736.
empirical risk minimization,” in Proc. Int. Conf. Learn. Representations, [187] S. Reed et al., “A generalist agent,” Trans. Mach. Learn. Res., 2022,
2018, pp. 1–13. pp. 1–42.
[161] V. Verma et al., “Manifold mixup: Better representations by in- [188] M. Rußwurm, S. Wang, M. Korner, and D. Lobell, “Meta-learning for
terpolating hidden states,” in Proc. Int. Conf. Mach. Learn., 2019, few-shot land cover classification,” in Proc. IIII/CVF Conf. Comput. Vis.
pp. 6438–6447. Pattern Recognit. Workshops, 2020, pp. 200–201.
[162] H. Yao et al., “Improving out-of-distribution robustness via se- [189] T. Yu et al., “One-shot imitation from observing humans via domain-
lective augmentation,” in Proc. Int. Conf. Mach. Learn., 2022, adaptive meta-learning,” Robotics: Science and Systems XIV, 2018,
pp. 25 407–25 437. pp. 1–10.
[163] Q. Dou, D. Coelho de Castro, K. Kamnitsas, and B. Glocker, “Domain [190] C. Q. Nguyen, C. Kreatsoulas, and K. M. Branson, “Meta-learning GNN
generalization via model-agnostic learning of semantic features,” in Proc. initializations for low-resource molecular property prediction,” in Proc.
Adv. Neural Inf. Process. Syst., 2019, pp. 6450–6461. 4th Lifelong Mach. Learn. Workshop, 2020, pp. 1–6.
[164] D. Li, J. Zhang, Y. Yang, C. Liu, Y.-Z. Song, and T. M. Hospedales, [191] L.-Y. Gui, Y.-X. Wang, D. Ramanan, and J. M. Moura, “Few-shot human
“Episodic training for domain generalization,” in Proc. IEEE/CVF Int. motion prediction via meta-learning,” in Proc. Eur. Conf. Comput. Vis.,
Conf. Comput. Vis., 2019, pp. 1446–1455. 2018, pp. 432–450.
VETTORUZZO et al.: ADVANCES AND CHALLENGES IN META-LEARNING: A TECHNICAL REVIEW 4779
Joaquin Vanschoren is an associate professor of machine learning with the Eindhoven University of Technology (TU/e). His research focuses on understanding and automating machine learning, meta-learning, and continual learning. He founded and leads OpenML.org, an open science platform for machine learning research used all over the world. He has received several demonstration and application awards and the Dutch Data Prize, and has been an invited speaker at ECDA, StatComp, AutoML@ICML, CiML@NIPS, Reproducibility@ICML, DEEM@SIGMOD, and many other venues. He has also co-organized machine learning conferences (e.g., ECMLPKDD 2013, LION 2016) and many workshops, including the AutoML workshop series at ICML.
Thorsteinn Rögnvaldsson (Senior Member, IEEE) received the PhD degree in theoretical physics from Lund University in 1994. He did his postdoc at the Oregon Graduate Institute. He is a professor of computer science with Halmstad University, Sweden, where in 2012 he started and has since directed the Center for Applied Intelligent Systems Research (CAISR). His research interests include autonomous knowledge creation, machine learning, and self-organization.