
BEHAVIORAL AND BRAIN SCIENCES (2017), Page 1 of 72

doi:10.1017/S0140525X16001837, e253

Building machines that learn and think like people

Brenden M. Lake
Department of Psychology and Center for Data Science, New York University,
New York, NY 10011
[email protected]
http://cims.nyu.edu/~brenden/

Tomer D. Ullman
Department of Brain and Cognitive Sciences and The Center for Brains, Minds
and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]
http://www.mit.edu/~tomeru/

Joshua B. Tenenbaum
Department of Brain and Cognitive Sciences and The Center for Brains, Minds
and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]
http://web.mit.edu/cocosci/josh.html

Samuel J. Gershman
Department of Psychology and Center for Brain Science, Harvard University,
Cambridge, MA 02138, and The Center for Brains, Minds and Machines,
Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]
http://gershmanlab.webfactional.com/index.html

Abstract: Recent progress in artificial intelligence has renewed interest in building systems that learn and think like people. Many
advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board
games, achieving performance that equals or even beats that of humans in some respects. Despite their biological inspiration and
performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science
suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what
they learn and how they learn it. Specifically, we argue that these machines should (1) build causal models of the world that support
explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of
physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to
rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes toward
these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

1. Introduction

Artificial intelligence (AI) has been a story of booms and busts, yet by any traditional measure of success, the last few years have been marked by exceptional progress. Much of this progress has come from recent advances in “deep learning,” characterized by learning large neural network-style models with multiple layers of representation (see Glossary in Table 1). These models have achieved remarkable gains in many domains spanning object recognition, speech recognition, and control (LeCun et al. 2015; Schmidhuber 2015). In object recognition, Krizhevsky et al. (2012) trained a deep convolutional neural network (ConvNet [LeCun et al. 1989]) that nearly halved the previous state-of-the-art error rate on the most challenging benchmark to date. In the years since, ConvNets have continued to dominate, recently approaching human-level performance on some object recognition benchmarks (He et al. 2016; Russakovsky et al. 2015; Szegedy et al. 2014). In automatic speech recognition, hidden Markov models (HMMs) had been the leading approach since the late 1980s (Juang & Rabiner 1990), yet this framework has been chipped away piece by piece and replaced with deep learning components (Hinton et al. 2012). Now, the leading approaches to speech recognition are fully neural network systems (Graves et al. 2013; Hannun et al. 2014). Ideas from deep learning have also been applied to learning complex control problems. Mnih et al. (2015) combined ideas from deep learning and reinforcement learning to make a “deep reinforcement learning” algorithm that learns to play large classes of simple video games from just frames of pixels and the game
score, achieving human- or superhuman-level performance on many of them (see also Guo et al. 2014; Schaul et al. 2016; Stadie et al. 2016).

These accomplishments have helped neural networks regain their status as a leading paradigm in machine learning, much as they were in the late 1980s and early 1990s. The recent success of neural networks has captured attention beyond academia. In industry, companies such as Google and Facebook have active research divisions exploring these technologies, and object and speech recognition systems based on deep learning have been deployed in core products on smart phones and the web. The media have also covered many of the recent achievements of neural networks, often expressing the view that neural networks have achieved this recent success by virtue of their brain-like computation and, therefore, their ability to emulate human learning and human cognition.

In this article, we view this excitement as an opportunity to examine what it means for a machine to learn or think like a person. We first review some of the criteria previously offered by cognitive scientists, developmental psychologists, and artificial intelligence (AI) researchers. Second, we articulate what we view as the essential ingredients for building a machine that learns or thinks like a person, synthesizing theoretical ideas and experimental data from research in cognitive science. Third, we consider contemporary AI (and deep learning in particular) in the light of these ingredients, finding that deep learning models have yet to incorporate many of them, and so may be solving some problems in different ways than people do. We end by discussing what we view as the most plausible paths toward building machines that learn and think like people. This includes prospects for integrating deep learning with the core cognitive ingredients we identify, inspired in part by recent work fusing neural networks with lower-level building blocks from classic psychology and computer science (attention, working memory, stacks, queues) that have traditionally been seen as incompatible.

Beyond the specific ingredients in our proposal, we draw a broader distinction between two different computational approaches to intelligence. The statistical pattern recognition approach treats prediction as primary, usually in the context of a specific classification, regression, or control task. In this view, learning is about discovering features that have high-value states in common – a shared label in a classification setting or a shared value in a reinforcement learning setting – across a large, diverse set of training data. The alternative approach treats models of the world as primary, where learning is the process of model building. Cognition is about using these models to understand the world, to explain what we see, to imagine what could have happened that didn’t, or what could be true that isn’t, and then planning actions to make it so. The difference between pattern recognition and model building, between prediction and explanation, is central to our view of human intelligence. Just as scientists seek to explain nature, not simply predict it, we see human thought as fundamentally a model-building activity. We elaborate this key point with numerous examples below. We also discuss how pattern recognition, even if it is not the core of intelligence, can nonetheless support model building, through “model-free” algorithms that learn through experience how to make essential inferences more computationally efficient.

Before proceeding, we provide a few caveats about the goals of this article and a brief overview of the key ideas.

BRENDEN M. LAKE is an Assistant Professor of Psychology and Data Science at New York University. He received his Ph.D. in Cognitive Science from MIT in 2014 and his M.S. and B.S. in Symbolic Systems from Stanford University in 2009. He is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science. His research focuses on computational problems that are easier for people than they are for machines.

TOMER D. ULLMAN is a Postdoctoral Researcher at MIT and Harvard University through The Center for Brains, Minds and Machines (CBMM). He received his Ph.D. from the Department of Brain and Cognitive Sciences at MIT in 2015 and his B.S. in Physics and Cognitive Science from the Hebrew University of Jerusalem in 2008. His research interests include intuitive physics, intuitive psychology, and computational models of cognitive development.

JOSHUA B. TENENBAUM is a Professor of Computational Cognitive Science in the Department of Brain and Cognitive Sciences at MIT and a principal investigator at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and The Center for Brains, Minds and Machines (CBMM). He is a recipient of the Distinguished Scientific Award for Early Career Contribution to Psychology from the American Psychological Association, the Troland Research Award from the National Academy of Sciences, and the Howard Crosby Warren Medal from the Society of Experimental Psychologists. His research centers on perception, learning, and common-sense reasoning in humans and machines, with the twin goals of better understanding human intelligence in computational terms and building more human-like intelligence in machines.

SAMUEL J. GERSHMAN is an Assistant Professor of Psychology at Harvard University. He received his Ph.D. in Psychology and Neuroscience from Princeton University in 2013 and his B.A. in Neuroscience and Behavior from Columbia University in 2007. He is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science. His research focuses on reinforcement learning, decision making, and memory.

1.1. What this article is not

For nearly as long as there have been neural networks, there have been critiques of neural networks (Crick 1989; Fodor & Pylyshyn 1988; Marcus 1998, 2001; Minsky & Papert 1969; Pinker & Prince 1988). Although we are critical of neural networks in this article, our goal is to build on their successes rather than dwell on their shortcomings. We see a role for neural networks in developing more human-like learning machines: They have been applied in compelling ways to many types of machine learning problems, demonstrating the power of gradient-based learning and deep hierarchies of latent variables. Neural networks also have a rich history as computational models of cognition (McClelland et al. 1986; Rumelhart et al. 1986b). It is a history we describe in more detail in the next section. At a more fundamental level, any computational model of learning must ultimately be grounded in the brain’s biological neural networks.
Table 1. Glossary

Neural network: A network of simple neuron-like processing units that collectively performs complex computations. Neural networks
are often organized into layers, including an input layer that presents the data (e.g., an image), hidden layers that transform the data
into intermediate representations, and an output layer that produces a response (e.g., a label or an action). Recurrent connections are
also popular when processing sequential data.
Deep learning: A neural network with at least one hidden layer (some networks have dozens). Most state-of-the-art deep networks are
trained using the backpropagation algorithm to gradually adjust their connection strengths.
Backpropagation: Gradient descent applied to training a deep neural network. The gradient of the objective function (e.g., classification
error or log-likelihood) with respect to the model parameters (e.g., connection weights) is used to make a series of small adjustments to
the parameters in a direction that improves the objective function.
Convolutional neural network (ConvNet): A neural network that uses trainable filters instead of (or in addition to) fully connected
layers with independent weights. The same filter is applied at many locations across an image or across a time series, leading to neural
networks that are effectively larger, but with local connectivity and fewer free parameters.
Model-free and model-based reinforcement learning: Model-free algorithms directly learn a control policy without explicitly
building a model of the environment (reward and state transition distributions). Model-based algorithms learn a model of the
environment and use it to select actions by planning.
Deep Q-learning: A model-free reinforcement-learning algorithm used to train deep neural networks on control tasks such as playing
Atari games. A network is trained to approximate the optimal action-value function Q(s, a), which is the expected long-term cumulative
reward of taking action a in state s and then optimally selecting future actions.
Generative model: A model that specifies a probability distribution over the data. For example, in a classification task with examples X
and class labels y, a generative model specifies the distribution of data given labels P(X | y), as well as a prior on labels P(y), which can
be used for sampling new examples or for classification by using Bayes’ rule to compute P(y | X). A discriminative model specifies P(y | X)
directly, possibly by using a neural network to predict the label for a given data point, and cannot directly be used to sample new
examples or to compute other queries regarding the data. We will generally be concerned with directed generative models (such as
Bayesian networks or probabilistic programs), which can be given a causal interpretation, although undirected (non-causal) generative
models such as Boltzmann machines are also possible.
Program induction: Constructing a program that computes some desired function, where that function is typically specified by training
data consisting of example input-output pairs. In the case of probabilistic programs, which specify candidate generative models for
data, an abstract description language is used to define a set of allowable programs, and learning is a search for the programs likely to
have generated the data.
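To make the generative/discriminative distinction above concrete, here is a minimal sketch (ours, not from the target article) that fits one Gaussian per class and classifies with Bayes’ rule, P(y | X) ∝ P(X | y)P(y); because the model is generative, the same fitted parameters can also be used to sample new examples, which a purely discriminative model cannot do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: two classes of one-dimensional examples.
X0 = rng.normal(loc=-2.0, scale=1.0, size=200)   # examples with label y=0
X1 = rng.normal(loc=+1.0, scale=0.5, size=100)   # examples with label y=1

# Fit the generative model: a prior on labels P(y) and Gaussians P(X | y).
prior = np.array([200, 100]) / 300
mu = np.array([X0.mean(), X1.mean()])
sigma = np.array([X0.std(), X1.std()])

def likelihood(x, y):
    """Gaussian density P(x | y)."""
    return np.exp(-0.5 * ((x - mu[y]) / sigma[y]) ** 2) / (sigma[y] * np.sqrt(2 * np.pi))

def posterior(x):
    """Classification by Bayes' rule: P(y | x) proportional to P(x | y) P(y)."""
    joint = np.array([likelihood(x, y) * prior[y] for y in (0, 1)])
    return joint / joint.sum()

print(posterior(0.0))              # posterior over the two labels at x = 0

# Unlike a discriminative model, we can also sample a brand-new example.
y_new = rng.choice(2, p=prior)
x_new = rng.normal(mu[y_new], sigma[y_new])
print(y_new, x_new)
```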

We also believe that future generations of neural networks will look very different from the current state-of-the-art neural networks. They may be endowed with intuitive physics, theory of mind, causal reasoning, and other capacities we describe in the sections that follow. More structure and inductive biases could be built into the networks or learned from previous experience with related tasks, leading to more human-like patterns of learning and development. Networks may learn to effectively search for and discover new mental models or intuitive theories, and these improved models will, in turn, enable subsequent learning, allowing systems that learn-to-learn – using previous knowledge to make richer inferences from very small amounts of training data.

It is also important to draw a distinction between AI that purports to emulate or draw inspiration from aspects of human cognition and AI that does not. This article focuses on the former. The latter is a perfectly reasonable and useful approach to developing AI algorithms: avoiding cognitive or neural inspiration as well as claims of cognitive or neural plausibility. Indeed, this is how many researchers have proceeded, and this article has little pertinence to work conducted under this research strategy.1 On the other hand, we believe that reverse engineering human intelligence can usefully inform AI and machine learning (and has already done so), especially for the types of domains and tasks that people excel at. Despite recent computational achievements, people are better than machines at solving a range of difficult computational problems, including concept learning, scene understanding, language acquisition, language understanding, speech recognition, and so on. Other human cognitive abilities remain difficult to understand computationally, including creativity, common sense, and general-purpose reasoning. As long as natural intelligence remains the best example of intelligence, we believe that the project of reverse engineering the human solutions to difficult computational problems will continue to inform and advance AI.

Finally, whereas we focus on neural network approaches to AI, we do not wish to give the impression that these are the only contributors to recent advances in AI. On the contrary, some of the most exciting recent progress has been in new forms of probabilistic machine learning (Ghahramani 2015). For example, researchers have developed automated statistical reasoning techniques (Lloyd et al. 2014), automated techniques for model building and selection (Grosse et al. 2012), and probabilistic programming languages (e.g., Gelman et al. 2015; Goodman et al. 2008; Mansinghka et al. 2014). We believe that these approaches will play important roles in future AI systems, and they are at least as compatible with the ideas from cognitive science we discuss here. However, a full discussion of those connections is beyond the scope of the current article.
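As a flavor of what a probabilistic programming language provides, the toy sketch below (ours, written in plain Python rather than in a dedicated language such as Church or Stan) defines a generative model as an ordinary program and then inverts it by rejection sampling: run the program forward many times and keep only the executions that reproduce the observed data.

```python
import random

random.seed(1)

def flip(p):
    return random.random() < p

def coin_program():
    """A two-line generative model: a coin is either fair or a trick coin."""
    trick = flip(0.2)                      # prior: 20% of coins are trick coins
    weight = 0.9 if trick else 0.5         # trick coins land heads 90% of the time
    return trick, [flip(weight) for _ in range(8)]

# Inference by rejection: condition on having observed 8 heads in a row.
samples = []
while len(samples) < 1000:
    trick, flips = coin_program()
    if all(flips):                         # keep only runs matching the data
        samples.append(trick)

print(sum(samples) / len(samples))         # P(trick | 8 heads), roughly 0.96
```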
1.2. Overview of the key ideas

The central goal of this article is to propose a set of core ingredients for building more human-like learning and thinking machines. We elaborate on each of these ingredients and topics in Section 4, but here we briefly overview the key ideas.

The first set of ingredients focuses on developmental “start-up software,” or cognitive capabilities present early in development. There are several reasons for this focus on development. If an ingredient is present early in development, it is certainly active and available well before a child or adult would attempt to learn the types of tasks discussed in this paper. This is true regardless of whether the early-present ingredient is itself learned from experience or innately present. Also, the earlier an ingredient is present, the more likely it is to be foundational to later development and learning.

We focus on two pieces of developmental start-up software (see Wellman & Gelman [1992] for a review of both). First is intuitive physics (sect. 4.1.1): Infants have primitive object concepts that allow them to track objects over time and to discount physically implausible trajectories. For example, infants know that objects will persist over time and that they are solid and coherent. Equipped with these general principles, people can learn more quickly and make more accurate predictions. Although a task may be new, physics still works the same way. A second type of software present in early development is intuitive psychology (sect. 4.1.2): Infants understand that other people have mental states like goals and beliefs, and this understanding strongly constrains their learning and predictions. A child watching an expert play a new video game can infer that the avatar has agency and is trying to seek reward while avoiding punishment. This inference immediately constrains other inferences, allowing the child to infer what objects are good and what objects are bad. These types of inferences further accelerate the learning of new tasks.

Our second set of ingredients focuses on learning. Although there are many perspectives on learning, we see model building as the hallmark of human-level learning: explaining observed data through the construction of causal models of the world (sect. 4.2.2). From this perspective, the early-present capacities for intuitive physics and psychology are also causal models of the world. A primary job of learning is to extend and enrich these models and to build analogous causally structured theories of other domains.

Compared with state-of-the-art algorithms in machine learning, human learning is distinguished by its richness and its efficiency. Children come with the ability and the desire to uncover the underlying causes of sparsely observed events and to use that knowledge to go far beyond the paucity of the data. It might seem paradoxical that people are capable of learning these richly structured models from very limited amounts of experience. We suggest that compositionality and learning-to-learn are ingredients that make this type of rapid model learning possible (sects. 4.2.1 and 4.2.3, respectively).

A final set of ingredients concerns how the rich models our minds build are put into action, in real time (sect. 4.3). It is remarkable how fast we are to perceive and to act. People can comprehend a novel scene in a fraction of a second, or a novel utterance in little more than the time it takes to say it and hear it. An important motivation for using neural networks in machine vision and speech systems is to respond as quickly as the brain does. Although neural networks are usually aiming at pattern recognition rather than model building, we discuss ways in which these “model-free” methods can accelerate slow model-based inferences in perception and cognition (sect. 4.3.1) (see Glossary in Table 1). By learning to recognize patterns in these inferences, the outputs of inference can be predicted without having to go through costly intermediate steps. Integrating neural networks that “learn to do inference” with rich model-building learning mechanisms offers a promising way to explain how human minds can understand the world so well and so quickly.

We also discuss the integration of model-based and model-free methods in reinforcement learning (sect. 4.3.2), an area that has seen rapid recent progress. Once a causal model of a task has been learned, humans can use the model to plan action sequences that maximize future reward. When rewards are used as the metric for success in model building, this is known as model-based reinforcement learning. However, planning in complex models is cumbersome and slow, making the speed-accuracy trade-off unfavorable for real-time control. By contrast, model-free reinforcement learning algorithms, such as current instantiations of deep reinforcement learning, support fast control, but at the cost of inflexibility and possibly accuracy. We review evidence that humans combine model-based and model-free learning algorithms both competitively and cooperatively and that these interactions are supervised by metacognitive processes. The sophistication of human-like reinforcement learning has yet to be realized in AI systems, but this is an area where crosstalk between cognitive and engineering approaches is especially promising.
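The model-based/model-free contrast can be made concrete on a toy problem. In the illustration below (ours, not from the target article), a five-state chain world is solved twice: once by planning over a known environment model with value iteration, and once by model-free Q-learning, which must recover the same values from thousands of raw experience samples.

```python
import random

random.seed(0)

N, GAMMA = 5, 0.9                     # states 0..4; reaching state 4 ends an episode
ACTIONS = (-1, +1)                    # move left or right along the chain

def step(s, a):
    """Environment model: deterministic transition, reward 1 at the goal."""
    s2 = max(0, min(N - 1, s + a))
    return s2, float(s2 == N - 1), s2 == N - 1   # next state, reward, done

# Model-based: plan by iterating over the known model (value iteration).
# Slow per decision in large models, but adapts at once if the goal moves.
V = [0.0] * N
for _ in range(50):
    for s in range(N - 1):
        V[s] = max(r + GAMMA * (0.0 if done else V[s2])
                   for s2, r, done in (step(s, a) for a in ACTIONS))

# Model-free: Q-learning from raw experience. Acting is a cheap table
# lookup, but learning needs many samples and no world model is built.
Q = [[0.0, 0.0] for _ in range(N)]
s = 0
for _ in range(5000):
    i = random.randrange(2)                       # explore with random actions
    s2, r, done = step(s, ACTIONS[i])
    target = r + GAMMA * (0.0 if done else max(Q[s2]))
    Q[s][i] += 0.1 * (target - Q[s][i])
    s = 0 if done else s2

print([round(v, 2) for v in V[:-1]])              # planned values: [0.73, 0.81, 0.9, 1.0]
print([round(max(q), 2) for q in Q[:-1]])         # learned values approach the planned ones
```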
2. Cognitive and neural inspiration in artificial intelligence

The questions of whether and how AI should relate to human cognitive psychology are older than the terms artificial intelligence and cognitive psychology. Alan Turing suspected that it was easier to build and educate a child-machine than try to fully capture adult human cognition (Turing 1950). Turing pictured the child’s mind as a notebook with “rather little mechanism and lots of blank sheets,” and the mind of a child-machine as filling in the notebook by responding to rewards and punishments, similar to reinforcement learning. This view on representation and learning echoes behaviorism, a dominant psychological tradition in Turing’s time. It also echoes the strong empiricism of modern connectionist models – the idea that we can learn almost everything we know from the statistical patterns of sensory inputs.

Cognitive science repudiated the oversimplified behaviorist view and came to play a central role in early AI research (Boden 2006). Newell and Simon (1961) developed their “General Problem Solver” as both an AI algorithm and a model of human problem solving, which they subsequently tested experimentally (Newell & Simon 1972). AI pioneers in other areas of research explicitly referenced human cognition and even published papers in cognitive psychology journals (e.g., Bobrow &
Winograd 1977; Hayes-Roth & Hayes-Roth 1979; Winograd 1972). For example, Schank (1972), writing in the journal Cognitive Psychology, declared that “We hope to be able to build a program that can learn, as a child does, how to do what we have described in this paper instead of being spoon-fed the tremendous information necessary” (p. 629).

A similar sentiment was expressed by Minsky (1974): “I draw no boundary between a theory of human thinking and a scheme for making an intelligent machine; no purpose would be served by separating these today since neither domain has theories good enough to explain—or to produce—enough mental capacity” (p. 6).

Much of this research assumed that human knowledge representation is symbolic and that reasoning, language, planning, and vision could be understood in terms of symbolic operations. Parallel to these developments, a radically different approach was being explored based on neuron-like “sub-symbolic” computations (e.g., Fukushima 1980; Grossberg 1976; Rosenblatt 1958). The representations and algorithms used by this approach were more directly inspired by neuroscience than by cognitive psychology, although ultimately it would flower into an influential school of thought about the nature of cognition: parallel distributed processing (PDP) (McClelland et al. 1986; Rumelhart et al. 1986b). As its name suggests, PDP emphasizes parallel computation by combining simple units to collectively implement sophisticated computations. The knowledge learned by these neural networks is thus distributed across the collection of units rather than localized as in most symbolic data structures. The resurgence of recent interest in neural networks, more commonly referred to as “deep learning,” shares the same representational commitments and often even the same learning algorithms as the earlier PDP models. “Deep” refers to the fact that more powerful models can be built by composing many layers of representation (see LeCun et al. [2015] and Schmidhuber [2015] for recent reviews), still very much in the PDP style while utilizing recent advances in hardware and computing capabilities, as well as massive data sets, to learn deeper models.

It is also important to clarify that the PDP perspective is compatible with “model building” in addition to “pattern recognition.” Some of the original work done under the banner of PDP (Rumelhart et al. 1986b) is closer to model building than pattern recognition, whereas the recent large-scale discriminative deep learning systems more purely exemplify pattern recognition (see Bottou [2014] for a related discussion). But, as discussed, there is also a question of the nature of the learned representations within the model – their form, compositionality, and transferability – and the developmental start-up software that was used to get there. We focus on these issues in this article.

Neural network models and the PDP approach offer a view of the mind (and intelligence more broadly) that is sub-symbolic and often populated with minimal constraints and inductive biases to guide learning. Proponents of this approach maintain that many classic types of structured knowledge, such as graphs, grammars, rules, objects, structural descriptions, and programs, can be useful yet misleading metaphors for characterizing thought. These structures are more epiphenomenal than real, emergent properties of more fundamental sub-symbolic cognitive processes (McClelland et al. 2010). Compared with other paradigms for studying cognition, this position on the nature of representation is often accompanied by a relatively “blank slate” vision of initial knowledge and representation, much like Turing’s blank notebook.

When attempting to understand a particular cognitive ability or phenomenon within this paradigm, a common scientific strategy is to train a relatively generic neural network to perform the task, adding additional ingredients only when necessary. This approach has shown that neural networks can behave as if they learned explicitly structured knowledge, such as a rule for producing the past tense of words (Rumelhart & McClelland 1986), rules for solving simple balance-beam physics problems (McClelland 1988), or a tree to represent types of living things (plants and animals) and their distribution of properties (Rogers & McClelland 2004). Training large-scale, relatively generic networks is also the best current approach for object recognition (He et al. 2016; Krizhevsky et al. 2012; Russakovsky et al. 2015; Szegedy et al. 2014), where the high-level feature representations of these convolutional nets have also been used to predict patterns of neural response in human and macaque IT cortex (Khaligh-Razavi & Kriegeskorte 2014; Kriegeskorte 2015; Yamins et al. 2014), as well as human typicality ratings (Lake et al. 2015b) and similarity ratings (Peterson et al. 2016) for images of common objects. Moreover, researchers have trained generic networks to perform structured and even strategic tasks, such as the recent work on using a Deep Q-learning Network (DQN) to play simple video games (Mnih et al. 2015) (see Glossary in Table 1). If neural networks have such broad application in machine vision, language, and control, and if they can be trained to emulate the rule-like and structured behaviors that characterize cognition, do we need more to develop truly human-like learning and thinking machines? How far can relatively generic neural networks bring us toward this goal?

3. Challenges for building more human-like machines

Although cognitive science has not yet converged on a single account of the mind or intelligence, the claim that a mind is a collection of general-purpose neural networks with few initial constraints is rather extreme in contemporary cognitive science. A different picture has emerged that highlights the importance of early inductive biases, including core concepts such as number, space, agency, and objects, as well as powerful learning algorithms that rely on prior knowledge to extract knowledge from small amounts of training data. This knowledge is often richly organized and theory-like in structure, capable of the graded inferences and productive capacities characteristic of human thought.

Here we present two challenge problems for machine learning and AI: learning simple visual concepts (Lake et al. 2015a) and learning to play the Atari game Frostbite (Mnih et al. 2015). We also use the problems as running examples to illustrate the importance of core cognitive ingredients in the sections that follow.

3.1. The Characters Challenge
Figure 1. The Characters Challenge: Human-level learning of novel handwritten characters (A), with the same abilities also illustrated
for a novel two-wheeled vehicle (B). A single example of a new visual concept (red box) can be enough information to support the (i)
classification of new examples, (ii) generation of new examples, (iii) parsing an object into parts and relations, and (iv) generation of
new concepts from related concepts. Adapted from Lake et al. (2015a).

The first challenge concerns handwritten character recognition, a classic problem for comparing different types of machine learning algorithms. Hofstadter (1985) argued that the problem of recognizing characters in all of the ways people do – both handwritten and printed – contains most, if not all, of the fundamental challenges of AI. Whether or not this statement is correct, it highlights the surprising complexity that underlies even “simple” human-level concepts like letters. More practically, handwritten character recognition is a real problem that children and adults must learn to solve, with practical applications ranging from reading addresses on envelopes to reading checks at an automated teller machine (ATM). Handwritten character recognition is also simpler than more general forms of object recognition; the object of interest is two-dimensional, separated from the background, and usually unoccluded. Compared with how people learn and see other types of objects, it seems possible, in the near term, to build algorithms that can see most of the structure in characters that people can see.

The standard benchmark is the Modified National Institute of Standards and Technology (MNIST) data set for digit recognition, which involves classifying images of digits into the categories ‘0’ to ‘9’ (LeCun et al. 1998). The training set provides 6,000 images per class for a total of 60,000 training images. With a large amount of training data available, many algorithms achieve respectable performance, including K-nearest neighbors (5% test error), support vector machines (about 1% test error), and convolutional neural networks (below 1% test error [LeCun et al. 1998]). The best results achieved using deep convolutional nets are very close to human-level performance, at an error rate of 0.2% (Ciresan et al. 2012). Similarly, recent results applying convolutional nets to the far more challenging ImageNet object recognition benchmark have shown that human-level performance is within reach on that data set as well (Russakovsky et al. 2015).
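To give a feel for this benchmark setting, the snippet below (ours) runs the simplest baseline mentioned above, a K-nearest-neighbor classifier, on scikit-learn’s small built-in digits set (8×8 images rather than MNIST’s 28×28, so the exact error rate differs from the figures quoted in the text).

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1,797 8x8 grayscale digit images with labels '0' to '9'.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# K-nearest neighbors: classify each test image by a majority vote
# among the k most similar training images (raw pixel distance).
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
error = 1.0 - knn.score(X_test, y_test)
print(f"test error: {error:.1%}")   # typically around 1-2% on this small set
```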
Although humans and neural networks may perform equally well on the MNIST digit recognition task and other large-scale image classification tasks, it does not mean that they learn and think in the same way. There are at least two important differences: people learn from fewer examples, and they learn richer representations, a comparison true for both learning handwritten characters and for learning more general classes of objects (Fig. 1). People can learn to recognize a new handwritten character from a single example (Fig. 1A-i), allowing them to discriminate between novel instances drawn by other people and similar-looking non-instances (Lake et al. 2015a; Miller et al. 2000). Moreover, people learn more than how to do pattern recognition: they learn a concept, that is, a model of the class that allows their acquired knowledge to be flexibly applied in new ways. In addition to recognizing new examples, people can also generate new examples (Fig. 1A-ii), parse a character into its most important parts and relations (Fig. 1A-iii) (Lake et al. 2012), and generate new characters given a small set of related characters (Fig. 1A-iv). These additional abilities come for free along with the acquisition of the underlying concept.

Even for these simple visual concepts, people are still better and more sophisticated learners than the best algorithms for character recognition. People learn a lot more from a lot less, and capturing these human-level learning abilities in machines is the Characters Challenge. We recently reported progress on this challenge using probabilistic program induction (Lake et al. 2015a) (see Glossary in Table 1), yet aspects of the full human cognitive ability remain out of reach. Although both people and models represent characters as a sequence of pen strokes and relations, people have a far richer repertoire of structural relations between strokes. Furthermore, people can efficiently integrate across multiple examples of a character to infer which have optional elements, such as the horizontal cross-bar in ‘7’s, combining different variants of the same character into a single coherent representation.


Figure 2. Screenshots of Frostbite, a 1983 video game designed for the Atari game console. (A) The start of a level in Frostbite. The
agent must construct an igloo by hopping between ice floes and avoiding obstacles such as birds. The floes are in constant motion (either
left or right), making multi-step planning essential to success. (B) The agent receives pieces of the igloo (top right) by jumping on the
active ice floes (white), which then deactivates them (blue). (C) At the end of a level, the agent must safely reach the completed
igloo. (D) Later levels include additional rewards (fish) and deadly obstacles (crabs, clams, and bears).

Additional progress may come by combining deep learning and probabilistic program induction to tackle even richer versions of the Characters Challenge.

3.2. The Frostbite Challenge

The second challenge concerns the Atari game Frostbite (Fig. 2), which was one of the control problems tackled by the DQN of Mnih et al. (2015). The DQN was a significant advance in reinforcement learning, showing that a single algorithm can learn to play a wide variety of complex tasks. The network was trained to play 49 classic Atari games, proposed as a test domain for reinforcement learning (Bellemare et al. 2013), impressively achieving human-level performance or above on 29 of the games. It did, however, have particular trouble with Frostbite and other games that required temporally extended planning strategies.

In Frostbite, players control an agent (Frostbite Bailey) tasked with constructing an igloo within a time limit. The igloo is built piece by piece as the agent jumps on ice floes in water (Fig. 2A–C). The challenge is that the ice floes are in constant motion (moving either left or right), and ice floes only contribute to the construction of the igloo if they are visited in an active state (white, rather than blue). The agent may also earn extra points by gathering fish while avoiding a number of fatal hazards (falling in the water, snow geese, polar bears, etc.). Success in this game requires a temporally extended plan to ensure the agent can accomplish a sub-goal (such as reaching an ice floe) and then safely proceed to the next sub-goal. Ultimately, once all of the pieces of the igloo are in place, the agent must proceed to the igloo and complete the level before time expires (Fig. 2C).

The DQN learns to play Frostbite and other Atari games by combining a powerful pattern recognizer (a deep convolutional neural network) and a simple model-free reinforcement learning algorithm (Q-learning [Watkins & Dayan 1992]). These components allow the network to map sensory inputs (frames of pixels) onto a policy over a small set of actions, and both the mapping and the policy are trained to optimize long-term cumulative reward (the game score). The network embodies the strongly empiricist approach characteristic of most connectionist models: very little is built into the network apart from the assumptions about image structure inherent in convolutional networks, so the network has to essentially learn a visual and conceptual system from scratch for each new game. In Mnih et al. (2015), the network architecture and hyper-parameters were fixed, but the network was trained anew for each game, meaning the visual system and the policy are highly specialized for the games it was trained on. More recent work has shown how these game-specific networks can share visual features (Rusu et al. 2016) or be used to train a multitask network (Parisotto et al. 2016), achieving modest benefits of transfer when learning to play new games.
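To connect this description to the glossary’s definition of Q(s, a): the DQN is trained by nudging its value estimates toward a bootstrapped target, r + γ max_a′ Q(s′, a′). The sketch below (ours) shows that update with a linear stand-in for the convolutional network; the real system adds experience replay, a separate target network, and ε-greedy exploration over pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_ACTIONS, GAMMA, LR = 16, 4, 0.99, 0.01

# Stand-in for the convolutional network: a linear Q-function,
# Q(s, .) = W @ features(s), with one output per action.
W = rng.normal(scale=0.1, size=(N_ACTIONS, N_FEATURES))

def q_values(state):
    return W @ state

def dqn_update(state, action, reward, next_state, done):
    """One step toward the TD target r + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + GAMMA * np.max(q_values(next_state))
    td_error = target - q_values(state)[action]
    W[action] += LR * td_error * state     # follows the gradient of the squared TD error

# One fake transition (in the real DQN, states are stacks of game frames).
s, s2 = rng.normal(size=N_FEATURES), rng.normal(size=N_FEATURES)
dqn_update(s, action=2, reward=1.0, next_state=s2, done=False)
print(q_values(s)[2])
```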
Although it is interesting that the DQN learns to play games at human-level performance while assuming very little prior knowledge, the DQN may be learning to play Frostbite and other games in a very different way than people do. One way to examine the differences is by considering the amount of experience required for learning. In Mnih et al. (2015), the DQN was compared with a professional gamer who received approximately 2 hours of practice on each of the 49 Atari games (although he or she likely had prior experience with some of the games). The DQN was trained on 200 million frames from each of the games, which equates to approximately 924 hours of
game time (about 38 days), or almost 500 times as much experience as the human received.2 Additionally, the DQN incorporates experience replay, where each of these frames is replayed approximately eight more times on average over the course of learning.
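The size of this experience gap is easy to check (our back-of-the-envelope arithmetic, consistent with the figures above up to rounding):

```python
frames = 200_000_000          # DQN training frames per game (Mnih et al. 2015)
seconds = frames / 60         # Atari runs at 60 frames per second
hours = seconds / 3600
human_hours = 2               # the professional tester's practice per game

print(f"{hours:,.0f} hours, {hours / 24:.1f} days, {hours / human_hours:.0f}x the human")
# -> 926 hours, 38.6 days, 463x the human: "about 38 days" and "almost 500 times"
```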
With the full 924 hours of unique experience and additional replay, the DQN achieved less than 10% of human-level performance during a controlled test session (see DQN in Fig. 3). More recent variants of the DQN perform better, and can even outperform the human tester (Schaul et al. 2016; Stadie et al. 2016; van Hasselt et al. 2016; Wang et al. 2016), reaching 83% of the professional gamer’s score by incorporating smarter experience replay (Schaul et al. 2016), and 172% by using smarter replay and more efficient parameter sharing (Wang et al. 2016) (see DQN+ and DQN++ in Fig. 3).3 But they require a lot of experience to reach this level. The learning curve for the model of Wang et al. (2016) shows performance is approximately 44% after 200 hours, 8% after 100 hours, and less than 2% after 5 hours (which is close to random play, approximately 1.5%). The differences between the human and machine learning curves suggest that they may be learning different kinds of knowledge, using different learning mechanisms, or both.

Figure 3. Comparing learning speed for people versus Deep Q-Networks (DQNs). Performance on the Atari 2600 game Frostbite is plotted as a function of game experience (in hours at a frame rate of 60 fps), which does not include additional experience replay. Learning curves and scores are shown for different networks: DQN (Mnih et al. 2015), DQN+ (Schaul et al. 2016), and DQN++ (Wang et al. 2016). Random play achieves a score of 65.2.

The contrast becomes even more dramatic if we look at the very earliest stages of learning. Although both the original DQN and these more recent variants require multiple hours of experience to perform reliably better than random play, even non-professional humans can grasp the basics of the game after just a few minutes of play. We speculate that people do this by inferring a general schema to describe the goals of the game and the object types and their interactions, using the kinds of intuitive theories, model-building abilities, and model-based planning mechanisms we describe below. Although novice players may make some mistakes, such as inferring that fish are harmful rather than helpful, they can learn to play better than chance within a few minutes. If humans are able to first watch an expert playing for a few minutes, they can learn even faster. In informal experiments with two of the authors playing Frostbite on a Javascript emulator (http://www.virtualatari.org/soft.php?soft=Frostbite), after watching videos of expert play on YouTube for just 2 minutes, we found that we were able to reach scores comparable to or better than the human expert reported in Mnih et al. (2015) after at most 15 to 20 minutes of total practice.4

There are other behavioral signatures that suggest fundamental differences in representation and learning between people and the DQN. For example, the game of Frostbite provides incremental rewards for reaching each active ice floe, providing the DQN with the relevant sub-goals for completing the larger task of building an igloo. Without these sub-goals, the DQN would have to take random actions until it accidentally builds an igloo and is rewarded for completing the entire level. In contrast, people likely do not rely on incremental scoring in the same way when figuring out how to play a new game. In Frostbite, it is possible to figure out the higher-level goal of building an igloo without incremental feedback; similarly, sparse feedback is a source of difficulty in other Atari 2600 games such as Montezuma’s Revenge, in which people substantially outperform current DQN approaches.

The learned DQN network is also rather inflexible to changes in its inputs and goals. Changing the color or appearance of objects or changing the goals of the network would have devastating consequences on performance if the network is not retrained. Although any specific model is necessarily simplified and should not be held to the standard of general human intelligence, the contrast between DQN and human flexibility is striking nonetheless. For example, imagine you are tasked with playing Frostbite with any one of these new goals:

1. Get the lowest possible score.
2. Get closest to 100, or 300, or 1,000, or 3,000, or any level, without going over.
3. Beat your friend, who’s playing next to you, but just barely, not by too much, so as not to embarrass them.
4. Go as long as you can without dying.
5. Die as quickly as you can.
6. Pass each level at the last possible minute, right before the temperature timer hits zero and you die (i.e., come as close as you can to dying from frostbite without actually dying).
7. Get to the furthest unexplored level without regard for your score.
8. See if you can discover secret Easter eggs.
9. Get as many fish as you can.
10. Touch all of the individual ice floes on screen once and only once.
11. Teach your friend how to play as efficiently as possible.

This range of goals highlights an essential component of human intelligence: people can learn models and use them for arbitrary new tasks and goals. Although neural networks can learn multiple mappings or tasks with the same set of stimuli – adapting their outputs depending on a specified goal – these models require substantial training or reconfiguration to add new tasks (e.g., Collins & Frank 2013; Eliasmith et al. 2012; Rougier et al. 2005). In contrast, people require little or no retraining or reconfiguration, adding new tasks and goals to their repertoire with relative ease.


The Frostbite example is a particularly telling contrast when compared with human play. Even the best deep networks learn gradually over many thousands of game episodes, take a long time to reach good performance, and are locked into particular input and goal patterns. Humans, after playing just a small number of games over a span of minutes, can understand the game and its goals well enough to perform better than deep networks do after almost a thousand hours of experience. Even more impressively, people understand enough to invent or accept new goals, generalize over changes to the input, and explain the game to others. Why are people different? What core ingredients of human intelligence might the DQN and other modern machine learning methods be missing?

One might object that both the Frostbite and Characters challenges draw an unfair comparison between the speed of human learning and neural network learning. We discuss this objection in detail in Section 5, but we feel it is important to anticipate it here as well. To paraphrase one reviewer of an earlier draft of this article: “It is not that DQN and people are solving the same task differently. They may be better seen as solving different tasks. Human learners – unlike DQN and many other deep learning systems – approach new problems armed with extensive prior experience. The human is encountering one in a years-long string of problems, with rich overlapping structure. Humans as a result often have important domain-specific knowledge for these tasks, even before they ‘begin.’ The DQN is starting completely from scratch.”

We agree, and indeed this is another way of putting our point here. Human learners fundamentally take on different learning tasks than today’s neural networks, and if we want to build machines that learn and think like people, our machines need to confront the kinds of tasks that human learners do, not shy away from them. People never start completely from scratch, or even close to “from scratch,” and that is the secret to their success. The challenge of building models of human learning and thinking then becomes: How do we bring to bear rich prior knowledge to learn new tasks and solve new problems so quickly? What form does that prior knowledge take, and how is it constructed, from some combination of inbuilt capacities and previous experience? The core ingredients we propose in the next section offer one route to meeting this challenge.

4. Core ingredients of human intelligence

In the Introduction, we laid out what we see as core ingredients of intelligence. Here we consider the ingredients in detail and contrast them with the current state of neural network modeling. Although these are hardly the only ingredients needed for human-like learning and thought (see our discussion of language in sect. 5), they are key building blocks, which are not present in most current learning-based AI systems – certainly not all present together – and for which additional attention may prove especially fruitful. We believe that integrating them will produce significantly more powerful and more human-like learning and thinking abilities than we currently see in AI systems.

Before considering each ingredient in detail, it is important to clarify that by “core ingredient” we do not necessarily mean an ingredient that is innately specified by genetics or must be “built in” to any learning algorithm. We intend our discussion to be agnostic with regard to the origins of the key ingredients. By the time a child or an adult is picking up a new character or learning how to play Frostbite, he or she is armed with extensive real-world experience that deep learning systems do not benefit from – experience that would be hard to emulate in any general sense. Certainly, the core ingredients are enriched by this experience, and some may even be a product of the experience itself. Whether learned, built in, or enriched, the key claim is that these ingredients play an active and important role in producing human-like learning and thought, in ways contemporary machine learning has yet to capture.

4.1. Developmental start-up software

Early in development, humans have a foundational understanding of several core domains (Spelke 2003; Spelke & Kinzler 2007). These domains include number (numerical and set operations), space (geometry and navigation), physics (inanimate objects and mechanics), and psychology (agents and groups). These core domains cleave cognition at its conceptual joints, and each domain is organized by a set of entities and abstract principles relating the entities to each other. The underlying cognitive representations can be understood as “intuitive theories,” with a causal structure resembling a scientific theory (Carey 2004; 2009; Gopnik et al. 2004; Gopnik & Meltzoff 1999; Gweon et al. 2010; Schulz 2012b; Wellman & Gelman 1992; 1998). The “child as scientist” proposal further views the process of learning itself as also scientist-like, with recent experiments showing that children seek out new data to distinguish between hypotheses, isolate variables, test causal hypotheses, make use of the data-generating process in drawing conclusions, and learn selectively from others (Cook et al. 2011; Gweon et al. 2010; Schulz et al. 2007; Stahl & Feigenson 2015; Tsividis et al. 2013). We address the nature of learning mechanisms in Section 4.2.

Each core domain has been the target of a great deal of study and analysis, and together the domains are thought to be shared cross-culturally and partly with non-human animals. All of these domains may be important augmentations to current machine learning, though below we focus in particular on the early understanding of objects and agents.

4.1.1. Intuitive physics. Young children have a rich knowledge of intuitive physics. Whether learned or innate, important physical concepts are present at ages far earlier than when a child or adult learns to play Frostbite, suggesting these resources may be used for solving this and many everyday physics-related tasks.

At the age of 2 months, and possibly earlier, human infants expect inanimate objects to follow principles of persistence, continuity, cohesion, and solidity. Young infants believe objects should move along smooth paths, not wink in and out of existence, not inter-penetrate, and not act at a distance (Spelke 1990; Spelke et al. 1995). These expectations guide object segmentation in early infancy, emerging before appearance-based cues such as color, texture, and perceptual goodness (Spelke 1990).

Figure 4. The intuitive physics-engine approach to scene understanding, illustrated through tower stability. (A) The engine takes in
inputs through perception, language, memory, and other faculties. It then constructs a physical scene with objects, physical
properties, and forces; simulates the scene’s development over time; and hands the output to other reasoning systems. (B) Many
possible “tweaks” to the input can result in very different scenes, requiring the potential discovery, training, and evaluation of new
features for each tweak. Adapted from Battaglia et al. (2013).

There is no single agreed-upon computational account of these early physical principles and concepts, and previous suggestions have ranged from decision trees (Baillargeon et al. 2009), to cues, to lists of rules (Siegler & Chen 1998). A promising recent approach sees intuitive physical reasoning as similar to inference over a physics software engine, the kind of simulators that power modern-day animations and games (Bates et al. 2015; Battaglia et al. 2013; Gerstenberg et al. 2015; Sanborn et al. 2013). According to this hypothesis, people reconstruct a perceptual scene using internal representations of the objects and their physically relevant properties (such as mass, elasticity, and surface friction) and forces acting on objects (such as gravity, friction, or collision impulses). Relative to physical ground truth, the intuitive physical state representation is approximate and probabilistic, and oversimplified and incomplete in many ways. Still, it is rich enough to support mental simulations that can predict how objects will move in the immediate future, either on their own or in response to forces we might apply.

This "intuitive physics engine" approach enables flexible adaptation to a wide range of everyday scenarios and judgments in a way that goes beyond perceptual cues. For example (Fig. 4), a physics-engine reconstruction of a tower of wooden blocks from the game Jenga can be used to predict whether (and how) a tower will fall, finding close quantitative fits to how adults make these predictions (Battaglia et al. 2013), as well as simpler kinds of physical predictions that have been studied in infants (Téglás et al. 2011). Simulation-based models can also capture how people make hypothetical or counterfactual predictions: What would happen if certain blocks were taken away, more blocks were added, or the table supporting the tower was jostled? What if certain blocks were glued together, or attached to the table surface? What if the blocks were made of different materials (Styrofoam, lead, ice)? What if the blocks of one color were much heavier than those of other colors? Each of these physical judgments may require new features or new training for a pattern recognition account to work at the same level as the model-based simulator.
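The logic of this account can be made concrete in a few lines of code. The sketch below is our illustration, not an implementation from Battaglia et al. (2013): the tower_falls rule is a deliberately crude stand-in for a full rigid-body simulator, and perceptual uncertainty is reduced to Gaussian jitter on the inferred block positions.

```python
import random
from dataclasses import dataclass

@dataclass
class Block:
    x: float        # horizontal center position
    y: float        # height in the stack
    width: float

def tower_falls(blocks):
    """Toy deterministic "engine": a block topples if its center sits past
    the edge of the block beneath it. A crude stand-in for the rigid-body
    simulators used in the work cited above."""
    for lower, upper in zip(blocks, blocks[1:]):
        if abs(upper.x - lower.x) > lower.width / 2:
            return True
    return False

def prob_tower_falls(observed_blocks, n_samples=200, position_noise=0.2):
    """Graded stability judgment: simulate many noisy reconstructions of
    the perceived scene and count how often the tower falls."""
    falls = 0
    for _ in range(n_samples):
        # Perception is approximate: jitter each block's inferred position.
        noisy = [Block(b.x + random.gauss(0, position_noise), b.y, b.width)
                 for b in observed_blocks]
        falls += tower_falls(noisy)
    return falls / n_samples

# Three stacked blocks with growing offsets: a borderline tower.
tower = [Block(0.0, 0, 1.0), Block(0.2, 1, 1.0), Block(0.45, 2, 1.0)]
print(prob_tower_falls(tower))  # a graded probability, not a yes/no verdict
```

Note how the counterfactual "tweaks" above map onto edits to the simulated scene rather than onto new training: making blocks heavier or gluing them together changes the inputs to the simulator, while the judgment routine stays the same.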
What are the prospects for embedding or acquiring this kind of intuitive physics in deep learning systems? Connectionist models in psychology have previously been applied to physical reasoning tasks such as balance-beam rules (McClelland 1988; Shultz 2003) or rules relating to distance, velocity, and time in motion (Buckingham & Shultz 2000). However, these networks do not attempt to work with complex scenes as input, or a wide range of scenarios and judgments as in Figure 4. A recent paper from Facebook AI researchers (Lerer et al. 2016) represents an exciting step in this direction. Lerer et al. (2016) trained a deep convolutional network-based system (PhysNet) to predict the stability of block towers from simulated images similar to those in Figure 4A, but with much simpler configurations of two, three, or four cubical blocks stacked vertically. Impressively, PhysNet generalized to simple real images of block towers, matching human performance on these images, meanwhile exceeding human performance on synthetic images. Human and PhysNet confidence were also correlated across towers, although not as strongly as for the approximate probabilistic simulation models and experiments of Battaglia et al. (2013). One limitation is that PhysNet currently requires extensive training – between 100,000 and 200,000 scenes – to learn judgments for just a single task (will the tower fall?) on a narrow range of scenes (towers with two to four cubes). It has been shown to generalize, but only in limited ways (e.g., from towers of two and three cubes to towers of four cubes). In contrast, people require far less experience to perform any particular task, and can generalize to many novel judgments and complex scenes with no new training required (although they receive large amounts of physics experience through interacting with the world more generally). Could deep learning systems such as PhysNet capture this flexibility, without explicitly simulating the causal interactions between objects in three dimensions? We are not sure, but we hope this is a challenge they will take on.


Alternatively, instead of trying to make predictions without simulating physics, could neural networks be trained to emulate a general-purpose physics simulator, given the right type and quantity of training data, such as the raw input experienced by a child? This is an active and intriguing area of research, but it too faces significant challenges. For networks trained on object classification, deeper layers often become sensitive to successively higher-level features, from edges to textures to shape-parts to full objects (Yosinski et al. 2014; Zeiler & Fergus 2014). For deep networks trained on physics-related data, it remains to be seen whether higher layers will encode objects, general physical properties, forces, and approximately Newtonian dynamics. A generic network trained on dynamic pixel data might learn an implicit representation of these concepts, but would it generalize broadly beyond training contexts as people's more explicit physical concepts do? Consider, for example, a network that learns to predict the trajectories of several balls bouncing in a box (Kodratoff & Michalski 2014). If this network has actually learned something like Newtonian mechanics, then it should be able to generalize to interestingly different scenarios – at a minimum, different numbers of differently shaped objects, bouncing in boxes of different shapes and sizes and orientations with respect to gravity, not to mention more severe generalization tests such as all of the tower tasks discussed above, which also fall under the Newtonian domain. Neural network researchers have yet to take on this challenge, but we hope they will. Whether such models can be learned with the kind (and quantity) of data available to human infants is not clear, as we discuss further in Section 5.

It may be difficult to integrate object and physics-based primitives into deep neural networks, but the payoff in terms of learning speed and performance could be great for many tasks. Consider the case of learning to play Frostbite. Although it can be difficult to discern exactly how a network learns to solve a particular task, the DQN probably does not parse a Frostbite screenshot in terms of stable objects or sprites moving according to the rules of intuitive physics (Fig. 2). But incorporating a physics-engine–based representation could help DQNs learn to play games such as Frostbite in a faster and more general way, whether the physics knowledge is captured implicitly in a neural network or more explicitly in a simulator. Beyond reducing the amount of training data, and potentially improving the level of performance reached by the DQN, it could eliminate the need to retrain a Frostbite network if the objects (e.g., birds, ice floes, and fish) are slightly altered in their behavior, reward structure, or appearance. When a new object type such as a bear is introduced, as in the later levels of Frostbite (Fig. 2D), a network endowed with intuitive physics would also have an easier time adding this object type to its knowledge (the challenge of adding new objects was also discussed in Marcus [1998; 2001]). In this way, the integration of intuitive physics and deep learning could be an important step toward more human-like learning algorithms.

4.1.2. Intuitive psychology. Intuitive psychology is another early-emerging ability with an important influence on human learning and thought. Pre-verbal infants distinguish animate agents from inanimate objects. This distinction is partially based on innate or early-present detectors for low-level cues, such as the presence of eyes, motion initiated from rest, and biological motion (Johnson et al. 1998; Premack & Premack 1997; Schlottmann et al. 2006; Tremoulet & Feldman 2000). Such cues are often sufficient but not necessary for the detection of agency.

Beyond these low-level cues, infants also expect agents to act contingently and reciprocally, to have goals, and to take efficient actions toward those goals subject to constraints (Csibra 2008; Csibra et al. 2003; Spelke & Kinzler 2007). These goals can be socially directed; at around 3 months of age, infants begin to discriminate antisocial agents that hurt or hinder others from neutral agents (Hamlin 2013; Hamlin et al. 2010), and they later distinguish between anti-social, neutral, and pro-social agents (Hamlin et al. 2007; 2013).

It is generally agreed that infants expect agents to act in a goal-directed, efficient, and socially sensitive fashion (Spelke & Kinzler 2007). What is less agreed on is the computational architecture that supports this reasoning and whether it includes any reference to mental states and explicit goals.

One possibility is that intuitive psychology is simply cues "all the way down" (Schlottmann et al. 2013; Scholl & Gao 2013), though this would require more and more cues as the scenarios become more complex. Consider, for example, a scenario in which an agent A is moving toward a box, and an agent B moves in a way that blocks A from reaching the box. Infants and adults are likely to interpret B's behavior as "hindering" (Hamlin 2013). This inference could be captured by a cue that states, "If an agent's expected trajectory is prevented from completion, the blocking agent is given some negative association." Although the cue is easily calculated, the scenario is also easily changed to necessitate a different type of cue. Suppose A was already negatively associated (a "bad guy"); acting negatively toward A could then be seen as good (Hamlin 2013). Or suppose something harmful was in the box, which A did not know about. Now B would be seen as helping, protecting, or defending A. Suppose A knew there was something bad in the box and wanted it anyway. B could be seen as acting paternalistically. A cue-based account would be twisted into gnarled combinations such as, "If an expected trajectory is prevented from completion, the blocking agent is given some negative association, unless that trajectory leads to a negative outcome or the blocking agent is previously associated as positive, or the blocked agent is previously associated as negative, or…."

One alternative to a cue-based account is to use generative models of action choice, as in the Bayesian inverse planning, or Bayesian theory of mind (ToM), models of Baker et al. (2009) or the naive utility calculus models of Jara-Ettinger et al. (2015) (see also Jern and Kemp [2015] and Tauber and Steyvers [2011] and a related alternative based on predictive coding from Kilner et al. [2007]). These models formalize explicitly mentalistic concepts such as "goal," "agent," "planning," "cost," "efficiency," and "belief," used to describe core psychological reasoning in infancy. They assume adults and children treat agents as approximately rational planners who choose the most efficient means to their goals.

Planning computations may be formalized as solutions to Markov decision processes (MDPs) or partially observable Markov decision processes (POMDPs), taking as input utility and belief functions defined over an agent's state-space and the agent's state-action transition functions, and returning a series of actions the agent should perform to most efficiently fulfill their goals (or maximize their utility). By simulating these planning processes, people can predict what agents might do next, or use inverse reasoning from observing a series of actions to infer the utilities and beliefs of agents in a scene. This is directly analogous to how simulation engines can be used for intuitive physics, to predict what will happen next in a scene or to infer objects' dynamical properties from how they move. It yields similarly flexible reasoning abilities: Utilities and beliefs can be adjusted to take into account how agents might act for a wide range of novel goals and situations. Importantly, unlike in intuitive physics, simulation-based reasoning in intuitive psychology can be nested recursively to understand social interactions. We can think about agents thinking about other agents.
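The inverse-planning idea can be sketched just as compactly. The following is our toy illustration in the spirit of Baker et al. (2009), not their implementation: it assumes a deterministic grid world, invented goal locations, and a Boltzmann-rational action likelihood standing in for full (PO)MDP planning.

```python
import math

def move(state, action):
    """Deterministic grid transition; a state is an (x, y) tuple."""
    x, y = state
    return {"up": (x, y + 1), "down": (x, y - 1),
            "left": (x - 1, y), "right": (x + 1, y)}[action]

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def action_likelihood(state, action, goal, beta=2.0):
    """P(action | state, goal) for an approximately rational planner:
    a Boltzmann policy over a toy value function (negative distance to
    goal), standing in for a full MDP or POMDP solution."""
    actions = ["up", "down", "left", "right"]
    scores = {a: math.exp(-beta * manhattan(move(state, a), goal))
              for a in actions}
    return scores[action] / sum(scores.values())

def infer_goal(trajectory, goals, prior):
    """Bayesian inverse planning: P(goal | actions) is proportional to
    P(goal) times the product of P(action | state, goal) over the path."""
    posterior = dict(prior)
    for state, action in trajectory:
        for g in goals:
            posterior[g] *= action_likelihood(state, action, g)
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}

goals = [(5, 0), (0, 5)]                      # e.g., "fish" vs. "igloo"
path = [((0, 0), "right"), ((1, 0), "right"), ((2, 0), "right")]
print(infer_goal(path, goals, {g: 0.5 for g in goals}))
```

Because utilities and beliefs enter only through the likelihood and the prior, the same inference routine extends to novel goals and situations, and it can in principle be applied recursively to agents reasoning about other agents.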
As in the case of intuitive physics, the success that generic deep networks will have in capturing intuitive psychological reasoning will depend in part on the representations humans use. Although deep networks have not yet been applied to scenarios involving theory of mind and intuitive psychology, they could probably learn visual cues, heuristics, and summary statistics of a scene that happens to involve agents.5 If that is all that underlies human psychological reasoning, a data-driven deep learning approach can likely find success in this domain. However, it seems to us that any full formal account of intuitive psychological reasoning needs to include representations of agency, goals, efficiency, and reciprocal relations. As with objects and forces, it is unclear whether a complete representation of these concepts (agents, goals, etc.) could emerge from deep neural networks trained in a purely predictive capacity. Similar to the intuitive physics domain, it is possible that with a tremendous number of training trajectories in a variety of scenarios, deep learning techniques could approximate the reasoning found in infancy even without learning anything about goal-directed or socially directed behavior more generally. But this is also unlikely to resemble how humans learn, understand, and apply intuitive psychology unless the concepts are genuine. In the same way that altering the setting of a scene or the target of inference in a physics-related task may be difficult to generalize without an understanding of objects, altering the setting of an agent or their goals and beliefs is difficult to reason about without understanding intuitive psychology.

In introducing the Frostbite challenge, we discussed how people can learn to play the game extremely quickly by watching an experienced player for just a few minutes and then playing a few rounds themselves. Intuitive psychology provides a basis for efficient learning from others, especially in teaching settings with the goal of communicating knowledge efficiently (Shafto et al. 2014). In the case of watching an expert play Frostbite, whether or not there is an explicit goal to teach, intuitive psychology lets us infer the beliefs, desires, and intentions of the experienced player. For example, we can learn that the birds are to be avoided from seeing how the experienced player appears to avoid them. We do not need to experience a single example of encountering a bird, and watching Frostbite Bailey die because of the bird, to infer that birds are probably dangerous. It is enough to see that the experienced player's avoidance behavior is best explained as acting under that belief.

Similarly, consider how a sidekick agent (increasingly popular in video games) is expected to help a player achieve his or her goals. This agent can be useful in different ways in different circumstances, such as getting items, clearing paths, fighting, defending, healing, and providing information, all under the general notion of being helpful (Macindoe 2013). An explicit agent representation can predict how such an agent will be helpful in new circumstances, whereas a bottom-up pixel-based representation is likely to struggle.

There are several ways that intuitive psychology could be incorporated into contemporary deep learning systems. Although it could be built in, intuitive psychology may arise in other ways. Connectionists have argued that innate constraints in the form of hard-wired cortical circuits are unlikely (Elman 2005; Elman et al. 1996), but a simple inductive bias, for example, the tendency to notice things that move other things, can bootstrap reasoning about more abstract concepts of agency (Ullman et al. 2012a).6 Similarly, a great deal of goal-directed and socially directed action can also be boiled down to a simple utility calculus (e.g., Jara-Ettinger et al. 2015), in a way that could be shared with other cognitive abilities. Although the origins of intuitive psychology are still a matter of debate, it is clear that these abilities are early emerging and play an important role in human learning and thought, as exemplified in the Frostbite challenge and when learning to play novel video games more broadly.

4.2. Learning as rapid model building

Since their inception, neural network models have stressed the importance of learning. There are many learning algorithms for neural networks, including the perceptron algorithm (Rosenblatt 1958), Hebbian learning (Hebb 1949), the BCM rule (Bienenstock et al. 1982), backpropagation (Rumelhart et al. 1986a), the wake-sleep algorithm (Hinton et al. 1995), and contrastive divergence (Hinton 2002). Whether the goal is supervised or unsupervised learning, these algorithms implement learning as a process of gradual adjustment of connection strengths. For supervised learning, the updates are usually aimed at improving the algorithm's pattern recognition capabilities. For unsupervised learning, the updates work toward gradually matching the statistics of the model's internal patterns with the statistics of the input data.

In recent years, machine learning has found particular success using backpropagation and large data sets to solve difficult pattern recognition problems (see Glossary in Table 1). Although these algorithms have reached human-level performance on several challenging benchmarks, they are still far from matching human-level learning in other ways. Deep neural networks often need more data than people do to solve the same types of problems, whether it is learning to recognize a new type of object or learning to play a new game. When learning the meanings of words in their native language, children make meaningful generalizations from very sparse data (Carey & Bartlett 1978; Landau et al. 1988; Markman 1989; Smith et al. 2002; Xu & Tenenbaum 2007; although see Horst & Samuelson 2008 regarding memory limitations).


Children may only need to see a few examples of the concepts hairbrush, pineapple, and lightsaber before they largely "get it," grasping the boundary of the infinite set that defines each concept from the infinite set of all possible objects. Children are far more practiced than adults at learning new concepts, learning roughly 9 or 10 new words each day from when they begin to speak through the end of high school (Bloom 2000; Carey 1978). Yet the ability for rapid "one-shot" learning does not disappear in adulthood. An adult may need to see only a single image or movie of a novel two-wheeled vehicle to infer the boundary between this concept and others, allowing him or her to discriminate new examples of that concept from similar-looking objects of a different type (Fig. 1B-i).

Contrasting with the efficiency of human learning, neural networks, by virtue of their generality as highly flexible function approximators, are notoriously data hungry (the bias/variance dilemma [Geman et al. 1992]). Benchmark tasks such as the ImageNet data set for object recognition provide hundreds or thousands of examples per class (Krizhevsky et al. 2012; Russakovsky et al. 2015): 1,000 hairbrushes, 1,000 pineapples, and so on. In the context of learning new handwritten characters or learning to play Frostbite, the MNIST benchmark includes 6,000 examples of each handwritten digit (LeCun et al. 1998), and the DQN of Mnih et al. (2015) played each Atari video game for approximately 924 hours of unique training experience (Fig. 3). In both cases, the algorithms are clearly using information less efficiently than a person learning to perform the same tasks.

It is also important to mention that there are many classes of concepts that people learn more slowly. Concepts that are learned in school are usually far more challenging and more difficult to acquire, including mathematical functions, logarithms, derivatives, integrals, atoms, electrons, gravity, DNA, and evolution. There are also domains for which machine learners outperform human learners, such as combing through financial or weather data. But for the vast majority of cognitively natural concepts – the types of things that children learn as the meanings of words – people are still far better learners than machines. This is the type of learning we focus on in this section, which is more suitable for the enterprise of reverse engineering and articulating additional principles that make human learning successful. It also opens the possibility of building these ingredients into the next generation of machine learning and AI algorithms, with potential for making progress on learning concepts that are both easy and difficult for humans to acquire.

Even with just a few examples, people can learn remarkably rich conceptual models. One indicator of richness is the variety of functions that these models support (Markman & Ross 2003; Solomon et al. 1999). Beyond classification, concepts support prediction (Murphy & Ross 1994; Rips 1975), action (Barsalou 1983), communication (Markman & Makin 1998), imagination (Jern & Kemp 2013; Ward 1994), explanation (Lombrozo 2009; Williams & Lombrozo 2010), and composition (Murphy 1988; Osherson & Smith 1981). These abilities are not independent; rather, they hang together and interact (Solomon et al. 1999), coming for free with the acquisition of the underlying concept. Returning to the previous example of a novel two-wheeled vehicle, a person can sketch a range of new instances (Fig. 1B-ii), parse the concept into its most important components (Fig. 1B-iii), or even create a new complex concept through the combination of familiar concepts (Fig. 1B-iv). Likewise, as discussed in the context of Frostbite, a learner who has acquired the basics of the game could flexibly apply his or her knowledge to an infinite set of Frostbite variants (sect. 3.2). The acquired knowledge supports reconfiguration to new tasks and new demands, such as modifying the goals of the game to survive while acquiring as few points as possible, or to efficiently teach the rules to a friend.

This richness and flexibility suggest that learning as model building is a better metaphor than learning as pattern recognition. Furthermore, the human capacity for one-shot learning suggests that these models are built upon rich domain knowledge rather than starting from a blank slate (Mikolov et al. 2016; Mitchell et al. 1986). In contrast, much of the recent progress in deep learning has been on pattern recognition problems, including object recognition, speech recognition, and (model-free) video game learning, that use large data sets and little domain knowledge.

There has been recent work on other types of tasks, including learning generative models of images (Denton et al. 2015; Gregor et al. 2015), caption generation (Karpathy & Fei-Fei 2017; Vinyals et al. 2014; Xu et al. 2015), question answering (Sukhbaatar et al. 2015; Weston et al. 2015b), and learning simple algorithms (Graves et al. 2014; Grefenstette et al. 2015). We discuss question answering and learning simple algorithms in Section 6.1. Yet, at least for image and caption generation, these tasks have been mostly studied in the big data setting that is at odds with the impressive human ability to generalize from small data sets (although see Rezende et al. [2016] for a deep learning approach to the Characters Challenge). And it has been difficult to learn neural network–style representations that effortlessly generalize to new tasks that they were not trained on (see Davis & Marcus 2015; Marcus 1998; 2001). What additional ingredients may be needed to rapidly learn more powerful and more general-purpose representations?

A relevant case study is from our own work on the Characters Challenge (sect. 3.1; Lake 2014; Lake et al. 2015a). People and various machine learning approaches were compared on their ability to learn new handwritten characters from the world's alphabets. In addition to evaluating several types of deep learning models, we developed an algorithm using Bayesian program learning (BPL) that represents concepts as simple stochastic programs: structured procedures that generate new examples of a concept when executed (Fig. 5A). These programs allow the model to express causal knowledge about how the raw data are formed, and the probabilistic semantics allow the model to handle noise and perform creative tasks. Structure sharing across concepts is accomplished by the compositional re-use of stochastic primitives that can combine in new ways to create new concepts.

Note that we are overloading the word model to refer to the BPL framework as a whole (which is a generative model), as well as the individual probabilistic models (or concepts) that it infers from images to represent novel handwritten characters. There is a hierarchy of models: a higher-level program that generates different types of concepts, which are themselves programs that can be run to generate tokens of a concept. Here, describing learning as "rapid model building" refers to the fact that BPL constructs generative models (lower-level programs) that produce tokens of a concept (Fig. 5B).
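This two-level hierarchy can be rendered as a cartoon in code. The sketch below is our own illustration, not code from Lake et al. (2015a): the primitive vocabulary, relation types, and noise model are invented for exposition.

```python
import random

PRIMITIVES = ["arc", "line", "hook", "loop"]   # toy sub-part vocabulary

def sample_character_type(max_parts=3):
    """Higher-level program: sample a new character concept as a
    composition of parts (each built from sub-part primitives) plus
    simple relations between consecutive parts."""
    n_parts = random.randint(1, max_parts)
    parts = [[random.choice(PRIMITIVES) for _ in range(random.randint(1, 2))]
             for _ in range(n_parts)]
    relations = [random.choice(["attach-start", "attach-end", "independent"])
                 for _ in range(n_parts - 1)]
    return {"parts": parts, "relations": relations}

def sample_token(char_type, motor_noise=0.1):
    """Lower-level program: re-running the concept adds motor variability,
    yielding a new exemplar of the same character."""
    return [{"subparts": stroke,
             "jitter": [random.gauss(0, motor_noise) for _ in stroke]}
            for stroke in char_type["parts"]]

new_concept = sample_character_type()                      # a new "character"
exemplars = [sample_token(new_concept) for _ in range(3)]  # its tokens
```

In BPL proper, the higher-level program is itself learned from background alphabets, tokens are rendered into images, and classification and generation proceed by inverting this generative process with probabilistic inference rather than by forward sampling alone.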


Figure 5. A causal, compositional model of handwritten characters. (A) New types are generated compositionally by choosing primitive
actions (color coded) from a library (i), combining these sub-parts (ii) to make parts (iii), and combining parts with relations to define
simple programs (iv). These programs can create different tokens of a concept (v) that are rendered as binary images (vi). (B)
Probabilistic inference allows the model to generate new examples from just one example of a new concept; shown here in a visual
Turing test. An example image of a new concept is shown above each pair of grids. One grid was generated by nine people and the
other is nine samples from the BPL model. Which grid in each pair (A or B) was generated by the machine? Answers by row:
1,2;1,1. Adapted from Lake et al. (2015a).

Learning models of this form allows BPL to perform a challenging one-shot classification task at human-level performance (Fig. 1A-i) and to outperform current deep learning models such as convolutional networks (Koch et al. 2015).7 The representations that BPL learns also enable it to generalize in other, more creative, human-like ways, as evaluated using "visual Turing tests" (e.g., Fig. 5B). These tasks include generating new examples (Figs. 1A-ii and 5B), parsing objects into their essential components (Fig. 1A-iii), and generating new concepts in the style of a particular alphabet (Fig. 1A-iv). The following sections discuss the three main ingredients – compositionality, causality, and learning-to-learn – that were important to the success of this framework and, we believe, are important to understanding human learning as rapid model building more broadly. Although these ingredients fit naturally within a BPL or a probabilistic program induction framework, they could also be integrated into deep learning models and other types of machine learning algorithms, prospects we discuss in more detail below.

4.2.1. Compositionality. Compositionality is the classic idea that new representations can be constructed through the combination of primitive elements. In computer programming, primitive functions can be combined to create new functions, and these new functions can be further combined to create even more complex functions. This function hierarchy provides an efficient description of higher-level functions, such as a hierarchy of parts for describing complex objects or scenes (Bienenstock et al. 1997). Compositionality is also at the core of productivity: an infinite number of representations can be constructed from a finite set of primitives, just as the mind can think an infinite number of thoughts, utter or understand an infinite number of sentences, or learn new concepts from a seemingly infinite space of possibilities (Fodor 1975; Fodor & Pylyshyn 1988; Marcus 2001; Piantadosi 2011).

Compositionality has been broadly influential in both AI and cognitive science, especially as it pertains to theories of object recognition, conceptual representation, and language. Here, we focus on compositional representations of object concepts for illustration. Structural description models represent visual concepts as compositions of parts and relations, which provides a strong inductive bias for constructing models of new concepts (Biederman 1987; Hummel & Biederman 1992; Marr & Nishihara 1978; van den Hengel et al. 2015; Winston 1975). For instance, the novel two-wheeled vehicle in Figure 1B might be represented as two wheels connected by a platform, which provides the base for a post, which holds the handlebars, and so on. Parts can themselves be composed of sub-parts, forming a "partonomy" of part-whole relationships (Miller & Johnson-Laird 1976; Tversky & Hemenway 1984). In the novel vehicle example, the parts and relations can be shared and re-used from existing related concepts, such as cars, scooters, motorcycles, and unicycles. Because the parts and relations are themselves a product of previous learning, their facilitation of the construction of new models is also an example of learning-to-learn, another ingredient that is covered below. Although compositionality and learning-to-learn fit naturally together, there are also forms of compositionality that rely less on previous learning, such as the bottom-up, parts-based representation of Hoffman and Richards (1984).

Learning models of novel handwritten characters can be operationalized in a similar way. Handwritten characters are inherently compositional, where the parts are pen strokes, and relations describe how these strokes connect to each other. Lake et al. (2015a) modeled these parts using an additional layer of compositionality, where parts are complex movements created from simpler sub-part movements. New characters can be constructed by combining parts, sub-parts, and relations in novel ways (Fig. 5). Compositionality is also central to the construction of other types of symbolic concepts beyond characters, where new spoken words can be created through a novel combination of phonemes (Lake et al. 2014), or a new gesture or dance move can be created through a combination of more primitive body movements.


An efficient representation for Frostbite should be similarly compositional and productive. A scene from the game is a composition of various object types, including birds, fish, ice floes, igloos, and so on (Fig. 2). Representing this compositional structure explicitly is both more economical and better for generalization, as noted in previous work on object-oriented reinforcement learning (Diuk et al. 2008). Many repetitions of the same objects are present at different locations in the scene, and therefore, representing each as an identical instance of the same object with the same properties is important for efficient representation and quick learning of the game. Further, new levels may contain different numbers and combinations of objects, where a compositional representation of objects – using intuitive physics and intuitive psychology as glue – would aid in making these crucial generalizations (Fig. 2D).
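The contrast with a pixel-level encoding can be made concrete. Below is a minimal sketch of an object-oriented scene state in the spirit of Diuk et al. (2008); the object types, attributes, and query are hypothetical simplifications of the actual game.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GameObject:
    kind: str        # e.g., "agent", "bird", "fish", "ice_floe"
    x: float
    y: float
    vx: float        # velocity, supplied by intuitive physics
    value: float     # expected cost/reward of contact, from prior knowledge

# One object type, many instances: whatever is learned about one bird
# transfers automatically to every other bird in the scene.
scene: List[GameObject] = [
    GameObject("agent",    x=40, y=20, vx=0.0,  value=0.0),
    GameObject("bird",     x=10, y=60, vx=-1.5, value=-1.0),
    GameObject("bird",     x=70, y=60, vx=-1.5, value=-1.0),
    GameObject("ice_floe", x=30, y=80, vx=+1.0, value=+0.5),
    GameObject("ice_floe", x=55, y=80, vx=+1.0, value=+0.5),
]

def nearby_threats(scene: List[GameObject], radius: float = 25.0):
    """A relational query that works for any number or placement of objects."""
    agent = next(o for o in scene if o.kind == "agent")
    return [o for o in scene
            if o.value < 0
            and abs(o.x - agent.x) + abs(o.y - agent.y) < radius]
```

Because knowledge attaches to object types rather than to screen positions, a new level with more birds or differently spaced floes changes only the instance list, not what has been learned.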
Deep neural networks have at least a limited notion of compositionality. Networks trained for object recognition encode part-like features in their deeper layers (Zeiler & Fergus 2014), whereby the presentation of new types of objects can activate novel combinations of feature detectors. Similarly, a DQN trained to play Frostbite may learn to represent multiple replications of the same object with the same features, facilitated by the invariance properties of a convolutional neural network architecture. Recent work has shown how this type of compositionality can be made more explicit, where neural networks can be used for efficient inference in more structured generative models (both neural networks and three-dimensional scene models) that explicitly represent the number of objects in a scene (Eslami et al. 2016). Beyond the compositionality inherent in parts, objects, and scenes, compositionality can also be important at the level of goals and sub-goals. Recent work on hierarchical DQNs shows that by providing explicit object representations to a DQN, and then defining sub-goals based on reaching those objects, DQNs can learn to play games with sparse rewards (such as Montezuma's Revenge) by combining these sub-goals together to achieve larger goals (Kulkarni et al. 2016).

We look forward to seeing these new ideas continue to develop, potentially providing even richer notions of compositionality in deep neural networks that lead to faster and more flexible learning. To capture the full extent of the mind's compositionality, a model must include explicit representations of objects, identity, and relations, all while maintaining a notion of "coherence" when understanding novel configurations. Coherence is related to our next principle, causality, which is discussed in the section that follows.

4.2.2. Causality. In concept learning and scene understanding, causal models represent hypothetical real-world processes that produce the perceptual observations. In control and reinforcement learning, causal models represent the structure of the environment, such as modeling state-to-state transitions or action/state-to-state transitions.

Concept learning and vision models that use causality are usually generative (as opposed to discriminative; see Glossary in Table 1), but not every generative model is also causal. Although a generative model describes a process for generating data, or at least assigns a probability distribution over possible data points, this generative process may not resemble how the data are produced in the real world. Causality refers to the subclass of generative models that resemble, at an abstract level, how the data are actually generated. Although generative neural networks such as Deep Belief Networks (Hinton et al. 2006) or variational auto-encoders (Gregor et al. 2016; Kingma et al. 2014) may generate compelling handwritten digits, they mark one end of the "causality spectrum," because the steps of the generative process bear little resemblance to steps in the actual process of writing. In contrast, the generative model for characters using BPL does resemble the steps of writing, although even more causally faithful models are possible.

Causality has been influential in theories of perception. "Analysis-by-synthesis" theories of perception maintain that sensory data can be more richly represented by modeling the process that generated it (Bever & Poeppel 2010; Eden 1962; Halle & Stevens 1962; Neisser 1966). Relating data to their causal source provides strong priors for perception and learning, as well as a richer basis for generalizing in new ways and to new tasks. The canonical examples of this approach are speech and visual perception. For example, Liberman et al. (1967) argued that the richness of speech perception is best explained by inverting the production plan, at the level of vocal tract movements, to explain the large amounts of acoustic variability and the blending of cues across adjacent phonemes. As discussed, causality does not have to be a literal inversion of the actual generative mechanisms, as proposed in the motor theory of speech. For the BPL model of learning handwritten characters, causality is operationalized by treating concepts as motor programs, or abstract causal descriptions of how to produce examples of the concept, rather than concrete configurations of specific muscles (Fig. 5A). Causality is an important factor in the model's success in classifying and generating new examples after seeing just a single example of a new concept (Lake et al. 2015a) (Fig. 5B).

Causal knowledge has also been shown to influence how people learn new concepts; providing a learner with different types of causal knowledge changes how he or she learns and generalizes. For example, the structure of the causal network underlying the features of a category influences how people categorize new examples (Rehder 2003; Rehder & Hastie 2001). Similarly, as related to the Characters Challenge, the way people learn to write a novel handwritten character influences later perception and categorization (Freyd 1983; 1987).

To explain the role of causality in learning, conceptual representations have been likened to intuitive theories or explanations, providing the glue that lets core features stick, whereas other equally applicable features wash away (Murphy & Medin 1985). Borrowing examples from Murphy and Medin (1985), the feature "flammable" is more closely attached to wood than money because of the underlying causal roles of the concepts, even though the feature is equally applicable to both. These causal roles derive from the functions of objects. Causality can also glue some features together by relating them to a deeper underlying cause, explaining why some features such as "can fly," "has wings," and "has feathers" co-occur across objects, whereas others do not.

Beyond concept learning, people also understand scenes by building causal models. Human-level scene understanding involves composing a story that explains the perceptual observations, drawing upon and integrating the ingredients of intuitive physics, intuitive psychology, and compositionality. Perception without these ingredients, and absent the causal glue that binds them, can lead to revealing errors. Consider image captions generated by a deep neural network (Fig. 6) (Karpathy & Fei-Fei 2017). In many cases, the network gets the key objects in a scene correct, but fails to understand the physical forces at work, the mental states of the people, or the causal relationships between the objects. In other words, it does not build the right causal model of the data.

There have been steps toward deep neural networks and related approaches that learn causal models. Lopez-Paz et al. (2015) introduced a discriminative, data-driven framework for distinguishing the direction of causality from examples. Although it outperforms existing methods on various causal prediction tasks, it is unclear how to apply the approach to inferring rich hierarchies of latent causal variables, as needed for the Frostbite Challenge and especially the Characters Challenge. Graves (2014) learned a generative model of cursive handwriting using a recurrent neural network trained on handwriting data. Although it synthesizes impressive examples of handwriting in various styles, it requires a large training corpus and has not been applied to other tasks. The DRAW network performs both recognition and generation of handwritten digits using recurrent neural networks with a window of attention, producing a limited circular area of the image at each time step (Gregor et al. 2015). A more recent variant of DRAW was applied to generating examples of a novel character from just a single training example (Rezende et al. 2016). The model demonstrates an impressive ability to make plausible generalizations that go beyond the training examples, yet it generalizes too broadly in other cases, in ways that are not especially human-like. It is not clear that it could yet pass any of the "visual Turing tests" in Lake et al. (2015a) (Fig. 5B), although we hope DRAW-style networks will continue to be extended and enriched, and could be made to pass these tests.

Incorporating causality may greatly improve these deep learning models; they were trained without access to causal data about how characters are actually produced, and without any incentive to learn the true causal process. An attentional window is only a crude approximation of the true causal process of drawing with a pen, and in Rezende et al. (2016) the attentional window is not pen-like at all, although a more accurate pen model could be incorporated. We anticipate that these sequential generative neural networks could make sharper one-shot inferences, with the goal of tackling the full Characters Challenge by incorporating additional causal, compositional, and hierarchical structure (and by continuing to use learning-to-learn, described next), potentially leading to a more computationally efficient and neurally grounded variant of the BPL model of handwritten characters (Fig. 5).

A causal model of Frostbite would have to be more complex, gluing together object representations and explaining their interactions with intuitive physics and intuitive psychology, much like the game engine that generates the game dynamics and, ultimately, the frames of pixel images. Inference is the process of inverting this causal generative model, explaining the raw pixels as objects and their interactions, such as the agent stepping on an ice floe to deactivate it or a crab pushing the agent into the water (Fig. 2). Deep neural networks could play a role in two ways: by serving as a bottom-up proposer to make probabilistic inference more tractable in a structured generative model (sect. 4.3.1) or by serving as the causal generative model if imbued with the right set of ingredients.
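In caricature, inverting such a causal generative model is a search over scene descriptions scored by how well their renderings match the observed pixels. The sketch below is our illustration only: render, propose, and prior are assumed, hypothetical functions, with propose marking the place where a bottom-up neural network could supply candidate scenes.

```python
import math

def posterior_score(scene, observed_frame, render, prior):
    """Score a candidate scene: prior plausibility times how well the
    frame it renders matches the observed pixels."""
    predicted = render(scene)                  # the causal forward model
    pixel_error = sum((p - q) ** 2
                      for p, q in zip(predicted, observed_frame))
    return prior(scene) * math.exp(-pixel_error)

def parse_frame(observed_frame, propose, render, prior, n_iters=1000):
    """Analysis-by-synthesis: search for the scene (objects and their
    interactions) whose rendering best explains the observed frame.
    `propose` is where a bottom-up recognition network could suggest
    likely object configurations to keep the search tractable."""
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        scene = propose(observed_frame)
        score = posterior_score(scene, observed_frame, render, prior)
        if score > best_score:
            best, best_score = scene, score
    return best
```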

Figure 6. Perceiving scenes without intuitive physics, intuitive psychology, compositionality, and causality. Image captions are
generated by a deep neural network (Karpathy & Fei-Fei 2017) using code from github.com/karpathy/neuraltalk2. Image credits:
Gabriel Villena Fernández (left), TVBS Taiwan/Agence France-Presse (middle), and AP Photo/Dave Martin (right). Similar examples
using images from Reuters news can be found at twitter.com/interesting_jpg.


4.2.3. Learning-to-learn. When humans or machines make inferences that go far beyond the data, strong prior knowledge (or inductive biases or constraints) must be making up the difference (Geman et al. 1992; Griffiths et al. 2010; Tenenbaum et al. 2011). One way people acquire this prior knowledge is through "learning-to-learn," a term introduced by Harlow (1949) and closely related to the machine learning notions of "transfer learning," "multitask learning," and "representation learning." These terms refer to ways that learning a new task or a new concept can be accelerated through previous or parallel learning of other related tasks or other related concepts. The strong priors, constraints, or inductive bias needed to learn a particular task quickly are often shared to some extent with other related tasks. A range of mechanisms have been developed to adapt the learner's inductive bias as they learn specific tasks and then apply these inductive biases to new tasks.

In hierarchical Bayesian modeling (Gelman et al. 2004), a general prior on concepts is shared by multiple specific concepts, and the prior itself is learned over the course of learning the specific concepts (Salakhutdinov et al. 2012; 2013). These models have been used to explain the dynamics of human learning-to-learn in many areas of cognition, including word learning, causal learning, and learning intuitive theories of physical and social domains (Tenenbaum et al. 2011). In machine vision, for deep convolutional networks or other discriminative methods that form the core of recent recognition systems, learning-to-learn can occur through the sharing of features between the models learned for old objects or old tasks and the models learned for new objects or new tasks (Anselmi et al. 2016; Baxter 2000; Bottou 2014; Lopez-Paz et al. 2016; Rusu et al. 2016; Salakhutdinov et al. 2011; Srivastava & Salakhutdinov 2013; Torralba et al. 2007; Zeiler & Fergus 2014). Neural networks can also learn-to-learn by optimizing hyper-parameters, including the form of their weight update rule (Andrychowicz et al. 2016), over a set of related tasks.
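As a cartoon of the hierarchical Bayesian idea, consider learning how variable the exemplars of a category typically are. The sketch below is our invented illustration, not a model from the papers cited above: the shared hyper-parameter is a pooled within-category standard deviation, and the data are made up.

```python
import statistics

# Old categories, each with several observed exemplar sizes (in cm).
old_categories = {
    "cup":    [9.8, 10.4, 10.1, 9.9],
    "plate":  [24.7, 25.3, 25.1],
    "bottle": [30.2, 29.6, 30.1, 30.4],
}

# Learning-to-learn: estimate the typical within-category variability
# (a shared hyper-parameter) from all previously learned categories.
pooled_sd = statistics.mean(
    statistics.stdev(sizes) for sizes in old_categories.values()
)

def one_shot_category(example_size, shared_sd=pooled_sd):
    """A model of a brand-new category from a single example: the example
    sets the mean, while the transferred hyper-parameter supplies the
    variability that one observation alone could never reveal."""
    return {"mean": example_size, "sd": shared_sd}

new_concept = one_shot_category(14.2)   # first-ever exemplar of a new kind
```

The single example fixes where the new concept sits; the transferred variability says how far to generalize around it.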
Although transfer learning and multitask learning are already important themes across AI, and in deep learning in particular, they have not yet led to systems that learn new tasks as rapidly and flexibly as humans do. Capturing more human-like learning-to-learn dynamics in deep networks and other machine learning approaches could facilitate much stronger transfer to new tasks and new problems. To gain the full benefit that humans get from learning-to-learn, however, AI systems might first need to adopt the more compositional (or more language-like, see sect. 5) and causal forms of representations that we have argued for above.

We can see this potential in both of our challenge problems. In the Characters Challenge as presented in Lake et al. (2015a), all viable models use "pre-training" on many character concepts in a background set of alphabets to tune the representations they use to learn new character concepts in a test set of alphabets. But to perform well, current neural network approaches require much more pre-training than do people or our Bayesian program learning approach. Humans typically learn only one or a few alphabets, and even with related drawing experience, this likely amounts to the equivalent of a few hundred character-like visual concepts at most. For BPL, pre-training with characters in only five alphabets (for around 150 character types in total) is sufficient to perform human-level one-shot classification and generation of new examples. With this level of pre-training, current neural networks perform much worse on classification and have not even attempted generation; they are still far from solving the Characters Challenge.8

We cannot be sure how people get to the knowledge they have in this domain, but we do understand how this works in BPL, and we think people might be similar. BPL transfers readily to new concepts because it learns about object parts, sub-parts, and relations, capturing learning about what each concept is like and what concepts are like in general. It is crucial that learning-to-learn occurs at multiple levels of the hierarchical generative process. Previously learned primitive actions and larger generative pieces can be re-used and re-combined to define new generative models for new characters (Fig. 5A). Further transfer occurs by learning about the typical levels of variability within a typical generative model. This provides knowledge about how far and in what ways to generalize when we have seen only one example of a new character, which on its own could not possibly carry any information about variance. BPL could also benefit from deeper forms of learning-to-learn than it currently does. Some of the important structure it exploits to generalize well is built into the prior and not learned from the background pre-training, whereas people might learn this knowledge, and ultimately, a human-like machine learning system should as well.

Analogous learning-to-learn occurs for humans in learning many new object models, in vision and cognition: Consider the novel two-wheeled vehicle in Figure 1B, where learning-to-learn can operate through the transfer of previously learned parts and relations (sub-concepts such as wheels, motors, handlebars, attached, powered by) that reconfigure compositionally to create a model of the new concept. If deep neural networks could adopt similarly compositional, hierarchical, and causal representations, we expect they could benefit more from learning-to-learn.

In the Frostbite Challenge, and in video games more generally, there is a similar interdependence between the form of the representation and the effectiveness of learning-to-learn. People seem to transfer knowledge at multiple levels, from low-level perception to high-level strategy, exploiting compositionality at all levels. Most basically, they immediately parse the game environment into objects, types of objects, and causal relations between them. People also understand that video games like these have goals, which often involve approaching or avoiding objects based on their type. Whether the person is a child or a seasoned gamer, it seems obvious that interacting with the birds and fish will change the game state in some way, either good or bad, because video games typically yield costs or rewards for these types of interactions (e.g., dying or points). These types of hypotheses can be quite specific and rely on prior knowledge: When the polar bear first appears and tracks the agent's location during advanced levels (Fig. 2D), an attentive learner is sure to avoid it. Depending on the level, ice floes can be spaced far apart (Fig. 2A–C) or close together (Fig. 2D), suggesting the agent may be able to cross some gaps, but not others. In this way, general world knowledge and previous video games may help inform exploration and generalization in new scenarios, helping people learn maximally from a single mistake or avoid mistakes altogether.

Deep reinforcement learning systems for playing Atari games have had some impressive successes in transfer learning, but they still have not come close to learning to play new games as quickly as humans can. For example, Parisotto et al. (2016) present the "actor-mimic" algorithm that first learns 13 Atari games by watching an expert network play and trying to mimic the expert network's action selection and/or internal states (for about 4 million frames of experience each, or 18.5 hours per game). This algorithm can then learn new games faster than a randomly initialized DQN: Scores that might have taken 4 or 5 million frames of learning to reach might now be reached after 1 or 2 million frames of practice. But anecdotally, we find that humans can still reach these scores with a few minutes of practice, requiring far less experience than the DQNs.

In sum, the interaction between representation and previous experience may be key to building machines that learn as fast as people. A deep learning system trained on many video games may not, by itself, be enough to learn new games as quickly as people. Yet, if such a system aims to learn compositionally structured causal models of each game – built on a foundation of intuitive physics and psychology – it could transfer knowledge more efficiently and thereby learn new games much more quickly.

4.3. Thinking Fast

The previous section focused on learning rich models from sparse data and proposed ingredients for achieving these human-like learning abilities. These cognitive abilities are even more striking when considering the speed of perception and thought: the amount of time required to understand a scene, think a thought, or choose an action. In general, richer and more structured models require more complex and slower inference algorithms, similar to how complex models require more data, making the speed of perception and thought all the more remarkable.

The combination of rich models with efficient inference suggests another way psychology and neuroscience may usefully inform AI. It also suggests an additional way to build on the successes of deep learning, where efficient inference and scalable learning are important strengths of the approach. This section discusses possible paths toward resolving the conflict between fast inference and structured representations, including Helmholtz machine–style approximate inference in generative models (Dayan et al. 1995; Hinton et al. 1995) and cooperation between model-free and model-based reinforcement learning systems.
4.3.1. Approximate inference in structured models. Hier- arbitrary proposals in a Monte Carlo inference scheme.
arhical Bayesian models operating over probabilistic Similarly, as described in the Characters Challenge,
programs (Goodman et al. 2008; Lake et al. 2015a; Tenen- people can quickly infer motor programs to draw a new
baum et al. 2011) are equipped to deal with theory-like character in a similarly guided processes.
structures and rich causal representations of the world, For domains where program or theory learning occurs
yet there are formidable algorithmic challenges for efficient quickly, it is possible that people employ inductive biases
inference. Computing a probability distribution over an not only to evaluate hypotheses, but also to guide hypothe-
entire space of programs is usually intractable, and often sis selection. Schulz (2012b) has suggested that abstract
even finding a single high-probability program poses an structural properties of problems contain information
intractable search problem. In contrast, whereas represent- about the abstract forms of their solutions. Even without
ing intuitive theories and structured causal models is less knowing the answer to the question, “Where is the
natural in deep neural networks, recent progress has dem- deepest point in the Pacific Ocean?” one still knows that
onstrated the remarkable effectiveness of gradient-based the answer must be a location on a map. The answer “20
learning in high-dimensional parameter spaces. A complete inches” to the question, “What year was Lincoln born?”
account of learning and inference must explain how the can be invalidated a priori, even without knowing the
brain does so much with limited computational resources correct answer. In recent experiments, Tsividis et al.
(Gershman et al. 2015; Vul et al. 2014). (2015) found that children can use high-level abstract fea-
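As a concrete illustration of this proposal, the following is a minimal Metropolis-Hastings sketch in Python: hypotheses are sampled stochastically, and each proposal is scored by its prior and its likelihood given the data. The coin-weight example and all function names are our own hypothetical stand-ins, not drawn from the cited studies.

import math
import random

def metropolis_hastings(hypotheses, log_prior, log_likelihood, data, n_steps=5000):
    """Stochastically sample hypotheses, scoring each proposal by its
    consistency with the data (likelihood) and prior knowledge (prior)."""
    current = random.choice(hypotheses)
    samples = []
    for _ in range(n_steps):
        proposal = random.choice(hypotheses)  # symmetric proposal distribution
        log_ratio = (log_prior(proposal) + log_likelihood(data, proposal)
                     - log_prior(current) - log_likelihood(data, current))
        if math.log(random.random() + 1e-12) < log_ratio:  # accept w.p. min(1, ratio)
            current = proposal
        samples.append(current)
    return samples

# Toy example: which coin weight best explains 8 heads in 10 flips?
weights = [0.1, 0.3, 0.5, 0.7, 0.9]
log_prior = lambda w: 0.0  # uniform prior over the five hypotheses
log_lik = lambda d, w: d['h'] * math.log(w) + d['t'] * math.log(1 - w)
posterior_samples = metropolis_hastings(weights, log_prior, log_lik, {'h': 8, 't': 2})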
Although Monte Carlo methods are powerful and come with asymptotic guarantees, it is challenging to make them work on complex problems like program induction and theory learning. When the hypothesis space is vast, and only a few hypotheses are consistent with the data, how can good models be discovered without exhaustive search? In at least some domains, people may not have an especially clever solution to this problem, instead grappling with the full combinatorial complexity of theory learning (Ullman et al. 2012b). Discovering new theories can be slow and arduous, as testified by the long time scale of cognitive development, and learning in a saltatory fashion (rather than through gradual adaptation) is characteristic of aspects of human intelligence, including discovery and insight during development (Schulz 2012b), problem-solving (Sternberg & Davidson 1995), and epoch-making discoveries in scientific research (Langley et al. 1987). Discovering new theories can also occur much more quickly. A person learning the rules of Frostbite will probably undergo a loosely ordered sequence of “Aha!” moments: He or she will learn that jumping on ice floes causes them to change color, that changing the color of ice floes causes an igloo to be constructed piece-by-piece, that birds make him or her lose points, that fish make him or her gain points, that he or she can change the direction of ice floes at the cost of one igloo piece, and so on. These little fragments of a “Frostbite theory” are assembled to form a causal understanding of the game relatively quickly, in what seems more like a guided process than arbitrary proposals in a Monte Carlo inference scheme. Similarly, as described in the Characters Challenge, people can quickly infer motor programs to draw a new character in a similarly guided process.

For domains where program or theory learning occurs quickly, it is possible that people employ inductive biases not only to evaluate hypotheses, but also to guide hypothesis selection. Schulz (2012b) has suggested that abstract structural properties of problems contain information about the abstract forms of their solutions. Even without knowing the answer to the question, “Where is the deepest point in the Pacific Ocean?” one still knows that the answer must be a location on a map. The answer “20 inches” to the question, “What year was Lincoln born?” can be invalidated a priori, even without knowing the correct answer. In recent experiments, Tsividis et al. (2015) found that children can use high-level abstract features of a domain to guide hypothesis selection, by reasoning about distributional properties like the ratio of seeds to flowers, and dynamical properties like periodic or monotonic relationships between causes and effects (see also Magid et al. 2015).

How might efficient mappings from questions to a plausible subset of answers be learned? Recent work in AI,
spanning both deep learning and graphical models, has attempted to tackle this challenge by “amortizing” probabilistic inference computations into an efficient feed-forward mapping (Eslami et al. 2014; Heess et al. 2013; Mnih & Gregor 2014; Stuhlmüller et al. 2013). We can also think of this as “learning to do inference,” which is independent from the ideas of learning as model building discussed in the previous section. These feed-forward mappings can be learned in various ways, for example, using paired generative/recognition networks (Dayan et al. 1995; Hinton et al. 1995) and variational optimization (Gregor et al. 2015; Mnih & Gregor 2014; Rezende et al. 2014), or nearest-neighbor density estimation (Kulkarni et al. 2015a; Stuhlmüller et al. 2013). One implication of amortization is that solutions to different problems will become correlated because of the sharing of amortized computations. Some evidence for inferential correlations in humans was reported by Gershman and Goodman (2014). This trend is an avenue of potential integration of deep learning models with probabilistic models and probabilistic programming: Training neural networks to help perform probabilistic inference in a generative model or a probabilistic program (Eslami et al. 2016; Kulkarni et al. 2015b; Yildirim et al. 2015). Another avenue for potential integration is through differentiable programming (Dalrymple 2016), by ensuring that the program-like hypotheses are differentiable and thus learnable via gradient descent – a possibility discussed in the concluding section (Section 6.1).
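A toy sketch can make “learning to do inference” concrete. Here a tabular mapping, a stand-in for a trained recognition network, is fit by running a tiny generative model forward many times, so that test-time inference becomes a cheap feed-forward lookup rather than a search. The rain/sprinkler model and all names are hypothetical.

import random
from collections import Counter, defaultdict

# A toy generative model: a latent cause produces a noisy observation.
def sample_cause():
    return random.choice(['rain', 'sprinkler'])

def sample_observation(cause):
    p_wet = 0.9 if cause == 'rain' else 0.6
    return 'wet' if random.random() < p_wet else 'dry'

# Amortization: run the model forward many times and fit an inverse
# mapping from observation to a distribution over causes, so inference
# at test time is a lookup instead of a per-query search.
table = defaultdict(Counter)
for _ in range(100000):
    cause = sample_cause()
    table[sample_observation(cause)][cause] += 1

def amortized_posterior(observation):
    counts = table[observation]
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}

print(amortized_posterior('wet'))  # approximates P(cause | observation)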
4.3.2. Model-based and model-free reinforcement learning. The DQN introduced by Mnih et al. (2015) used a simple form of model-free reinforcement learning in a deep neural network that allows for fast selection of actions. There is indeed substantial evidence that the brain uses similar model-free learning algorithms in simple associative learning or discrimination learning tasks (see Niv 2009, for a review). In particular, the phasic firing of midbrain dopaminergic neurons is qualitatively (Schultz et al. 1997) and quantitatively (Bayer & Glimcher 2005) consistent with the reward prediction error that drives updating of model-free value estimates.
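For concreteness, the following sketch shows the model-free update in question; the quantity delta is the reward prediction error that dopaminergic firing is thought to report. This is a generic Q-learning rule offered as an illustration, not the DQN's exact training procedure.

from collections import defaultdict

Q = defaultdict(float)  # state-action value estimates

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One model-free update: delta is the reward prediction error that
    midbrain dopaminergic neurons are thought to report."""
    delta = r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return delta

# Hypothetical usage after one step of experience (s, a, r, s_next):
# q_update(Q, s, a, r, s_next, actions=['left', 'right', 'jump'])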
Model-free learning is not, however, the whole story. Considerable evidence suggests that the brain also has a model-based learning system, responsible for building a “cognitive map” of the environment and using it to plan action sequences for more complex tasks (Daw et al. 2005; Dolan & Dayan 2013). Model-based planning is an essential ingredient of human intelligence, enabling flexible adaptation to new tasks and goals; it is where all of the rich model-building abilities discussed in the previous sections earn their value as guides to action. As we argued in our discussion of Frostbite, one can design numerous variants of this simple video game that are identical except for the reward function; that is, governed by an identical environment model of state-action–dependent transitions. We conjecture that a competent Frostbite player can easily shift behavior appropriately, with little or no additional learning, and it is hard to imagine a way of doing that other than having a model-based planning approach in which the environment model can be modularly combined with arbitrary new reward functions and then deployed immediately for planning. One boundary condition on this flexibility is the fact that the skills become “habitized” with routine application, possibly reflecting a shift from model-based to model-free control. This shift may arise from a rational arbitration between learning systems to balance the trade-off between flexibility and speed (Daw et al. 2005; Keramati et al. 2011).
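The modularity conjectured above can be made concrete with a small planning sketch: value iteration over a fixed transition model T, where swapping in a new reward function immediately yields a new plan with no re-learning of the model. The names states, actions, T, and reward are hypothetical placeholders for a Frostbite-like environment model.

def plan(states, actions, T, reward, gamma=0.95, iters=200):
    """Value iteration over a fixed model T(s, a) -> [(prob, next_state)].
    Swapping in a new `reward` re-plans for a new task immediately."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(sum(p * (reward(s, a) + gamma * V[s2])
                        for p, s2 in T(s, a))
                    for a in actions)
             for s in states}
    policy = {s: max(actions,
                     key=lambda a: sum(p * (reward(s, a) + gamma * V[s2])
                                       for p, s2 in T(s, a)))
              for s in states}
    return V, policy

# The same model T can be reused under any new reward function, e.g.:
# V1, pi1 = plan(states, actions, T, standard_reward)
# V2, pi2 = plan(states, actions, T, lambda s, a: -standard_reward(s, a))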
Similarly to how probabilistic computations can be amortized for efficiency (see previous section), plans can be amortized into cached values by allowing the model-based system to simulate training data for the model-free system (Sutton 1990). This process might occur offline (e.g., in dreaming or quiet wakefulness), suggesting a form of consolidation in reinforcement learning (Gershman et al. 2014). Consistent with the idea of cooperation between learning systems, a recent experiment demonstrated that model-based behavior becomes automatic over the course of training (Economides et al. 2015). Thus, a marriage of flexibility and efficiency might be achievable if we use the human reinforcement learning systems as guidance.
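A minimal Dyna-style sketch (after Sutton 1990) of this idea: each real experience updates both a learned model and the model-free values, and the model then generates simulated experience that trains the values further, a stand-in for offline consolidation. The details here are our illustration, not a claim about the brain's algorithm, and the model assumes deterministic transitions.

import random
from collections import defaultdict

Q = defaultdict(float)
model = {}  # learned environment model: (s, a) -> (r, s_next)

def backup(s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def dyna_update(s, a, r, s_next, actions, n_sim=20):
    """One real experience updates Q and the model; the model then
    'dreams' simulated experiences that further train model-free values."""
    model[(s, a)] = (r, s_next)
    backup(s, a, r, s_next, actions)
    for _ in range(n_sim):  # replayed, model-generated experience
        (ss, aa), (rr, ss2) = random.choice(list(model.items()))
        backup(ss, aa, rr, ss2, actions)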
Intrinsic motivation also plays an important role in human learning and behavior (Berlyne 1966; Harlow 1950; Ryan & Deci 2007). Although much of the previous discussion assumes the standard view of behavior as seeking to maximize reward and minimize punishment, all externally provided rewards are reinterpreted according to the “internal value” of the agent, which may depend on the current goal and mental state. There may also be an intrinsic drive to reduce uncertainty and construct models of the environment (Edelman 2015; Schmidhuber 2015), closely related to learning-to-learn and multitask learning. Deep reinforcement learning is only just starting to address intrinsically motivated learning (Kulkarni et al. 2016; Mohamed & Rezende 2015).
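One simple formalization of such a drive, offered only as an illustration, is a count-based exploration bonus: rarely visited (hence still uncertain) states yield extra internally generated reward that is added to any external reward.

import math
from collections import Counter

visit_counts = Counter()

def intrinsic_reward(state, beta=0.5):
    """A count-based curiosity bonus: states whose outcomes are still
    uncertain (rarely visited) yield extra 'internal' reward, giving the
    agent a drive to explore and model its environment."""
    visit_counts[state] += 1
    return beta / math.sqrt(visit_counts[state])

# Total training signal = external reward (possibly zero) + intrinsic bonus:
# r_total = r_ext + intrinsic_reward(s_next)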
5. Responses to common questions

In discussing the arguments in this article with colleagues, three lines of questioning or critiques have frequently arisen. We think it is helpful to address these points directly, to maximize the potential for moving forward together.

5.1. Comparing the learning speeds of humans and neural networks on specific tasks is not meaningful, because humans have extensive prior experience

It may seem unfair to compare neural networks and humans on the amount of training experience required to perform a task, such as learning to play new Atari games or learning new handwritten characters, when humans have had extensive prior experience that these networks have not benefited from. People have had many hours playing other games, and experience reading or writing many other handwritten characters, not to mention experience in a variety of more loosely related tasks. If neural networks were “pre-trained” on the same experience, the argument goes, then they might generalize similarly to humans when exposed to novel tasks.

This has been the rationale behind multitask learning or transfer learning, a strategy with a long history that has shown some promising results recently with deep networks (e.g., Donahue et al. 2014; Luong et al. 2015; Parisotto et al. 2016). Furthermore, some deep learning advocates
argue the human brain effectively benefits from even more experience through evolution. If deep learning researchers see themselves as trying to capture the equivalent of humans’ collective evolutionary experience, this would be equivalent to a truly immense “pre-training” phase.

We agree that humans have a much richer starting point than neural networks when learning most new tasks, including learning a new concept or learning to play a new video game. That is the point of the “developmental start-up software” and other building blocks that we argued are key to creating this richer starting point. We are less committed to a particular story regarding the origins of the ingredients, including the relative roles of genetically programmed and experience-driven developmental mechanisms in building these components in early infancy. Either way, we see them as fundamental building blocks for facilitating rapid learning from sparse data.

Learning-to-learn across multiple tasks is conceivably one route to acquiring these ingredients, but simply training conventional neural networks on many related tasks may not be sufficient to generalize in human-like ways for novel tasks. As we argued in Section 4.2.3, successful learning-to-learn – or, at least, human-level transfer learning – is enabled by having models with the right representational structure, including the other building blocks discussed in this article. Learning-to-learn is a powerful ingredient, but it can be more powerful when operating over compositional representations that capture the underlying causal structure of the environment, while also building on intuitive physics and psychology.

Finally, we recognize that some researchers still hold out hope that if only they can just get big enough training data sets, sufficiently rich tasks, and enough computing power – far beyond what has been tried out so far – then deep learning methods might be sufficient to learn representations equivalent to what evolution and learning provide humans. We can sympathize with that hope, and believe it deserves further exploration, although we are not sure it is a realistic one. We understand in principle how evolution could build a brain with the cognitive ingredients we discuss here. Stochastic hill climbing is slow. It may require massively parallel exploration, over millions of years with innumerable dead ends, but it can build complex structures with complex functions if we are willing to wait long enough. In contrast, trying to build these representations from scratch using backpropagation, Deep Q-learning, or any stochastic gradient-descent weight update rule in a fixed network architecture, may be unfeasible regardless of how much training data are available. To build these representations from scratch might require exploring fundamental structural variations in the network’s architecture, which gradient-based learning in weight space is not prepared to do. Although deep learning researchers do explore many such architectural variations, and have been devising increasingly clever and powerful ones recently, it is the researchers who are driving and directing this process. Exploration and creative innovation in the space of network architectures have not yet been made algorithmic. Perhaps they could, using genetic programming methods (Koza 1992) or other structure-search algorithms (Yamins et al. 2014). We think this would be a fascinating and promising direction to explore, but we may have to acquire more patience than machine-learning researchers typically express with their algorithms: the dynamics of structure search may look much more like the slow random hill climbing of evolution than the smooth, methodical progress of stochastic gradient descent. An alternative strategy is to build in appropriate infant-like knowledge representations and core ingredients as the starting point for our learning-based AI systems, or to build learning systems with strong inductive biases that guide them in this direction.

Regardless of which way an AI developer chooses to go, our main points are orthogonal to this objection. There are a set of core cognitive ingredients for human-like learning and thought. Deep learning models could incorporate these ingredients through some combination of additional structure and perhaps additional learning mechanisms, but for the most part have yet to do so. Any approach to human-like AI, whether based on deep learning or not, is likely to gain from incorporating these ingredients.

5.2. Biological plausibility suggests theories of intelligence should start with neural networks

We have focused on how cognitive science can motivate and guide efforts to engineer human-like AI, in contrast to some advocates of deep neural networks who cite neuroscience for inspiration. Our approach is guided by a pragmatic view that the clearest path to a computational formalization of human intelligence comes from understanding the “software” before the “hardware.” In the case of this article, we proposed key ingredients of this software in previous sections.

Nonetheless, a cognitive approach to intelligence should not ignore what we know about the brain. Neuroscience can provide valuable inspirations for both cognitive models and AI researchers: The centrality of neural networks and model-free reinforcement learning in our proposals for “thinking fast” (sect. 4.3) are prime exemplars. Neuroscience can also, in principle, impose constraints on cognitive accounts, at both the cellular and systems levels. If deep learning embodies brain-like computational mechanisms and those mechanisms are incompatible with some cognitive theory, then this is an argument against that cognitive theory and in favor of deep learning. Unfortunately, what we “know” about the brain is not all that clear-cut. Many seemingly well-accepted ideas regarding neural computation are in fact biologically dubious, or uncertain at best, and therefore should not disqualify cognitive ingredients that pose challenges for implementation within that approach.

For example, most neural networks use some form of gradient-based (e.g., backpropagation) or Hebbian learning. It has long been argued, however, that backpropagation is not biologically plausible. As Crick (1989) famously pointed out, backpropagation seems to require that information be transmitted backward along the axon, which does not fit with realistic models of neuronal function (although recent models circumvent this problem in various ways [Liao et al. 2015; Lillicrap et al. 2014; Scellier & Bengio 2016]). This has not prevented backpropagation from being put to good use in connectionist models of cognition or in building deep neural networks for AI. Neural network researchers must regard it as a very good thing, in this case, that concerns of biological plausibility did not hold back research on this particular algorithmic approach to learning.10 We strongly agree: Although neuroscientists
have not found any mechanisms for implementing backpropagation in the brain, neither have they produced definitive evidence against it. The existing data simply offer little constraint either way, and backpropagation has been of obviously great value in engineering today’s best pattern recognition systems.

Hebbian learning is another case in point. In the form of long-term potentiation (LTP) and spike-timing dependent plasticity (STDP), Hebbian learning mechanisms are often cited as biologically supported (Bi & Poo 2001). However, the cognitive significance of any biologically grounded form of Hebbian learning is unclear. Gallistel and Matzel (2013) have persuasively argued that the critical interstimulus interval for LTP is orders of magnitude smaller than the intervals that are behaviorally relevant in most forms of learning. In fact, experiments that simultaneously manipulate the interstimulus and intertrial intervals demonstrate that no critical interval exists. Behavior can persist for weeks or months, whereas LTP decays to baseline over the course of days (Power et al. 1997). Learned behavior is rapidly re-acquired after extinction (Bouton 2004), whereas no such facilitation is observed for LTP (Jonge & Racine 1985). Most relevantly for our focus, it would be especially challenging to try to implement the ingredients described in this article using purely Hebbian mechanisms.

Claims of biological plausibility or implausibility usually rest on rather stylized assumptions about the brain that are wrong in many of their details. Moreover, these claims usually pertain to the cellular and synaptic levels, with few connections made to systems-level neuroscience and subcortical brain organization (Edelman 2015). Understanding which details matter and which do not requires a computational theory (Marr 1982). Moreover, in the absence of strong constraints from neuroscience, we can turn the biological argument around: Perhaps a hypothetical biological mechanism should be viewed with skepticism if it is cognitively implausible. In the long run, we are optimistic that neuroscience will eventually place more constraints on theories of intelligence. For now, we believe cognitive plausibility offers a surer foundation.

5.3. Language is essential for human intelligence. Why is it not more prominent here?

We have said little in this article about people’s ability to communicate and think in natural language, a distinctively human cognitive capacity where machine capabilities strikingly lag. Certainly one could argue that language should be included on any short list of key ingredients in human intelligence: For example, Mikolov et al. (2016) featured language prominently in their recent paper sketching challenge problems and a road map for AI. Moreover, whereas natural language processing is an active area of research in deep learning (e.g., Bahdanau et al. 2015; Mikolov et al. 2013; Xu et al. 2015), it is widely recognized that neural networks are far from implementing human language abilities. The question is, how do we develop machines with a richer capacity for language?

We believe that understanding language and its role in intelligence goes hand-in-hand with understanding the building blocks discussed in this article. It is also true that language builds on the core abilities for intuitive physics, intuitive psychology, and rapid learning with compositional, causal models that we focus on. These capacities are in place before children master language, and they provide the building blocks for linguistic meaning and language acquisition (Carey 2009; Jackendoff 2003; Kemp 2007; O’Donnell 2015; Pinker 2007; Xu & Tenenbaum 2007). We hope that by better understanding these earlier ingredients and how to implement and integrate them computationally, we will be better positioned to understand linguistic meaning and acquisition in computational terms and to explore other ingredients that make human language possible.

What else might we need to add to these core ingredients to get language? Many researchers have speculated about key features of human cognition that give rise to language and other uniquely human modes of thought: Is it recursion, or some new kind of recursive structure building ability (Berwick & Chomsky 2016; Hauser et al. 2002)? Is it the ability to re-use symbols by name (Deacon 1998)? Is it the ability to understand others intentionally and build shared intentionality (Bloom 2000; Frank et al. 2009; Tomasello 2010)? Is it some new version of these things, or is it just more of the aspects of these capacities that are already present in infants? These are important questions for future work with the potential to expand the list of key ingredients; we did not intend our list to be complete.

Finally, we should keep in mind all of the ways that acquiring language extends and enriches the ingredients of cognition that we focus on in this article. The intuitive physics and psychology of infants are likely limited to reasoning about objects and agents in their immediate spatial and temporal vicinity and to their simplest properties and states. But with language, older children become able to reason about a much wider range of physical and psychological situations (Carey 2009). Language also facilitates more powerful learning-to-learn and compositionality (Mikolov et al. 2016), allowing people to learn more quickly and flexibly by representing new concepts and thoughts in relation to existing concepts (Lupyan & Bergen 2016; Lupyan & Clark 2015). Ultimately, the full project of building machines that learn and think like humans must have language at its core.

6. Looking forward

In the last few decades, AI and machine learning have made remarkable progress: Computer programs beat chess masters; AI systems beat Jeopardy champions; apps recognize photos of your friends; machines rival humans on large-scale object recognition; smart phones recognize (and, to a limited extent, understand) speech. The coming years promise still more exciting AI applications, in areas as varied as self-driving cars, medicine, genetics, drug design, and robotics. As a field, AI should be proud of these accomplishments, which have helped move research from academic journals into systems that improve our daily lives.

We should also be mindful of what AI has and has not achieved. Although the pace of progress has been impressive, natural intelligence is still by far the best example of intelligence. Machine performance may rival or exceed human performance on particular tasks, and algorithms may take inspiration from neuroscience or aspects of psychology, but it does not follow that the algorithm learns
or thinks like a person. This is a higher bar worth reaching for, potentially leading to more powerful algorithms, while also helping unlock the mysteries of the human mind.

When comparing people with the current best algorithms in AI and machine learning, people learn from fewer data and generalize in richer and more flexible ways. Even for relatively simple concepts such as handwritten characters, people need to see just one or a few examples of a new concept before being able to recognize new examples, generate new examples, and generate new concepts based on related ones (Fig. 1A). So far, these abilities elude even the best deep neural networks for character recognition (Ciresan et al. 2012), which are trained on many examples of each concept and do not flexibly generalize to new tasks. We suggest that the comparative power and flexibility of people’s inferences come from the causal and compositional nature of their representations.

We believe that deep learning and other learning paradigms can move closer to human-like learning and thought if they incorporate psychological ingredients, including those outlined in this article. Before closing, we discuss some recent trends that we see as some of the most promising developments in deep learning – trends we hope will continue and lead to more important advances.

6.1. Promising directions in deep learning

There has been recent interest in integrating psychological ingredients with deep neural networks, especially selective attention (Bahdanau et al. 2015; Mnih et al. 2014; Xu et al. 2015), augmented working memory (Graves et al. 2014; 2016; Grefenstette et al. 2015; Sukhbaatar et al. 2015; Weston et al. 2015b), and experience replay (McClelland et al. 1995; Mnih et al. 2015). These ingredients are lower-level than the key cognitive ingredients discussed in this article, yet they suggest a promising trend of using insights from cognitive psychology to improve deep learning, one that may be even furthered by incorporating higher-level cognitive ingredients.

Paralleling the human perceptual apparatus, selective attention forces deep learning models to process raw, perceptual data as a series of high-resolution “foveal glimpses” rather than all at once. Somewhat surprisingly, the incorporation of attention has led to substantial performance gains in a variety of domains, including in machine translation (Bahdanau et al. 2015), object recognition (Mnih et al. 2014), and image caption generation (Xu et al. 2015). Attention may help these models in several ways. It helps to coordinate complex, often sequential, outputs by attending to only specific aspects of the input, allowing the model to focus on smaller sub-tasks rather than solving an entire problem in one shot. For example, during caption generation, the attentional window has been shown to track the objects as they are mentioned in the caption, where the network may focus on a boy and then a Frisbee when producing a caption like, “A boy throws a Frisbee” (Xu et al. 2015). Attention also allows larger models to be trained without requiring every model parameter to affect every output or action. In generative neural network models, attention has been used to concentrate on generating particular regions of the image rather than the whole image at once (Gregor et al. 2015). This could be a stepping stone toward building more causal generative models in neural networks, such as a neural version of the Bayesian program learning model that could be applied to tackling the Characters Challenge (sect. 3.1).
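The core attention computation behind these models can be sketched in a few lines: score each input location against a query, normalize with a softmax, and condition on the weighted summary. This is a generic content-based formulation offered for illustration, not the exact parameterization of any cited model, and the shapes are hypothetical.

import numpy as np

def soft_attention(query, annotations):
    """One step of content-based attention: score each location of the
    input, normalize with a softmax, and return the weighted summary
    (the 'glimpse') that the decoder conditions on."""
    scores = annotations @ query                      # (n_locations,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # attention over input
    return weights @ annotations, weights             # glimpse, weights

# Hypothetical shapes: 196 image regions, each a 512-dim feature vector.
annotations = np.random.randn(196, 512)
query = np.random.randn(512)
glimpse, weights = soft_attention(query, annotations)

Because every operation here is differentiable, the attention weights can be trained end-to-end alongside the rest of the network.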
Researchers are also developing neural networks with “working memories” that augment the shorter-term memory provided by unit activation and the longer-term memory provided by the connection weights (Graves et al. 2014; 2016; Grefenstette et al. 2015; Reed & Freitas 2016; Sukhbaatar et al. 2015; Weston et al. 2015b). These developments are also part of a broader trend toward “differentiable programming,” the incorporation of classic data structures, such as random access memory, stacks, and queues, into gradient-based learning systems (Dalrymple 2016). For example, the neural Turing machine (NTM) (Graves et al. 2014) and its successor the differentiable neural computer (DNC) (Graves et al. 2016) are neural networks augmented with a random access external memory with read and write operations that maintain end-to-end differentiability. The NTM has been trained to perform sequence-to-sequence prediction tasks such as sequence copying and sorting, and the DNC has been applied to solving block puzzles and finding paths between nodes in a graph after memorizing the graph. Additionally, neural programmer-interpreters learn to represent and execute algorithms such as addition and sorting from fewer examples, by observing input-output pairs (like the NTM and DNC), as well as execution traces (Reed & Freitas 2016). Each model seems to learn genuine programs from examples, albeit in a representation more like assembly language than a high-level programming language.
Although this new generation of neural networks has yet to tackle the types of challenge problems introduced in this article, differentiable programming suggests the intriguing possibility of combining the best of program induction and deep learning. The types of structured representations and model building ingredients discussed in this article – objects, forces, agents, causality, and compositionality – help explain important facets of human learning and thinking, yet they also bring challenges for performing efficient inference (sect. 4.3.1). Deep learning systems have not yet shown they can work with these representations, but they have demonstrated the surprising effectiveness of gradient descent in large models with high-dimensional parameter spaces. A synthesis of these approaches, able to perform efficient inference over programs that richly model the causal structure an infant sees in the world, would be a major step forward in building human-like AI.

Another example of combining pattern recognition and model-based search comes from recent AI research into the game Go. Go is considerably more difficult for AI than chess, and it was only recently that a computer program – AlphaGo – first beat a world-class player (Chouard 2016) by using a combination of deep convolutional neural networks (ConvNets) and Monte-Carlo Tree Search (Silver et al. 2016). Each of these components has made gains against artificial and real Go players (Gelly & Silver 2008; 2011; Silver et al. 2016; Tian & Zhu 2016), and the notion of combining pattern recognition and model-based search goes back decades in Go and other games. Showing that these approaches can be integrated to beat a human Go champion is an important AI accomplishment (see Fig. 7). Just as important, however, are the new questions and directions they open up for the long-term project of building genuinely human-like AI.
Figure 7. An AI system for playing Go, combining a deep convolutional network (ConvNet) and model-based search through Monte-Carlo Tree Search (MCTS). (A) The ConvNet on its own can be used to predict the next k moves given the current board. (B) A search tree with the current board state as its root and the current “win/total” statistics at each node. A new MCTS rollout selects moves along the tree according to the MCTS policy (red arrows) until it reaches a new leaf (red circle), where the next move is chosen by the ConvNet. From there, play proceeds until the game’s end according to a pre-defined default policy based on the Pachi program (Baudiš & Gailly 2012), itself based on MCTS. (C) The end-game result of the new leaf is used to update the search tree. Adapted from Tian and Zhu (2016) with permission.
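As a rough illustration of the procedure the caption describes, the following sketch implements one MCTS rollout in which a policy network's move probabilities guide selection and expansion. It is a simplified stand-in, not AlphaGo's or Tian and Zhu's actual algorithm; policy_net, default_policy, apply_move, is_terminal, and result are hypothetical callables supplied by the environment.

import math

class Node:
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior      # move probability from the ConvNet policy
        self.children = {}      # move -> Node
        self.wins = 0.0         # the "win/total" statistics of the caption
        self.visits = 0

def mcts_rollout(root, apply_move, policy_net, default_policy,
                 is_terminal, result, c=1.0):
    """One rollout: select moves by win statistics plus a policy-network
    prior, expand a leaf, play out with a default policy, and back up."""
    path, node = [root], root
    # Selection: descend while the node has expanded children.
    while node.children and not is_terminal(node.state):
        node = max(node.children.values(),
                   key=lambda ch: ch.wins / (ch.visits + 1e-9)
                   + c * ch.prior * math.sqrt(node.visits) / (1 + ch.visits))
        path.append(node)
    # Expansion: the policy network proposes priors over next moves.
    if not is_terminal(node.state):
        for move, p in policy_net(node.state):
            node.children[move] = Node(apply_move(node.state, move), prior=p)
    # Simulation: play to the end with a fast default policy.
    state = node.state
    while not is_terminal(state):
        state = apply_move(state, default_policy(state))
    outcome = result(state)
    # Backup: update win/total statistics along the path.
    for n in path:
        n.visits += 1
        n.wins += outcome
    return outcome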
One worthy goal would be to build an AI system that beats a world-class player with the amount and kind of training human champions receive, rather than overpowering them with Google-scale computational resources. AlphaGo is initially trained on 28.4 million positions and moves from 160,000 unique games played by human experts; it then improves through reinforcement learning, playing 30 million more games against itself. Between the publication of Silver et al. (2016) and facing world champion Lee Sedol, AlphaGo was iteratively retrained several times in this way. The basic system always learned from 30 million games, but it played against successively stronger versions of itself, effectively learning from 100 million or more games altogether (D. Silver, personal communication, 2017). In contrast, Lee has probably played around 50,000 games in his entire life. Looking at numbers like these, it is impressive that Lee can even compete with AlphaGo. What would it take to build a professional-level Go AI that learns from only 50,000 games? Perhaps a system that combines the advances of AlphaGo with some of the complementary ingredients for intelligence we argue for here would be a route to that end.

Artificial intelligence could also gain much by trying to match the learning speed and flexibility of normal human Go players. People take a long time to master the game of Go, but as with the Frostbite and Characters challenges (sects. 3.1 and 3.2), humans can quickly learn the basics of the game through a combination of explicit instruction, watching others, and experience. Playing just a few games teaches a human enough to beat someone who has just learned the rules but never played before. Could AlphaGo model these earliest stages of real human learning curves? Human Go players can also adapt what they have learned to innumerable game variants. The Wikipedia page “Go variants” describes versions such as playing on bigger or smaller board sizes (ranging from 9 × 9 to 38 × 38, not just the usual 19 × 19 board), or playing on boards of different shapes and connectivity structures (rectangles, triangles, hexagons, even a map of the English city Milton Keynes). The board can be a torus, a Möbius strip, a cube, or a diamond lattice in three dimensions. Holes can be cut in the board, in regular or irregular ways. The rules can be adapted to what is known as First Capture Go (the first player to capture a stone wins), NoGo (the player who avoids capturing any enemy stones longer wins), or Time Is Money Go (players begin with a fixed amount of time and at the end of the game, the number of seconds remaining on each player’s clock is added to his or her score). Players may receive bonuses for creating certain stone patterns or capturing territory near certain landmarks. There could be four or more players, competing individually or in teams. In each of these variants, effective play needs to change from the basic game, but a skilled player can adapt, and does not simply have to relearn the game from scratch. Could AlphaGo quickly adapt to new variants of Go? Although techniques for handling variable-sized inputs in ConvNets may help in playing on different board sizes (Sermanet et al. 2014), the value functions and policies that AlphaGo learns seem unlikely to generalize as flexibly and automatically as people. Many of the variants described above would require significant reprogramming and retraining, directed by the smart humans who programmed AlphaGo, not the system itself. As impressive as AlphaGo is in beating the world’s best players at the standard game – and it is extremely impressive – the fact that it cannot even conceive of these variants, let alone adapt to them autonomously, is a sign that it does not understand the game as humans do. Human players can
understand these variants and adapt to them because they explicitly represent Go as a game, with a goal to beat an adversary who is playing to achieve the same goal he or she is, governed by rules about how stones can be placed on a board and how board positions are scored. Humans represent their strategies as a response to these constraints, such that if the game changes, they can begin to adjust their strategies accordingly.

In sum, Go presents compelling challenges for AI beyond matching world-class human performance, in trying to match human levels of understanding and generalization, based on the same kinds and amounts of data, explicit instructions, and opportunities for social learning afforded to people. In learning to play Go as quickly and as flexibly as they do, people are drawing on most of the cognitive ingredients this article has laid out. They are learning-to-learn with compositional knowledge. They are using their core intuitive psychology and aspects of their intuitive physics (spatial and object representations). And like AlphaGo, they are also integrating model-free pattern recognition with model-based search. We believe that Go AI systems could be built to do all of these things, potentially better capturing how humans learn and understand the game. We believe it would be richly rewarding for AI and cognitive science to pursue this challenge together and that such systems could be a compelling testbed for the principles this article suggests, as well as building on all of the progress to date that AlphaGo represents.

6.2. Future applications to practical AI problems

In this article, we suggested some ingredients for building computational models with more human-like learning and thought. These principles were explained in the context of the Characters and Frostbite Challenges, with special emphasis on reducing the amount of training data required and facilitating transfer to novel yet related tasks. We also see ways these ingredients can spur progress on core AI problems with practical applications. Here we offer some speculative thoughts on these applications.

1. Scene understanding. Deep learning is moving beyond object recognition and toward scene understanding, as evidenced by a flurry of recent work focused on generating natural language captions for images (Karpathy & Fei-Fei 2017; Vinyals et al. 2014; Xu et al. 2015). Yet current algorithms are still better at recognizing objects than understanding scenes, often getting the key objects right but their causal relationships wrong (Fig. 6). We see compositionality, causality, intuitive physics, and intuitive psychology as playing an increasingly important role in reaching true scene understanding. For example, picture a cluttered garage workshop with screw drivers and hammers hanging from the wall, wood pieces and tools stacked precariously on a work desk, and shelving and boxes framing the scene. For an autonomous agent to effectively navigate and perform tasks in this environment, the agent would need intuitive physics to properly reason about stability and support. A holistic model of the scene would require the composition of individual object models, glued together by relations. Finally, causality helps infuse the recognition of existing tools or the learning of new ones with an understanding of their use, helping to connect different object models in the proper way (e.g., hammering a nail into a wall, or using a saw horse to support a beam being cut by a saw). If the scene includes people acting or interacting, it will be nearly impossible to understand their actions without thinking about their thoughts and especially their goals and intentions toward the other objects and agents they believe are present.

2. Autonomous agents and intelligent devices. Robots and personal assistants such as cell phones cannot be pre-trained on all possible concepts they may encounter. Like a child learning the meaning of new words, an intelligent and adaptive system should be able to learn new concepts from a small number of examples, as they are encountered naturally in the environment. Common concept types include new spoken words (names like “Ban Ki-Moon” and “Kofi Annan”), new gestures (a secret handshake and a “fist bump”), and new activities, and a human-like system would be able to learn both to recognize and to produce new instances from a small number of examples. As with handwritten characters, a system may be able to quickly learn new concepts by constructing them from pre-existing primitive actions, informed by knowledge of the underlying causal process and learning-to-learn.

3. Autonomous driving. Perfect autonomous driving requires intuitive psychology. Beyond detecting and avoiding pedestrians, autonomous cars could more accurately predict pedestrian behavior by inferring mental states, including their beliefs (e.g., Do they think it is safe to cross the street? Are they paying attention?) and desires (e.g., Where do they want to go? Do they want to cross? Are they retrieving a ball lost in the street?). Similarly, other drivers on the road have similarly complex mental states underlying their behavior (e.g., Does he or she want to change lanes? Pass another car? Is he or she swerving to avoid a hidden hazard? Is he or she distracted?). This type of psychological reasoning, along with other types of model-based causal and physical reasoning, are likely to be especially valuable in challenging and novel driving circumstances for which there are few relevant training data (e.g., navigating unusual construction zones, natural disasters).

4. Creative design. Creativity is often thought to be a pinnacle of human intelligence. Chefs design new dishes, musicians write new songs, architects design new buildings, and entrepreneurs start new businesses. Although we are still far from developing AI systems that can tackle these types of tasks, we see compositionality and causality as central to this goal. Many commonplace acts of creativity are combinatorial, meaning they are unexpected combinations of familiar concepts or ideas (Boden 1998; Ward 1994). As illustrated in Figure 1-iv, novel vehicles can be created as a combination of parts from existing vehicles, and similarly, novel characters can be constructed from the parts of stylistically similar characters, or familiar characters can be re-conceptualized in novel styles (Rehling 2001). In each case, the free combination of parts is not enough on its own: Although compositionality and learning-to-learn can provide the parts for new ideas, causality provides the glue that gives them coherence and purpose.

6.3. Toward more human-like learning and thinking machines

Since the birth of AI in the 1950s, people have wanted to build machines that learn and think like people. We hope
researchers in AI, machine learning, and cognitive science will accept our challenge problems as a testbed for progress. Rather than just building systems that recognize handwritten characters and play Frostbite or Go as the end result of an asymptotic process, we suggest that deep learning and other computational paradigms should aim to tackle these tasks using as few training data as people need, and also to evaluate models on a range of human-like generalizations beyond the one task on which the model was trained. We hope that the ingredients outlined in this article will prove useful for working toward this goal: seeing objects and agents rather than features, building causal models and not just recognizing patterns, recombining representations without needing to retrain, and learning-to-learn rather than starting from scratch.

ACKNOWLEDGMENTS
We are grateful to Peter Battaglia, Matt Botvinick, Y-Lan Boureau, Shimon Edelman, Nando de Freitas, Anatole Gershman, George Kachergis, Leslie Kaelbling, Andrej Karpathy, George Konidaris, Tejas Kulkarni, Tammy Kwan, Michael Littman, Gary Marcus, Kevin Murphy, Steven Pinker, Pat Shafto, David Sontag, Pedro Tsividis, and four anonymous reviewers for helpful comments on early versions of this article. Tom Schaul and Matteo Hessel were very helpful in answering questions regarding the DQN learning curves and Frostbite scoring. This work was supported by The Center for Minds, Brains and Machines (CBMM), under National Science Foundation (NSF) Science and Technology Centers (NTS) Award CCF-1231216, and the Moore–Sloan Data Science Environment at New York University.

NOTES
1. In their influential textbook, Russell and Norvig (2003) state that “The quest for ‘artificial flight’ succeeded when the Wright brothers and others stopped imitating birds and started using wind tunnels and learning about aerodynamics” (p. 3).
2. The time required to train the DQN (compute time) is not the same as the game (experience) time.
3. The Atari games are deterministic, raising the possibility that a learner can succeed by memorizing long sequences of actions without learning to generalize (van Hasselt et al. 2016). A recent article shows that one can outperform DQNs early in learning (and make non-trivial generalizations) with an “episodic controller” that chooses actions based on memory and simple interpolation (Blundell et al. 2016). Although it is unclear if the DQN also memorizes action sequences, an alternative “human starts” metric provides a stronger test of generalization (van Hasselt et al. 2016), evaluating the algorithms on a wider variety of start states and levels that are sampled from human play. It would be preferable to compare people and algorithms on the human starts metric, but most learning curves to date have only been reported using standard test performance, which starts the game from the beginning with some added jitter.
4. More precisely, the human expert in Mnih et al. (2015) scored an average of 4335 points across 30 game sessions of up to 5 minutes of play. In individual sessions lasting no longer than 5 minutes, author TDU obtained scores of 3520 points after approximately 5 minutes of gameplay, 3510 points after 10 minutes, and 7810 points after 15 minutes. Author JBT obtained 4060 after approximately 5 minutes of gameplay, 4920 after 10 to 15 minutes, and 6710 after no more than 20 minutes. TDU and JBT each watched approximately 2 minutes of expert play on YouTube (e.g., https://www.youtube.com/watch?v=ZpUFztf9Fjc, but there are many similar examples that can be found in a YouTube search).
5. Although connectionist networks have been used to model the general transition that children undergo between the ages of 3 and 4 regarding false belief (e.g., Berthiaume et al. 2013), we are referring here to scenarios, which require inferring goals, utilities, and relations.
6. We must be careful here about what “simple” means. An inductive bias may appear simple in the sense that we can compactly describe it, but it may require complex computation (e.g., motion analysis, parsing images into objects, etc.) just to produce its inputs in a suitable form.
7. A new approach using convolutional “matching networks” achieves good one-shot classification performance when discriminating between characters from different alphabets (Vinyals et al. 2016). It has not yet been directly compared with BPL, which was evaluated on one-shot classification with characters from the same alphabet.
8. Deep convolutional neural network classifiers have error rates approximately five times higher than those of humans when pre-trained with five alphabets (23% versus 4% error), and two to three times higher when pre-training on six times as much data (30 alphabets) (Lake et al. 2015a). The current need for extensive pre-training is illustrated for deep generative models by Rezende et al. (2016), who present extensions of the DRAW architecture capable of one-shot learning.
9. In the interest of brevity, we do not discuss here another important vein of work linking neural circuits to variational approximations (Bastos et al. 2012), which have received less attention in the psychological literature.
10. Michael Jordan made this point forcefully in his 2015 speech accepting the Rumelhart Prize.

Open Peer Commentary

The architecture challenge: Future artificial-intelligence systems will require sophisticated architectures, and knowledge of the brain might guide their construction

doi:10.1017/S0140525X17000036, e254

Gianluca Baldassarre, Vieri Giuliano Santucci, Emilio Cartoni, and Daniele Caligiore
Laboratory of Computational Embodied Neuroscience, Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Rome, Italy.
[email protected] [email protected]
[email protected] [email protected]
http://www.istc.cnr.it/people/
http://www.istc.cnr.it/people/gianluca-baldassarre
http://www.istc.cnr.it/people/vieri-giuliano-santucci
http://www.istc.cnr.it/people/emilio-cartoni
http://www.istc.cnr.it/people/daniele-caligiore

Abstract: In this commentary, we highlight a crucial challenge posed by the proposal of Lake et al. to introduce key elements of human cognition into deep neural networks and future artificial-intelligence systems: the need to design effective sophisticated architectures. We propose that looking at the brain is an important means of facing this great challenge.

We agree with the claim of Lake et al. that to obtain human-level learning speed and cognitive flexibility, future artificial-intelligence (AI) systems will have to incorporate key elements of human cognition: from causal models of the world, to intuitive psychological theories, compositionality, and knowledge transfer. However, the authors largely overlook the importance of a major challenge to implementation of the functions they advocate: the need to develop sophisticated architectures to learn,
represent, and process the knowledge related to those functions. Here we call this the architecture challenge. In this commentary, we make two claims: (1) tackling the architecture challenge is fundamental to success in developing human-level AI systems; (2) looking at the brain can furnish important insights on how to face the architecture challenge.

The difficulty of the architecture challenge stems from the fact that the space of the architectures needed to implement the several functions advocated by Lake et al. is huge. The authors get close to this problem when they recognize that one thing that the enormous genetic algorithm of evolution has done in millions of years of the stochastic hill-climbing search is to develop suitable brain architectures. One possible way to attack the architecture challenge, also mentioned by Lake et al., would be to use evolutionary techniques mimicking evolution. We think that today this strategy is out of reach, given the “ocean-like” size of the search space. At most, we can use such techniques to explore small, interesting “islands lost within the ocean.” But how do we find those islands in the first place? We propose looking at the architecture of real brains, the product of the evolution genetic algorithm, and try to “steal insights” from nature. Indeed, we think that much of the intelligence of the brain resides in its architecture. Obviously, identifying the proper insights is not easy to do, as the brain is very difficult to understand. However, it might be useful to try, as the effort might give us at least some general indications, a compass, to find the islands in the ocean. Here we present some examples to support our intuition.

When building architectures of AI systems, even when following cognitive science indications (e.g., Franklin 2007), the tendency is to “divide and conquer,” that is, to list the needed high-level functions, implement a module for each of them, and suitably interface the modules. However, the organisation of the brain can be understood on the basis of not only high-level functions (see below), but also “low-level” functions (usually called “mechanisms”). An example of a mechanism is brain organisation based on macro-structures, each having fine repeated micro-architectures implementing specific computations and learning processes (Caligiore et al. 2016; Doya 1999): the cortex to statically and dynamically store knowledge acquired by associative learning processes (Penhune & Steele 2012; Shadmehr & Krakauer 2008), the basal ganglia to learn to select information by reinforcement learning (Graybiel 2005; Houk et al. 1995), the cerebellum to implement fast time-scale computations possibly acquired with supervised learning (Kawato et al. 2011; Wolpert et al. 1998), and the limbic brain structures interfacing the brain to the body and generating motivations, emotions, and the value of things (Mirolli et al. 2010; Mogenson et al. 1980). Each of these mechanisms supports multiple, high-level functions (see below).

Brain architecture is also forged by the fact that natural intelligence is strongly embodied and situated (an aspect not much stressed by Lake et al.); that is, it is shaped to adaptively interact with the physical world (Anderson 2003; Pfeifer & Gómez 2009) to satisfy the organism’s needs and goals (Mannella et al. 2013). Thus, the cortex is organised along multiple cortical pathways running from sensors to actuators (Baldassarre et al. 2013a) and “intercepted” by the basal ganglia selective processes in their last part closer to action (Mannella & Baldassarre 2015). These pathways are organised in a hierarchical fashion, with the higher ones that process needs and motivational information controlling the lower ones closer to sensation/action. The lowest pathways dynamically connect musculoskeletal body proprioception with primary motor areas (Churchland et al. 2012). Higher-level “dorsal” pathways control the lowest pathways by processing visual/auditory information used to interact with the environment (Scott 2004). Even higher-level “ventral” pathways inform the brain on the identity and nature of resources in the environment to support decisions (Caligiore et al. 2010; Milner & Goodale 2006). At the hierarchy apex, the limbic brain supports goal selection based on visceral, social, and other types of needs/goals. Embedded within the higher pathways, an important structure involving basal ganglia–cortical loops learns and implements stimulus–response habitual behaviours (used to act in familiar situations) and goal-directed behaviours (important for problem solving and planning when new challenges are encountered) (Baldassarre et al. 2013b; Mannella et al. 2013). These brain structures form a sophisticated network, knowledge of which might help in designing the architectures of human-like embodied AI systems able to act in the real world.

A last example of the need for sophisticated architectures starts with the recognition by Lake et al. that we need to endow AI systems with a “developmental start-up software.” In this respect, together with other authors (e.g., Weng et al. 2001; see Baldassarre et al. 2013b; 2014, for collections of works) we believe that human-level intelligence can be achieved only through open-ended learning, that is, the cumulative learning of progressively more complex skills and knowledge, driven by intrinsic motivations, which are motivations related to the acquisition of knowledge and skills rather than material resources (Baldassarre 2011). The brain (e.g., Lisman & Grace 2005; Redgrave & Gurney 2006) and computational theories and models (e.g., Baldassarre & Mirolli 2013; Baldassarre et al. 2014; Santucci et al. 2016) indicate how the implementation of these processes indeed requires very sophisticated architectures able to store multiple skills, to transfer knowledge while avoiding catastrophic interference, to explore the environment based on the acquired skills, to self-generate goals/tasks, and to focus on goals that ensure a maximum knowledge gain.
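The goal-selection loop that such architectures require can be made concrete. What follows is a minimal, self-contained sketch in Python (illustrative only; the names are ours, and none of the cited architectures reduces to this): the agent prefers the goal whose competence is currently changing fastest, a simple proxy for expected knowledge gain.

import random

class GoalSelector:
    """Toy intrinsically motivated goal selection: favour the goal whose
    success rate is changing fastest (a crude proxy for knowledge gain)."""

    def __init__(self, n_goals, window=10):
        self.history = {g: [] for g in range(n_goals)}  # per-goal outcomes
        self.window = window

    def learning_progress(self, g):
        h = self.history[g]
        if len(h) < 2 * self.window:
            return float("inf")          # unexplored goals get priority
        recent = sum(h[-self.window:]) / self.window
        earlier = sum(h[-2 * self.window:-self.window]) / self.window
        return abs(recent - earlier)     # absolute change in competence

    def choose_goal(self):
        return max(self.history, key=self.learning_progress)

    def record(self, g, success):
        self.history[g].append(1.0 if success else 0.0)

A full open-ended learner would add, on top of this loop, exactly what the commentary lists: storage of the acquired skills, transfer between them, and protection against catastrophic interference.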
Building machines that learn and think for themselves

doi:10.1017/S0140525X17000048, e255

Matthew Botvinick, David G. T. Barrett, Peter Battaglia, Nando de Freitas, Darshan Kumaran, Joel Z Leibo, Timothy Lillicrap, Joseph Modayil, Shakir Mohamed, Neil C. Rabinowitz, Danilo J. Rezende, Adam Santoro, Tom Schaul, Christopher Summerfield, Greg Wayne, Theophane Weber, Daan Wierstra, Shane Legg, and Demis Hassabis
DeepMind, Kings Cross, London N1C 4AG, United Kingdom.
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected]
http://www.deepmind.com

Abstract: We agree with Lake and colleagues on their list of “key ingredients” for building human-like intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here, we survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some outstanding challenges.

Lake et al. identify some extremely important desiderata for human-like intelligence. We agree with many of their central assertions: Human-like learning and decision making surely do depend upon rich internal models; the learning process must be informed and constrained by prior knowledge, whether this is
part of the agent’s initial endowment or acquired through learning; and naturally, prior knowledge will offer the greatest leverage when it reflects the most pervasive or ubiquitous structures in the environment, including physical laws, the mental states of others, and more abstract regularities such as compositionality and causality. Together, these points comprise a powerful set of target goals for AI research. However, while we concur on these goals, we choose a differently calibrated strategy for accomplishing them. In particular, we favor an approach that prioritizes autonomy, empowering artificial agents to learn their own internal models and how to use them, mitigating their reliance on detailed configuration by a human engineer.

Lake et al. characterize their position as “agnostic with regards to the origins of the key ingredients” (sect. 4, para. 2) of human-like intelligence. This agnosticism implicitly licenses a modeling approach in which detailed, domain-specific information can be imparted to an agent directly, an approach for which some of the authors’ Bayesian Program Learning (BPL) work is emblematic. The two domains Lake and colleagues focus most upon – physics and theory of mind – are amenable to such an approach, in that these happen to be fields for which mature scientific disciplines exist. This provides unusually rich support for hand design of cognitive models. However, it is not clear that such hand design will be feasible in other more idiosyncratic domains where comparable scaffolding is unavailable. Lake et al. (2015a) were able to extend the approach to Omniglot characters by intuiting a suitable (stroke-based) model, but are we in a position to build comparably detailed domain models for such things as human dialogue and architecture? What about Japanese cuisine or ice skating? Even video-game play appears daunting, when one takes into account the vast amount of semantic knowledge that is plausibly relevant (knowledge about igloos, ice floes, cold water, polar bears, video-game levels, avatars, lives, points, and so forth). In short, it is not clear that detailed knowledge engineering will be realistically attainable in all areas we will want our agents to tackle.

Given this observation, it would appear most promising to focus our efforts on developing learning systems that can be flexibly applied across a wide range of domains, without an unattainable overhead in terms of a priori knowledge. Encouraging this view, the recent machine learning literature offers many examples of learning systems conquering tasks that had long eluded more hand-crafted approaches, including object recognition, speech recognition, speech generation, language translation, and (significantly) game play (Silver et al. 2016). In many cases, such successes have depended on large amounts of training data, and have implemented an essentially model-free approach. However, a growing volume of work suggests that flexible, domain-general learning can also be successful on tasks where training data are scarcer and where model-based inference is important.

For example, Rezende and colleagues (2016) reported a deep generative model that produces plausible novel instances of Omniglot characters after one presentation of a model character, going a significant distance toward answering Lake’s “Character Challenge.” Lake et al. call attention to this model’s “need for extensive pre-training.” However, it is not clear why their pre-installed model is to be preferred over knowledge acquired through pre-training. In weighing this point, it is important to note that the human modeler, to furnish the BPL architecture with its “start-up software,” must draw on his or her own large volume of prior experience. In this sense, the resulting BPL model is dependent on the human designer’s own “pre-training.” A more significant aspect of the Rezende model is that it can be applied without change to very different domains, as Rezende and colleagues (2016) demonstrate through experiments on human facial images. This flexibility is one hallmark of an autonomous learning system, and contrasts with the more purpose-built flavor of the BPL approach, which relies on irreducible primitives with domain-specific content (e.g., the strokes in Lake’s Omniglot model). Furthermore, a range of recent work with deep generative models (e.g., van den Oord 2016; Ranzato et al. 2016) indicates that they can identify quite rich structure, increasingly avoiding silly mistakes like those highlighted in Lake et al.’s Figure 6.

Importantly, a learning-centered approach does not prevent us from endowing learning systems with some forms of a priori knowledge. Indeed, the current resurgence in neural network research was triggered largely by work that does just this, for example, by building an assumption of translational invariance into the weight matrix of image classification networks (Krizhevsky et al. 2012a). The same strategy can be taken to endow learning systems with assumptions about compositional and causal structure, yielding architectures that learn efficiently about the dynamics of physical systems, and even generalize to previously unseen numbers of objects (Battaglia et al. 2016), another challenge problem highlighted by Lake et al. In such cases, however, the inbuilt knowledge takes a highly generic form, leaving wide scope for learning to absorb domain-specific structure (see also Eslami et al. 2016; Raposo et al. 2017; Reed and de Freitas 2016).
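The remark about building translational invariance into the weight matrix can be unpacked with a small example: a convolution is simply a fully connected layer whose weight matrix is constrained to reuse one kernel at every position. A NumPy sketch (purely illustrative):

import numpy as np

def conv1d_as_shared_weights(x, kernel):
    # Write a 1-D convolution as multiplication by a weight matrix whose
    # rows all reuse the same kernel, shifted; the weight sharing itself
    # is the built-in assumption of translational invariance.
    n, k = len(x), len(kernel)
    W = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        W[i, i:i + k] = kernel           # identical parameters at each position
    return W @ x

x, kernel = np.random.randn(8), np.array([1.0, -1.0, 0.5])
assert np.allclose(conv1d_as_shared_weights(x, kernel),
                   np.convolve(x, kernel[::-1], mode="valid"))

Interaction networks (Battaglia et al. 2016) apply the same move to relational structure: one shared function is reused across all object pairs, which is what permits generalization to unseen numbers of objects.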
Under the approach we advocate, high-level prior knowledge and learning biases can be installed not only at the level of representational structure, but also through larger-scale architectural and algorithmic factors, such as attentional filtering (Eslami et al. 2016), intrinsic motivation mechanisms (Bellemare et al. 2016), and episodic learning (Blundell et al. 2016). Recently developed architectures for memory storage (e.g., Graves et al. 2016) offer a critical example. Lake et al. describe neural networks as implementing “learning as a process of gradual adjustment of connection strengths.” However, recent work has introduced a number of architectures within which learning depends on rapid storage mechanisms, independent of connection-weight changes (Duan et al. 2016; Graves et al. 2016; Wang et al. 2017; Vinyals et al. 2016). Indeed, such mechanisms have even been applied to one-shot classification of Omniglot characters (Santoro et al. 2016) and Atari video game play (Blundell et al. 2016). Furthermore, the connection-weight changes that do occur in such models can serve in part to support learning-to-learn (Duan et al. 2016; Graves et al. 2016; Ravi and Larochelle 2017; Vinyals et al. 2016; Wang et al. 2017), another of Lake et al.’s key ingredients for human-like intelligence. As recent work has shown (Andrychowicz et al. 2016; Denil et al. 2016; Duan et al. 2016; Hochreiter et al. 2001; Santoro et al. 2016; Wang et al. 2017), this learning-to-learn mechanism can allow agents to adapt rapidly to new problems, providing a novel route to install prior knowledge through learning, rather than by hand. Learning to learn enables us to learn a neural network agent over a long time. This network, however, is trained to be good at learning rapidly from few examples, regardless of what those examples might be. So, although the meta-learning process might be slow, the product is a neural network agent that can learn to harness a few data points to carry out numerous tasks, including imitation, inference, task specialization, and prediction.
Another reason why we believe it may be advantageous to autonomously learn internal models is that such models can be shaped directly by specific, concrete tasks. A model is valuable not because it veridically captures some ground truth, but because it can be efficiently leveraged to support adaptive behavior. Just as Newtonian mechanics is sufficient for explaining many everyday phenomena, yet too crude to be useful to particle physicists and cosmologists, an agent’s models should be calibrated to its tasks. This is essential for models to scale to real-world complexity, because it is usually too expensive, or even impossible, for a system to acquire and work with extremely fine-grained models of the world (Botvinick & Weinstein 2015; Silver et al. 2017). Of course, a good model of the world should be applicable across a range of task conditions, even ones that have not been previously encountered. However, this simply implies that models should be calibrated not only to individual tasks, but also to the distribution of tasks – inferred through experience or evolution – that is likely to arise in practice.
Finally, in addition to the importance of model building, it is important to recognize that real autonomy also depends on control functions, the processes that leverage models to make actual decisions. An autonomous agent needs good models, but it also needs to know how to make use of them (Botvinick & Cohen 2014), especially in settings where task goals may vary over time. This point also favors a learning and agent-based approach, because it allows control structures to co-evolve with internal models, maximizing their compatibility. Though efforts to capitalize on these advantages in practice are only in their infancy, recent work from Hamrick and colleagues (2017), which simultaneously trained an internal model and a corresponding set of control functions, provides a case study of how this might work.

Our comments here, like the target article, have focused on model-based cognition. However, an aside on model-free methods is warranted. Lake et al. describe model-free methods as providing peripheral support for model-based approaches. However, there is abundant evidence that model-free mechanisms play a pervasive role in human learning and decision making (Kahneman 2011). Furthermore, the dramatic recent successes of model-free learning in areas such as game play, navigation, and robotics suggest that it may constitute a first-class, independently valuable approach for machine learning. Lake et al. call attention to the heavy data demands of model-free learning, as reflected in DQN learning curves. However, even since the initial report on DQN (Mnih et al. 2015), techniques have been developed that significantly reduce the data requirements of this and related model-free learning methods, including prioritized memory replay (Schaul et al. 2016), improved exploration methods (Bellemare et al. 2016), and techniques for episodic reinforcement learning (Blundell et al. 2016). Given the pace of such advances, it may be premature to relegate model-free methods to a merely supporting role.
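To give the flavour of one of these techniques, prioritized replay can be sketched in a few lines (a simplification of the idea in Schaul et al. 2016: linear storage, proportional priorities, and no importance-sampling correction):

import random

class PrioritizedReplay:
    # Transitions are replayed in proportion to their last TD error, so
    # surprising experience is rehearsed more often.
    def __init__(self, capacity=10000, eps=1e-3):
        self.buffer, self.priorities = [], []
        self.capacity, self.eps = capacity, eps

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(abs(td_error) + self.eps)

    def sample(self, batch_size):
        idx = random.choices(range(len(self.buffer)),
                             weights=self.priorities, k=batch_size)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + self.eps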
To conclude, despite the differences we have focused on here, we agree strongly with Lake et al. that human-like intelligence depends at least in part on richly structured internal models. Our approach to building human-like intelligence can be summarized as a commitment to developing autonomous agents: agents that shoulder the burden of building their own models and arriving at their own procedures for leveraging them. Autonomy, in this sense, confers a capacity to build economical task-sensitive internal models, and to adapt flexibly to diverse circumstances, while avoiding a dependence on detailed, domain-specific prior information. A key challenge in pursuing greater autonomy is the need to find more efficient means of extracting knowledge from potentially limited data. But recent work on memory, exploration, compositional representation, and processing architectures provides grounds for optimism. In fairness, the authors of the target article have also offered, in other work, some indication of how their approach might be elaborated to support greater agent autonomy (Lake et al. 2016). We may therefore be following slowly converging paths. On a final note, it is worth pointing out that as our agents gain in autonomy, the opportunity increasingly arises for us to obtain new insights from what they themselves discover. In this way, the pursuit of agent autonomy carries the potential to transform the current AI landscape, revealing new paths toward human-like intelligence.

Digging deeper on “deep” learning: A computational ecology approach

doi:10.1017/S0140525X1700005X, e256

Massimo Buscemaa,b and Pier Luigi Saccoc,d
aSemeion Research Center, 00128 Rome, Italy; bUniversity of Colorado at Denver, Denver, CO 80217; cIULM University of Milan, 20143 Milan, Italy; and dHarvard University Department of Romance Languages and Literatures, Cambridge, MA 02138.
[email protected] [email protected]
[email protected] [email protected]
www.semeion.it www.researchgate.net/profile/Massimo_Buscema
www.researchgate.net/profile/Pier_Sacco

Abstract: We propose an alternative approach to “deep” learning that is based on computational ecologies of structurally diverse artificial neural networks, and on dynamic associative memory responses to stimuli. Rather than focusing on massive computation of many different examples of a single situation, we opt for model-based learning and adaptive flexibility. Cross-fertilization of learning processes across multiple domains is the fundamental feature of human intelligence that must inform “new” artificial intelligence.

In The Society of Mind, Minsky (1986) argued that the human brain is more similar to a complex society of diverse neural networks, than to a large, single one. The current theoretical mainstream in “deep” (artificial neural network [ANN]-based) learning leans in the opposite direction: building large ANNs with many layers of hidden units, relying more on computational power than on reverse engineering of brain functioning (Bengio 2009). The distinctive structural feature of the human brain is its synthesis of uniformity and diversity. Although the structure and functioning of neurons are uniform across the brain and across humans, the structure and evolution of neural connections make every human subject unique. Moreover, the mode of functioning of the left versus right hemisphere of the brain seems distinctively different (Gazzaniga 2004). If we do not wonder about this homogeneity of components that results in a diversity of functions, we cannot understand the computational design principles of the brain, or make sense of the variety of “constitutional arrangements” in the governance of neural interactions at various levels – “monarchic” in some cases, “democratic” or “federative” in others.

In an environment characterized by considerable stimulus variability, a biological machine that responds by combining two different principles (as embodied in its two hemispheres) has a better chance of devising solutions that can flexibly adapt to circumstances, and even anticipate singular events. The two hemispheres seem to follow two opposite criteria: an analogical-intuitive one, gradient descent-like, and a digital-rational one, vector quantization-like. The former aims at anticipating and understanding sudden environmental changes – the “black swans.” The latter extrapolates trends from (currently classified as) familiar contexts and situations. These two criteria are conceptually orthogonal and, therefore, span a very rich space of cognitive functioning through their complex cooperation. On the other hand, the Bayesian approach advocated by the authors to complement the current “deep” learning agenda is useful only to simulate the functioning of the left-brain hemisphere.

The best way to capture these structural features is to imagine the brain as a society of agents (Minsky 1986), very heterogeneous and communicating through their common neural base by means of shared protocols, much like the Internet. The brain, as a highly functionally bio-diverse computational ecology, may therefore extract, from a large volume of external data, limited meaningful subsets (small data sets), to generate a variety of possible responses to these data sets and to learn from these very responses. This logic is antithetical to the mainstream notion of “deep learning” and of the consequential “big data” philosophy of processing large volumes of data to generate a few, “static” (i.e., very domain specific) responses – and which could, perhaps, more appropriately be called “fat” learning. Such a dichotomy clearly echoes the tension between model-based learning and pattern recognition highlighted by the authors of the target article.

Teaching a single, large, neural network how to associate an output to a certain input through millions of examples of a single situation is an exercise in brute force. It would be much more effective, in our view, to train a whole population of
“deep” ANNs, mathematically very different from one another, on the same problem and to filter their results by means of a Meta-Net (Buscema 1998; Buscema et al. 2010; 2013) that ignores their specific architectures, in terms of both prediction performance and biological plausibility.

We can therefore sum up the main tenets of our approach as follows:
1. There is extreme diversity in the architectures, logical principles, and mathematical structures of the deployed ANNs.
2. A “parliament” is created whereby each ANN proposes its solution to each case, in view of its past track record for similar occurrences.
3. There is dynamic negotiation among the various hypotheses: The solution proposal of an ANN and its reputation re-enter as inputs for the other ANNs, until the ANN assembly reaches a consensus.
4. Another highly diverse pool of ANNs learns the whole dynamic process generated by the previous negotiation.
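A minimal sketch of tenets 2 and 3, with trivial placeholder “members” standing in for structurally diverse ANNs, may make the negotiation dynamic concrete (the actual Meta-Net of Buscema 1998 differs in detail):

import random

class Member:
    def __init__(self):
        self.bias = random.uniform(-0.5, 0.5)  # stands in for a distinct architecture
        self.reputation = 1.0                  # past track record on similar cases

    def propose(self, case, peer_view):
        # Each member sees the case plus the current reputation-weighted consensus.
        return 0.5 * (case + self.bias) + 0.5 * peer_view

def parliament(case, members, rounds=20, tol=1e-6):
    consensus = case
    for _ in range(rounds):
        proposals = [m.propose(case, consensus) for m in members]
        total = sum(m.reputation for m in members)
        new_consensus = sum(m.reputation * p
                            for m, p in zip(members, proposals)) / total
        if abs(new_consensus - consensus) < tol:
            break                              # the assembly has converged
        consensus = new_consensus
    return consensus

print(parliament(case=1.0, members=[Member() for _ in range(7)]))

Tenet 4 would then train a second, equally diverse pool not on the final answer but on the whole trajectory of proposals generated above.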
Responding to a pattern with a dynamic process rather than with a single output is much closer to the actual functioning of the human brain than associating a single output in a very domain-specific way, however nonlinear. Associative memory is a fundamental component of human intelligence: It is a cognitive morphing that connects apparently diverse experiences such as a lightning bolt and the fracture of a window pane. Human intelligence is a prediction engine working on hypotheses, generated from a relatively small database and constantly verified through sequential sampling: a cycle of perception, prediction, validation, and modification. Novelties, or changes in an already known environmental scene, will command immediate attention. Pattern recognition, therefore, is but the first step in understanding human intelligence. The next step should be building machines that generate dynamic responses to stimuli, that is, behave as dynamic associative memories (Buscema 1995; 1998; 2013; Buscema et al. 2015). The very same associative process generated by the machine, in addition to interacting with itself and the external stimuli, must itself become the object of learning: This is learning-to-learn in its fuller meaning. In this way, the artificial intelligence frontier moves from pattern recognition to recognition of pattern transformations – learning the topology used by the brain to connect environmental scenes. Analyzing the cause-effect links within these internal processes provides the basis to identify meaningful rules of folk psychology or cognitive biases: A pound of feathers may be judged lighter than a pound of lead only in a thought process where feathers are associated with lightness. The meta-analysis of the connections generated by a mind may yield physically absurd, but psychologically consistent, associations.

An approach based on ecologies of computational diversity and dynamic brain associations seems to us the most promising route to a model-based learning paradigm that capitalizes on our knowledge of the brain’s computational potential. And this also means allowing for mental disturbances, hallucinations, or delirium. A “deep” machine that cannot reproduce a dissociated brain is just not intelligent enough, and if it merely maximizes IQ, it is, in a sense, “dumb.” A system that can also contemplate stupidity or craziness is the real challenge of the “new” artificial intelligence.

Back to the future: The return of cognitive functionalism

doi:10.1017/S0140525X17000061, e257

Leyla Roskan Çağlar and Stephen José Hanson
Psychology Department, Rutgers University Brain Imaging Center (RUBIC), Rutgers University, Newark, NJ 07102.
[email protected] [email protected]
https://leylaroksancaglar.github.io/
http://nwkpsych.rutgers.edu/~jose/

Abstract: The claims that learning systems must build causal models and provide explanations of their inferences are not new, and advocate a cognitive functionalism for artificial intelligence. This view conflates the relationships between implicit and explicit knowledge representation. We present recent evidence that neural networks do engage in model building, which is implicit, and cannot be dissociated from the learning process.

The neural network revolution occurred more than 30 years ago, stirring intense debate over what neural networks (NNs) can and cannot learn and represent. Much of the target article resurrects these earlier concerns, but in the context of the latest NN revolution, spearheaded by an algorithm that was known, but failed because of scale and computational power, namely, deep learning (DL).

Claims that learning systems must build causal models and provide explanations of their inferences are not new (DeJong 1986; Lenat 1995; Mitchell 1986), nor have they been proven successful. Advocating the idea that artificial intelligence (AI) systems need commonsense knowledge, ambitious projects such as “Cyc” (Lenat 1990) created hand-crafted and labor-intensive knowledge bases, combined with an inference engine to derive answers in the form of explicit knowledge. Despite feeding a large but finite number of factual assertions and explicit rules into such systems, the desired human-like performance was never accomplished. Other explanation-based and expert systems (e.g., WordNet [Miller 1990]) proved useful in some applied domains, but were equally unable to solve the problem of AI. At the essence of such projects lies the idea of “cognitive functionalism.” Proposing that mental states are functional states determined and individuated by their causal relations to other mental states and behaviors, it suggests that mental states are programmable with explicitly determined representational structures (Fodor 1981; Hayes 1974; McCarthy & Hayes 1969; Putnam 1967). Such a view stresses the importance of “formalizing concepts of causality, ability, and knowledge” to create “a computer program that decides what to do by inferring in a formal language that a certain strategy will achieve its assigned goal” (McCarthy & Hayes 1969, p. 1). Lake et al.’s appeal to causal mechanisms and their need for explicit model representations is closely related to this cognitive functionalism, which had been put forth as a set of principles by many founders of the AI field (Hayes 1974; McCarthy 1959; McCarthy & Hayes 1969; Newell & Simon 1956).

One important shortcoming of cognitive functionalism is its failure to acknowledge that the same behavior/function may be caused by different representations and mechanisms (Block 1978; Hanson 1995). Consequently, the problem with this proposition that knowledge within a learning system must be explicit is that it conflates the relationship between implicit knowledge and explicit knowledge and their representations. The ability to throw a low hanging fast ball would be difficult, if not impossible, to encode as a series of rules. However, this type of implicit knowledge can indeed be captured in a neural network, simply by having it learn from an analog perception–action system and a series of ball throws – all while also having the ability to represent rule-based knowledge (Horgan & Tienson 1996). This associative versus rule learning debate, referred to in this article as “pattern recognition” versus “model building,” was shown a number of times to be a meaningless dichotomy (Hanson & Burr 1990; Hanson et al. 2002; Prasada & Pinker 1993).

Although we agree with Lake et al. that “model building” is indeed an important component of any AI system, we do not agree that NNs merely recognize patterns and lack the ability to build models. Our disagreement arises from the presumption that “a model must include explicit representations of objects, identity and relations” (Lake et al. 2016, pp. 38–39). Rather than being explicit or absent altogether, model representation is implicit in NNs. Investigating implicitly learned models is
somewhat more challenging, but work on learning dynamics and learning functions with respect to their relationship to representations provides insights into these implicit models (Caglar & Hanson 2016; Cleeremans 1993; Hanson & Burr 1990; Metcalf et al. 1992; Saxe et al. 2014).

Recent work has shown that in DL, the internal structure, or “model,” accumulates at later layers, and is effectively constructing “scaffolds” over the learning process that are then used to train subsequent layers (Caglar & Hanson 2016; Saxe 2013). These learning dynamics can be investigated through analysis of the learning curves and the internal representations resultant in the hidden units. Analysis of the learning curves of NNs with different architectures reveals that merely adding depth to a NN results in different learning dynamics and representational structures, which do not require explicit preprogramming or pre-training (Caglar & Hanson 2016). In fact, the shape of the learning curves for single-layer NNs and for multilayered DLs is qualitatively different, with the former fitting a negative exponential function (“associative”) and the latter fitting a hyperbolic function (“accumulative”). This type of structured learning, consistent with the shape of the learning curves, can be shown to be equivalent to the “learning-to-learn” component suggested by the authors. Appearing across different layers of the NNs, it also satisfies the need for “learning-to-learn to occur at multiple levels of the hierarchical generative process” (Lake et al., sect. 4.2.3, para. 5).
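The two families of curves at issue can be written down directly (illustrative parameterizations, with performance expressed as a function of practice t):

import numpy as np

def negative_exponential(t, a, c):
    return a * (1.0 - np.exp(-c * t))   # "associative", single-layer profile

def hyperbolic(t, a, c):
    return a * t / (t + c)              # "accumulative" form (cf. Mazur & Hastie 1978)

t = np.arange(1.0, 200.0)
exp_curve = negative_exponential(t, 1.0, 0.05)
hyp_curve = hyperbolic(t, 1.0, 20.0)
# Fitting each family to an observed curve (e.g., with scipy.optimize.curve_fit)
# and comparing fit quality is the diagnostic this argument relies on.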
Furthermore, in category learning tasks with DLs, the internal representation of the hidden units shows that it creates prototype-like representations at each layer of the network (Caglar & Hanson 2016). These higher-level representations are the result of concept learning from exemplars, and go far beyond simple pattern recognition. Additionally, the plateau characteristic of the hyperbolic learning curves provides evidence for rapid learning, as well as one-shot learning once this kind of implicit conceptual representation has been formed over some subset of exemplars (similar to a “prior”) (Saxe 2014). Longstanding investigation in the learning theory literature proposes that the hyperbolic learning curve of DLs is also the shape that best describes human learning (Mazur & Hastie 1978; Thurstone 1919), thereby suggesting that the learning mechanisms of DLs and humans might be more similar than thought (Hanson et al., in preparation).

Taken together, the analysis of learning curves and internal representations of hidden units indicates that NNs do in fact build models and create representational structures. However, these models are implicitly built into the learning process and cannot be explicitly dissociated from it. Exploiting the rich information of the stimulus and its context, the learning process creates models and shapes representational structures without the need for explicit preprogramming.

Theories or fragments?

doi:10.1017/S0140525X17000073, e258

Nick Chatera and Mike Oaksfordb
aBehavioural Science Group, Warwick Business School, University of Warwick, Coventry CV4 7AL, United Kingdom; bDepartment of Psychological Sciences, Birkbeck, University of London, London WC1E 7HX, United Kingdom.
[email protected] [email protected]
http://www.wbs.ac.uk/about/person/nick-chater/
http://www.bbk.ac.uk/psychology/our-staff/mike-oaksford

Abstract: Lake et al. argue persuasively that modelling human-like intelligence requires flexible, compositional representations in order to embody world knowledge. But human knowledge is too sparse and self-contradictory to be embedded in “intuitive theories.” We argue, instead, that knowledge is grounded in exemplar-based learning and generalization, combined with highly flexible generalization, a viewpoint compatible both with non-parametric Bayesian modelling and with sub-symbolic methods such as neural networks.

Lake et al. make a powerful case that modelling human-like intelligence depends on highly flexible, compositional representations, to embody world knowledge. But will such knowledge really be embedded in “intuitive theories” of physics or psychology? This commentary argues that there is a paradox at the heart of the “intuitive theory” viewpoint, that has bedevilled analytic philosophy and symbolic artificial intelligence: human knowledge is both (1) extremely sparse and (2) self-contradictory (e.g., Oaksford & Chater 1991).

The sparseness of intuitive knowledge is exemplified in Rozenblit and Keil’s (2002) discussion of the “illusion of explanatory depth.” We have the feeling that we understand how a crossbow works, how a fridge stays cold, or how electricity flows around the house. Yet, when pressed, few of us can provide much more than sketchy and incoherent fragments of explanation. Therefore, our causal models of the physical world appear shallow. The sparseness of intuitive psychology seems at least as striking. Indeed, our explanations of our own and others’ behavior often appear to be highly ad hoc (Nisbett & Ross 1980).

Moreover, our physical and psychological intuitions are also self-contradictory. The foundations of physics and rational choice theory have consistently shown how remarkably few axioms (e.g., the laws of thermodynamics, the axioms of decision theory) completely fix a considerable body of theory. Yet our intuitions about heat and work, or probability and utility, are vastly richer and more amorphous, and cannot be captured in any consistent system (e.g., some of our intuitions may imply our axioms, but others will contradict them). Indeed, contradictions can also be evident even in apparently innocuous mathematical or logical assumptions (as illustrated by Russell’s paradox, which unexpectedly exposed a contradiction in Frege’s attempted logical foundation for mathematics [Irvine & Deutsch 2016]).

The sparse and contradictory nature of our intuition explains why explicit theorizing requires continually ironing out contradictions, making vague concepts precise, and radically distorting or replacing existing concepts. And the lesson of two and a half millennia of philosophy is, arguably, that clarifying even the most basic concepts, such as “object” or “the good,” can be entirely intractable, a lesson re-learned in symbolic artificial intelligence. In any case, the raw materials for this endeavor – our disparate intuitions – may not be properly viewed as organized as theories at all.

If this is so, how do we interact so successfully in the physical and social worlds? We have experience, whether direct, or by observation or instruction – of crossbows, fridges, and electricity – to be able to interact with them in familiar ways. Indeed, our ability to make sense of new physical situations often appears to involve creative extrapolation from familiar examples: for example, assuming that heavy objects will fall faster than light objects, even in a vacuum, or where air resistance can be neglected. Similarly, we have a vast repertoire of experience of human interaction, from which we can generalize to new interactions. Generalization from such experiences, to deal with new cases, can be extremely flexible and abstract (Hofstadter 2001). For example, the perceptual system uses astonishing ingenuity to construct complex percepts (e.g., human faces) from highly impoverished signals (e.g., Hoffman 2000; Rock 1983) or to interpret art (Gombrich 1960).

We suspect that the growth and operation of cognition are more closely analogous to case law than to scientific theory. Each new case is decided by reference to the facts of that present case and to ingenious and open-ended links to precedents from past cases; and the history of cases creates an intellectual tradition that is only locally coherent, often ill-defined, but surprisingly effective in dealing with a complex and ever-changing world. In short, knowledge has the form of a loosely interlinked history of reusable fragments, each building on the last, rather than being organized into anything resembling a scientific theory.

Recent work on construction-based approaches to language exemplifies this viewpoint in the context of linguistics (e.g.,
Goldberg 1995). Rather than seeing language as generated by a theory (a formally specified grammar), and the acquisition of language as the fine-tuning of that theory, such approaches see language as a tradition, where each new language processing episode, like a new legal case, is dealt with by reference to past instances (Christiansen & Chater 2016). In both law and language (see Blackburn 1984), there will be a tendency to impose local coherence across similar instances, but there will typically be no globally coherent theory from which all cases can be generated.

Case instance or exemplar-based theorizing has been widespread in the cognitive sciences (e.g., Kolodner 1993; Logan 1988; Medin & Shaffer 1978). Exploring how creative extensions of past experience can be used to deal with new experience (presumably by processes of analogy and metaphor rather than deductive theorizing from basic principles) provides an exciting challenge for artificial intelligence, whether from a non-parametric Bayesian standpoint or a neural network perspective, and is likely to require drawing on the strengths of both.
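The exemplar position sketched here has a standard formal core, going back to the context model of Medin & Shaffer (1978) cited above: a new case is handled by summed similarity to stored past cases, not by rules derived from a theory. A minimal sketch (the feature vectors are ours and purely illustrative):

import math

def similarity(x, y, sensitivity=2.0):
    # Exponential-decay similarity over feature distance, as in classic
    # exemplar models of categorization.
    dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-sensitivity * dist)

def classify(new_case, exemplars):
    # exemplars: list of (feature_vector, label) pairs, i.e., past cases.
    votes = {}
    for features, label in exemplars:
        votes[label] = votes.get(label, 0.0) + similarity(new_case, features)
    return max(votes, key=votes.get)

past_cases = [((0.9, 0.8), "falls-fast"), ((0.1, 0.2), "drifts-down")]
print(classify((0.8, 0.7), past_cases))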
ACKNOWLEDGMENTS
N.C. was supported by ERC Grant 295917-RATIONALITY, the ESRC Network for Integrated Behavioural Science (Grant ES/K002201/1), the Leverhulme Trust (Grant RP2012-V-022), and Research Councils UK Grant EP/K039830/1.

The humanness of artificial non-normative personalities

doi:10.1017/S0140525X17000085, e259

Kevin B. Clark
Research and Development Service, Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, CA 90073; California NanoSystems Institute, University of California at Los Angeles, Los Angeles, CA 90095; Extreme Science and Engineering Discovery Environment (XSEDE), National Center for Supercomputing Applications, University of Illinois at Urbana–Champaign, Urbana, IL 61801; Biological Collaborative Research Environment (BioCoRE), Theoretical and Computational Biophysics Group, NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801.
[email protected]
www.linkedin.com/pub/kevin-clark/58/67/19a

Abstract: Technoscientific ambitions for perfecting human-like machines, by advancing state-of-the-art neuromorphic architectures and cognitive computing, may end in ironic regret without pondering the humanness of fallible artificial non-normative personalities. Self-organizing artificial personalities individualize machine performance and identity through fuzzy conscientiousness, emotionality, extraversion/introversion, and other traits, rendering insights into technology-assisted human evolution, robot ethology/pedagogy, and best practices against unwanted autonomous machine behavior.

Within a modern framework of promising, yet still inadequate state-of-the-art artificial intelligence, Lake et al. construct an optimistic, ambitious plan for innovating truer representative neural network-inspired machine emulations of human consciousness and cognition, elusive pinnacle goals of many cognitive, semiotic, and cybernetic scientists (Cardon 2006; Clark 2012; 2014; 2015; Kaipa et al. 2010; McShea 2013). Their machine learning-based agenda, possibly requiring future generations of pioneering hybrid neuromorphic computing architectures and other sorts of technologies to be fully attained (Lande 1998; Indiveri & Liu 2015; Schuller & Stevens 2015), relies on implementing sets of data-/theory-established “core ingredients” typical of natural human intelligence and development (cf. Bengio 2016; Meltzoff et al. 2009; Thomaz & Cakmak 2013; Weigmann 2006). Such core ingredients, including (1) intuitive causal physics and psychology, (2) compositionality and learning-to-learn, and (3) fast efficient real-time gradient-descent deep learning and thinking, will certainly endow contemporary state-of-the-art machines with greater human-like cognitive qualities. But, in Lake et al.’s efforts to create a standard of human-like machine learning and thinking, they awkwardly, and perhaps ironically, erect barriers to realizing ideal human simulation by ignoring what is also very human – variations in cognitive-emotional neural network structure and function capable of giving rise to non-normative (or unique) personalities and, therefore, dynamic expression of human intelligences and identities (Clark 2012; 2015; in press-a; in press-b; in press-c). Moreover, this same, somewhat counterintuitive, problem in the authors’ otherwise rational approach dangerously leaves unaddressed the major ethical and security issues of “free-willed” personified artificial sentient agents, often popularized by fantasists and futurists alike (Bostrom 2014; Briegel 2012; Davies 2016; Fung 2015).

Classic interpretations of perfect humanness arising from the fallibility of humans (e.g., Clark 2012; Nisbett & Ross 1980; Parker & McKinney 1999; Wolfram 2002) appreciably impact the technical feasibility and socio-cultural significance of building and deploying human-emulating personified machines under both nonsocial and social constraints. Humans, as do all sentient biological entities, fall within a fuzzy organizational and operational template that bounds emergence of phylogenic, ontogenic, and sociogenic individuality (cf. Fogel & Fogel 1995; Romanes 1884). Extreme selected variations in individuality, embodied here by modifiable personality and its link to mind, can greatly elevate or diminish human expression, depending on pressures of situational contexts. Examples may include the presence or absence of resoluteness, daring, agile deliberation, creativity, and meticulousness essential to achieving matchless, unconventional artistic and scientific accomplishments. Amid even further examples, they may also include the presence or absence of empathy, morality, or ethics in response to severe human plight and need. Regardless, to completely simulate the range of human intelligence, particularly solitary to sociable and selfish to selfless tendencies critical for now-nascent social-like human-machine and machine-machine interactions, scientists and technologists must account for, and better understand, personality trait formation and development in autonomous artificial technologies (Cardon 2006; Clark 2012; 2015; Kaipa et al. 2010; McShea 2013). These kinds of undertakings will help yield desirable insights into the evolution of technology-augmented human nature and, perhaps more importantly, will inform best practices when establishing advisable failsafe contingencies against unwanted serendipitous or designed human-like machine behavior.

Notably, besides their described usefulness for modeling intended artificial cognitive faculties, Lake et al.’s core ingredients provide systematic concepts and guidelines necessary to begin approximating human-like machine personalities, and to probe genuine ethological, ecological, and evolutionary consequences of those personalities for both humans and machines. However, similar reported strategies for machine architectures, algorithms, and performance demonstrate only marginal success when used as protocols to reach nearer cognitive-emotional humanness in trending social robot archetypes (Arbib & Fellous 2004; Asada 2015; Berdahl 2010; Di & Wu 2015; Han et al. 2013; Hiolle et al. 2014; Kaipa et al. 2010; McShea 2013; Read et al. 2010; Thomaz & Cakmak 2013; Wallach et al. 2010; Youyou et al. 2015), emphasizing serious need for improved adaptive quasi-model-free/-based neural nets, trainable distributed cognition-emotion mapping, and artificial personality trait parameterization. The best findings from such work, although far from final reduction-to-practice, arguably involve the appearance of crude or primitive machine personalities and identities from socially learned intra-/interpersonal relationships possessing cognitive-emotional valences. Valence direction and magnitude often depend on the learner machine’s disposition toward response priming/contagion, social facilitation, incentive motivation, and local/stimulus enhancement of observable demonstrator behavior (i.e., human, cohort-machine, and learner-machine behavior). The resulting self-/world discovery of the learner machine,
analogous to healthy/diseased or normal/abnormal human phenomena acquired during early formative (neo)Piagetian cognitive-emotional periods (cf. Nisbett & Ross 1980; Parker & McKinney 1999; Zentall 2013), reciprocally shapes the potential humanness of reflexive/reflective machine actions through labile interval-delimited self-organizing traits consistent with natural human personalities, including, but not restricted to, conscientiousness, openness, emotional stability, agreeableness, and extraversion/introversion. Even simplistic artificial cognitive-emotional profiles and personalities thus effect varying control over acquisition and lean of machine domain-general/-specific knowledge, perception and expression of flat or excessive machine affect, and rationality and use of inferential machine attitudes/opinions/beliefs (Arbib & Fellous 2004; Asada 2015; Berdahl 2010; Cardon 2006; Davies 2016; Di & Wu 2015; Han et al. 2013; Hiolle et al. 2014; Kaipa et al. 2010; McShea 2013; Read et al. 2010; Wallach et al. 2010; Youyou et al. 2015). And, by favoring certain artificial personality traits, such as openness, a learner machine’s active and passive pedagogical experiences may be radically directed by the quality of teacher-student rapport (e.g., Thomaz & Cakmak 2013), enabling opportunities for superior nurturing and growth of distinctive, well-adjusted thoughtful machine behavior while, in part, restricting harmful rogue machine behavior, caused by impoverished learning environments and predictable pathological Gödel-type incompleteness/inconsistency for axiomatic neuropsychological systems (cf. Clark & Hassert 2013). These more-or-less philosophical considerations, along with the merits of Lake et al.’s core ingredients for emerging artificial non-normative (or unique) personalities, will bear increasing technical and sociocultural relevance as the Human Brain Project, the Blue Brain Project, and related connectome missions drive imminent neuromorphic hardware research and development toward precise mimicry of configurable/computational soft-matter variations in human nervous systems (cf. Calimera et al. 2013).

Children begin with the same start-up software, but their software updates are cultural

doi:10.1017/S0140525X17000097, e260

Jennifer M. Clegg and Kathleen H. Corriveau
Boston University School of Education, Boston, MA 02215.
[email protected] [email protected]
www.jennifermclegg.com www.bu.edu/learninglab

Abstract: We propose that early in ontogeny, children’s core cognitive abilities are shaped by culturally dependent “software updates.” The role of sociocultural inputs in the development of children’s learning is largely missing from Lake et al.’s discussion of the development of human-like artificial intelligence, but its inclusion would help move research even closer to machines that can learn and think like humans.

Lake et al. draw from research in both artificial intelligence (AI) and cognitive development to suggest a set of core abilities necessary for building machines that think and learn like humans. We share the authors’ view that children have a set of core cognitive abilities for learning and that these abilities should guide development in AI research. We also agree with the authors’ focus on findings from theory theory research and their characterization of its principles as “developmental start-up software” that is adapted later in ontogeny for social learning. What is missing from this discussion, however, is the recognition that children’s developmental start-up software is shaped by their culture-specific social environment. Children’s early and ontogenetically persistent experiences with their cultural environment affect what learning “programs” children develop and have access to, particularly in the case of social learning.

Research suggests that from early infancy, children display a core set of abilities that shape their reasoning about the world, including reasoning about both inanimate objects (intuitive physics [e.g., Spelke 1990]) and animate social beings (intuitive psychology [e.g., Dennett 1987; Meltzoff & Moore 1995]). Although the early onset of these abilities provides evidence that they may be universal, little research has examined their development in non-WEIRD (Western educated industrialized rich democratic) (Henrich et al. 2010) cultures (Legare & Harris 2016). Moreover, research that has examined children’s intuitive theories in different cultural settings has suggested the potential for both cross-cultural continuity and variation in their development. Take, for example, the development of children’s theory of mind, a component of intuitive psychology. A large collection of research comparing the development of children’s understanding of false belief in the United States, China, and Iran indicates that although typically developing children in all cultures show an improvement in false belief understanding over the course of ontogeny, the timing of this improvement differs widely – and such variability is potentially related to different sociocultural inputs (Davoodi et al. 2016; Liu et al. 2008; Shahaeian et al. 2011). Thus, children’s social environments may be shaping the development of these core abilities, “reprogramming” and updating their developmental start-up software.

To illustrate why considering the principles derived from theory theory is important for guiding AI development, Lake et al. point to AI’s lack of human-like intuitive psychology as a key reason for why humans outperform AI. In their discussion of humans’ superior performance in the Frostbite challenge, the authors highlight humans’ ability to build on skills gained through the observation of an expert player, which requires reasoning about the expert player’s mental state. AI can also draw on observations of expert players, but requires substantially greater input to achieve similar levels of performance. Humans’ intuitive psychology and their corresponding ability to reason about others’ mental states is just one element of why humans may be outperforming computers in this task. This situation also draws on humans’ ability to learn by observing others and, like the development of false-belief understanding, children’s ability to learn through observation as well as through verbal testimony, which is heavily influenced by sociocultural inputs (Harris 2012). Culturally specific ethno-theories of how children learn (Clegg et al. 2017; Corriveau et al. 2013; Harkness et al. 2007; Super & Harkness 2002) and the learning opportunities to which children have access (Kline 2015; Rogoff 2003) shape their ability to learn through observation. As early as late infancy, sociocultural inputs such as how parents direct children’s attention, or the typical structure of parent-child interaction, may lead to differences in the way children attend to events for the purpose of observational learning (Chavajay & Rogoff 1999). By pre-school, children from non-WEIRD cultures where observational learning is expected and socialized outperform children from WEIRD cultures in observational learning tasks (Correa-Chávez & Rogoff 2009; Mejía-Arauz et al. 2005). Recent research also suggests that children from different cultural backgrounds attend to different types of information when engaging in observational learning. For example, Chinese-American children are more sensitive to whether there is consensus about a behavior or information than Euro-American children (Corriveau & Harris 2010; Corriveau et al. 2013; DiYanni et al. 2015). Such cultural differences in attending to social information in observational learning situations persist into adulthood (Mesoudi et al. 2015). Therefore, although the developmental start-up software children begin with may be universal, early in development, children’s “software updates” may be culturally dependent. Over time, these updates may even result in distinct operating systems.

The flexibility of children’s core cognitive abilities to be shaped by sociocultural input is what makes human learning unique (Henrich 2015). The role of this input is largely missing from Lake et al.’s discussion of creating human-like AI, but its inclusion would help move research even closer to machines that can learn and think like humans.
Deep-learning networks and the functional architecture of executive control

doi:10.1017/S0140525X17000103, e261

Richard P. Cooper
Centre for Cognition, Computation and Modelling, Department of Psychological Sciences, Birkbeck, University of London, London WC1E 7HX, United Kingdom.
[email protected]
http://www.bbk.ac.uk/psychology/our-staff/richard-cooper

Abstract: Lake et al. underrate both the promise and the limitations of contemporary deep learning techniques. The promise lies in combining those techniques with broad multisensory training as experienced by infants and children. The limitations lie in the need for such systems to possess functional subsystems that generate, monitor, and switch goals and strategies in the absence of human intervention.

Lake et al. present a credible case for why natural intelligence requires the construction of compositional, causal generative models that incorporate intuitive psychology and physics. Several of their arguments (e.g., for compositionality and theory construction and for learning from limited experience) echo arguments that have been made throughout the history of cognitive science (e.g., Fodor & Pylyshyn 1988). Indeed, in the context of Lake et al.'s criticisms, the closing remarks of Fodor and Pylyshyn's seminal critique of 1980s-style connectionism make sobering reading: "some learning is a kind of theory construction.… We seem to remember having been through this argument before. We find ourselves with a gnawing sense of deja vu" (1988, p. 69). It would appear that cognitive science has advanced little in the last 30 years with respect to the underlying debates.

Yet Lake et al. underrate both the promise and the limitations of contemporary deep learning (DL) techniques with respect to natural and artificial intelligence. Although contemporary DL approaches to, say, learning and playing Atari games undoubtedly employ psychologically unrealistic training regimes, and are undoubtedly inflexible with respect to changes to the reward/goal structure, to fixate on these limitations overlooks the promise of such approaches. It is clear that DL nets are not normally trained with anything like the experiences had by the developing child, whose learning is based on broad, multisensory experience and is cumulative, with new motor and cognitive skills building on old (Vygotsky 1978). Until DL nets are trained in this way, it is not reasonable to critique the outcomes of such approaches for unrealistic training regimes of, for example, "almost 500 times as much experience as the human received" (target article, sect. 3.2, para. 4). That 500 times as much experience neglects the prior experience that the human brought to the task. DL networks, as currently organised, require that much experience precisely because they bring nothing but a learning algorithm to the task.

A more critical question is whether contemporary DL approaches might, with appropriate training, be able to acquire intuitive physics – the kind of thing an infant learns through his or her earliest interactions with the world (that there are solids and liquids, and that solids can be grasped and that some can be picked up, but that they fall when dropped, etc.). Similarly, can DL acquire intuitive psychology through interaction with other agents? And what kind of input representations and motor abilities might allow DL networks to develop representational structures that support reuse across tasks? The promise of DL networks (and at present it remains a promise) is that, with sufficiently broad training, they may support the development of systems that capture intuitive physics and intuitive psychology. To neglect this possibility is to see the glass as half empty, rather than half full.

The suggestion is not simply that training an undifferentiated DL network with the ordered multisensory experiences of a developing child will automatically yield an agent with natural intelligence. As Lake et al. note, gains come from combining DL with reinforcement learning (RL) and Monte-Carlo Tree Search to support extended goal-directed activities (such as playing Atari games) and problem solving (as in the game of Go). These extensions are of particular interest because they parallel cognitive psychological accounts of more complex cognition. More specifically, accounts of behaviour generation and regulation have long distinguished between automatic and deliberative behaviour. Thus, the contention scheduling/supervisory system theory of Norman and Shallice (1986) proposes that one system – the contention scheduling system – controls routine, overlearned, or automatic behaviour, whereas a second system – the supervisory system – may bias or modulate the contention scheduling system in non-routine situations where deliberative control is exercised. Within this account the routine system may plausibly employ a DL-type network combined with (a hierarchical variant of) model-free reinforcement learning, whereas the non-routine system is more plausibly conceived of in terms of a model-based system (cf. Daw et al. 2005).

Viewing DL-type networks as models of the contention scheduling system suggests that their performance should be compared to those aspects of expert performance that are routinized or overlearned. From this perspective, the limits of DL-type networks are especially informative, as they indicate which cognitive functions cannot be routinized and should be properly considered as supervisory. Indeed, classical model-based RL is impoverished compared with natural intelligence. The evidence from patient and imaging studies suggests that the non-routine system is not an undifferentiated whole, as might befit a system that simply performs Monte-Carlo Tree Search. The supervisory system appears to perform a variety of functions, such as goal generation (to create one's own goals and to function in real domains outside of the laboratory), strategy generation and evaluation (to create and evaluate potential strategies that might achieve goals), monitoring (to detect when one's goals are frustrated and to thereby trigger generation of new plans/strategies or new goals), switching (to allow changing goals), response inhibition (to prevent selection of pre-potent actions which may conflict with one's high-level goals), and perhaps others. (See Shallice & Cooper [2011] for an extended review of relevant evidence, and Fox et al. [2013] and Cooper [2016] for detailed suggestions for the potential organisation of higher-level modulatory systems.) These functions must also support creativity and autonomy, as expressed by naturally intelligent systems. Furthermore, "exploration" is not unguided as in the classical exploration/exploitation trade-off of RL. Natural intelligence appears to combine the largely reactive perception-action cycle of RL with a more active action-perception cycle, in which the cognitive system can act and deliberatively explore in order to test hypotheses.

To achieve natural intelligence, it is likely that a range of supervisory functions will need to be incorporated into the model-based system, or as modulators of a model-free system. Identifying the component functions and their interactions, that is, identifying the functional architecture (Newell 1990), will be critical if we are to move beyond Lake et al.'s "Character" and "Frostbite" challenges, which remain highly circumscribed tasks that draw upon limited world knowledge.
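[Editorial illustration.] The division of labour described above can be given a minimal sketch under assumptions of our own (the habitual policy, toy forward model, and depth-limited planner below are invented stand-ins, not an implementation of the Norman and Shallice theory): a fast routine mapping acts by default, and a monitor switches control to slower, model-based search when the routine action stops making progress toward the goal.

def routine_policy(state):
    # Overlearned stimulus-response habits: a stand-in for a trained
    # DL + model-free RL controller (fast, but blind to novel goals).
    habits = {"door_closed": "push", "door_open": "walk_through"}
    return habits.get(state, "wait")

def simulate(state, action):
    # Toy forward model, available only to the deliberative system.
    transitions = {("door_closed", "push"): "door_closed",  # novel door: pushing fails
                   ("door_closed", "pull"): "door_open",
                   ("door_open", "walk_through"): "goal"}
    return transitions.get((state, action), state)

def plan(state, goal, depth=4):
    # Depth-limited search with the forward model: the "supervisory" route.
    if depth == 0:
        return None
    for action in ("push", "pull", "walk_through", "wait"):
        nxt = simulate(state, action)
        if nxt == goal or (nxt != state and plan(nxt, goal, depth - 1)):
            return action
    return None

def act(state, goal):
    # Monitoring: when the habitual action yields no progress, control is
    # switched from contention scheduling to deliberative planning.
    action = routine_policy(state)
    if simulate(state, action) == state:       # goal frustration detected
        action = plan(state, goal) or action   # supervisory override
    return action

print(act("door_closed", "goal"))  # habit says "push"; the monitor switches to "pull"

The other supervisory functions listed above (goal generation, strategy evaluation, switching, inhibition) would each add further machinery around this monitoring loop; the sketch shows only the switch itself.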
Causal generative models are just a start

doi:10.1017/S0140525X17000115, e262

Ernest Davisa and Gary Marcusb,c
aDepartment of Computer Science, New York University, New York, NY 10012; bUber AI Labs, San Francisco, CA 94103; cDepartment of Psychology, New York University, New York, NY 10012.
[email protected] [email protected]
http://www.cs.nyu.edu/faculty/davise http://garymarcus.com/
Abstract: Human reasoning is richer than Lake et al. acknowledge, and the emphasis on theories of how images and scenes are synthesized is misleading. For example, the world knowledge used in vision presumably involves a combination of geometric, physical, and other knowledge, rather than just a causal theory of how the image was produced. In physical reasoning, a model can be a set of constraints rather than a physics engine. In intuitive psychology, many inferences proceed without detailed causal generative models. How humans reliably perform such inferences, often in the face of radically incomplete information, remains a mystery.

We entirely agree with the central thrust of the article. But a broader view of what a "model" is, is needed.

In most of the examples discussed in the target article, a "model" is a generative system that synthesizes a specified output. For example, the target article discusses a system built by Lake et al. (2015a) that learns to recognize handwritten characters from one or two examples, by modeling the sequence of strokes that produced them. The result is impressive, but the approach – identifying elements from a small class of items based on a reconstruction of how something might be generated – does not readily generalize to many other situations. Consider, for example, how one might recognize a cat, a cartoon of a cat, a painting of a cat, a marble sculpture of a cat, or a cloud that happens to look like a cat. The causal processes that generated each of these are very different; and yet a person familiar with cats will recognize any of these depictions, even if they know little of the causal processes underlying sculpture or the formation of clouds. Conversely, the differences between the causal processes that generate a cat and those that generate a dog are understood imperfectly, even by experts in developmental biology, and hardly at all by laypeople. Yet even children can readily distinguish dogs from cats. Likewise, since children learn to recognize letters significantly before they can write them at all well,1 it seems doubtful that models of how an image is synthesized play any necessary role in visual recognition even of letters, let alone of more complex entities. Lake et al.'s results are technically impressive, but may tell us little about object recognition in general.

The discussion of physical reasoning here, which draws on studies such as Battaglia et al. (2013), Gerstenberg et al. (2015), and Sanborn et al. (2013), may be similarly misleading. The target article argues that the cognitive processes used for human physical reasoning are "intuitive physics engines," similar to the simulators used in scientific computation and computer games. But, as we have argued elsewhere (Davis & Marcus 2014; 2016), this model of physical reasoning is much too narrow, both for AI and for cognitive modeling.

First, simulation engines require both a precise predictive theory of the domain and a geometrically and physically precise description of the situation. Human reasoners, by contrast, can deal with information that is radically incomplete. For example, if you are carrying a number of small creatures in a closed steel box, you can predict that as long as the box remains completely closed, the creatures will remain inside. This prediction can be made without knowing anything about the creatures and the way they move, without knowing the initial positions or shapes of the box or the creatures, and without knowing the trajectory of the box.

Second, simulation engines predict how a system will develop by tracing its state in detail over a sequence of closely spaced instants. For example, Battaglia et al. (2013) use an existing physics engine to model how humans reason about an unstable tower of blocks collapsing to the floor. The physics engine generates a trace of the exact positions and velocities of every block, and the forces between them, at a sequence of instants a fraction of a second apart. There is no evidence that humans routinely generate comparably detailed traces or even that they are capable of doing so. Conversely, people are capable of predicting characteristics of an end state for problems where it is impossible to predict the intermediate states in detail, as the example of the creatures in the box illustrates.

Third, there is extensive evidence that in many cases where the actual physics is simple, humans make large, systematic errors. For example, a gyroscope or a balance beam constructed of solid parts is governed by the identical physics as the falling tower of blocks studied in Battaglia et al. (2013); the physical interactions and their analysis are much simpler for these than for the tower of blocks, and the physics engine that Battaglia et al. used in their studies will handle the case of a gyroscope or a balance beam without difficulty. But here, the model is "too good" relative to humans. Human subjects often make errors in predicting the behavior of a balance beam (Siegler 1976), and most people find the behavior of a gyroscope mystifying. Neither result follows from the model.

Intuitive psychology goes even further beyond what can be explained by the sorts of generative models of action choice discussed in the target article. One's knowledge of the state of another agent's mind and one's ability to predict their action are necessarily extremely limited; nonetheless, powerful psychological reasoning can be carried out. For example, if you see a person pick up a telephone and dial, it is a good guess that he or she is planning to talk to someone. To do so, one does not need a full causal model of whom they want to talk to, what they will say, or what their goal is in calling. In this instance (and many others), there seems to be a mismatch between the currency of generative models and the sorts of inferences that humans can readily make.

So whereas we salute Lake et al.'s interest in drawing inferences from small amounts of data, and believe as they do that rich models are essential to complex reasoning, we find their view of causal models to be too parochial. Reasoning in humans, and in general artificial intelligence, requires bringing to bear knowledge across an extraordinarily wide range of subjects, levels of abstraction, and degrees of completeness. The exclusive focus on causal generative models is unduly narrow.

NOTE
1. This may be less true with respect to Chinese and other large character sets, in which practicing drawing the characters is an effective way of memorizing them (Tan et al. 2005).
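[Editorial illustration.] The contrast Davis and Marcus draw can be made concrete with a minimal sketch under assumptions of our own (the predicates, event names, and containment invariant below are invented for illustration): a single qualitative constraint licenses the creatures-in-a-box prediction with no geometry, no trajectories, and none of the per-timestep state a physics engine would require.

def predict_contents(initially_inside, closed, events):
    # Containment invariant: whatever is inside a closed rigid container
    # stays inside it, however the container moves, as long as it stays
    # closed. No shapes, positions, or velocities are ever represented.
    for event in events:
        if event == ("open", "box"):
            closed = False
        if not closed:
            return None          # invariant no longer applies: end state unknown
    return set(initially_inside)  # end state predicted without simulation

# The box is carried, shaken, and put down; its trajectory is unspecified,
# and nothing is known about the creatures or how they move.
events = [("carry", "box"), ("shake", "box"), ("putdown", "box")]
print(predict_contents({"creature_1", "creature_2"}, True, events))
# {'creature_1', 'creature_2'}

A simulator, by contrast, could not even be started here: it would need the very positions, shapes, and trajectories that the constraint never mentions.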
Thinking like animals or thinking like colleagues?

doi:10.1017/S0140525X17000127, e263

Daniel C. Dennett and Enoch Lambert
Center for Cognitive Studies, Tufts University, Medford, MA 02155.
[email protected] [email protected]
http://ase.tufts.edu/cogstud/dennett/
http://ase.tufts.edu/cogstud/faculty.html

Abstract: We comment on ways in which Lake et al. advance our understanding of the machinery of intelligence and offer suggestions. The first set concerns animal-level versus human-level intelligence. The second concerns the urgent need to address ethical issues when evaluating the state of artificial intelligence.

Lake et al. present an insightful survey of the state of the art in artificial intelligence (AI) and offer persuasive proposals for feasible future steps. Their ideas of "start-up software" and tools for rapid model learning (sublinguistic "compositionality" and "learning-to-learn") help pinpoint the sources of general, flexible intelligence. Their concrete examples using the Character Challenge and Frostbite Challenge forcefully illustrate just how behaviorally effective human learning can be compared with current achievements in machine learning. Their proposal that such learning is the result of "metacognitive processes" integrating model-based and model-free learning is tantalizingly suggestive, pointing toward novel ways of explaining intelligence.
So, in a sympathetic spirit, we offer some suggestions. The first set concerns casting a wider view of explananda and, hence, potential explanantia regarding intelligence. The second set concerns the need to confront ethical concerns as AI research advances.

Lake et al.'s title speaks of "thinking like humans" but most of the features discussed—use of intuitive physics, intuitive psychology, and relying on "models"—are features of animal thinking as well. Not just apes or mammals, but also birds and octopuses and many other animals have obviously competent expectations about causal links, the reactions of predators, prey, and conspecifics, and must have something like implicit models of the key features in their worlds—their affordances, to use Gibson's (1979) term.

Birds build species-typical nests they have never seen built, improving over time, and apes know a branch that is too weak to hold them. We think the authors' term intuitive physics engine is valuable because unlike "folk physics," which suggests a theory, it highlights the fact that neither we, nor animals in general, need to understand from the outset the basic predictive machinery we are endowed with by natural selection. We humans eventually bootstrap this behavioral competence into reflective comprehension, something more like a theory and something that is probably beyond language-less animals.

So, once sophisticated animal-level intelligence is reached, there will remain the all-important step of bridging the gap to human-level intelligence. Experiments suggest that human children differ from chimpanzees primarily with respect to social knowledge (Herrmann et al. 2007; 2010). Their unique forms of imitation and readiness to learn from teachers suggest means by which humans can accumulate and exploit an "informational commonwealth" (Kiraly et al. 2013; Sterelny 2012; 2013). This is most likely part of the story of how humans can become as intelligent as they do. But the missing part of that story remains internal mechanisms, which Lake et al. can help us focus on. Are the unique social skills developing humans deploy due to enriched models ("intuitive psychology," say), novel models (ones with principles of social emulation and articulation), or more powerful abilities to acquire and enrich models (learning-to-learn)? The answer probably appeals to some combination. But we suggest that connecting peculiarly human ways of learning from others to Lake et al.'s "learning-to-learn" mechanisms may be particularly fruitful for fleshing out the latter – and ultimately illuminating to the former.

The step up to human-style comprehension carries moral implications that are not mentioned in Lake et al.'s telling. Even the most powerful of existing AIs are intelligent tools, not colleagues, and whereas they can be epistemically authoritative (within limits we need to characterize carefully), and hence will come to be relied on more and more, they should not be granted moral authority or responsibility because they do not have skin in the game: they do not yet have interests, and simulated interests are not enough. We are not saying that an AI could not be created to have genuine interests, but that is down a very long road (Dennett 2017; Hurley et al. 2011). Although some promising current work suggests that genuine human consciousness depends on a fundamental architecture that would require having interests (Deacon 2012; Dennett 2013), long before that day arrives, if it ever does, we will have AIs that can communicate with natural language with their users (not collaborators).

How should we deal, ethically, with these pseudo-moral agents? One idea, inspired in part by recent work on self-driving cars (Pratt 2016), is that instead of letting them be autonomous, they should be definitely subordinate: co-pilots that help but do not assume responsibility for the results. We must never pass the buck to the machines, and we should take steps now to ensure that those who rely on them recognize that they are strictly liable for any harm that results from decisions they make with the help of their co-pilots. The studies by Dietvorst et al. (2015; 2016; see Hutson 2017) suggest that people not only tend to distrust AIs, but also want to exert control, and hence responsibility, over the results such AIs deliver. One way to encourage this is to establish firm policies of disclosure of all known gaps and inabilities in AIs (much like the long lists of side effects of medications). Furthermore, we should adopt the requirement that such language-using AIs must have an initiation period in which their task is to tutor users, treating them as apprentices and not giving any assistance until the user has established a clear level of expertise. Such expertise would not be in the fine details of the AIs' information, which will surely outstrip any human being's knowledge, but in the limitations of the assistance on offer and the responsibility that remains in the hands of the user. Going forward, it is time for evaluations of the state of AI to include consideration of such moral matters.

Evidence from machines that learn and think like people

doi:10.1017/S0140525X17000139, e264

Kenneth D. Forbusa and Dedre Gentnerb
aDepartment of Computer Science, Northwestern University, Evanston, IL 60208; bDepartment of Psychology, Northwestern University, Evanston, IL 60208.
[email protected] [email protected]
http://www.cs.northwestern.edu/~forbus/
http://groups.psych.northwestern.edu/gentner/

Abstract: We agree with Lake et al.'s trenchant analysis of deep learning systems, including that they are highly brittle and that they need vastly more examples than do people. We also agree that human cognition relies heavily on structured relational representations. However, we differ in our analysis of human cognitive processing. We argue that (1) analogical comparison processes are central to human cognition; and (2) intuitive physical knowledge is captured by qualitative representations, rather than quantitative simulations.

Capturing relational capacity. We agree with Lake et al. that structured relational representations are essential for human cognition. But that raises the question of how such representations are acquired and used. There is abundant evidence from both children and adults that structure mapping (Gentner 1983) is a major route to acquiring and using knowledge. For example, physicists asked to solve a novel problem spontaneously use analogies to known systems (Clement 1988), and studies of working microbiology laboratories reveal that frequent use of analogies is a major determinant of success (Dunbar 1995). In this respect, children are indeed like little scientists. Analogical processes support children's learning of physical science (Chen & Klahr 1999; Gentner et al. 2016) and mathematics (Carey 2009; Mix 1999; Richland & Simms 2015). Analogy processes pervade everyday reasoning as well. People frequently draw inferences from analogous situations, sometimes without awareness of doing so (Day & Gentner 2007).

Moreover, computational models of structure mapping's matching, retrieval, and generalization operations have been used to simulate a wide range of phenomena, including geometric analogies, transfer learning during problem solving, and moral decision making (Forbus et al. 2017). Simulating humans on these tasks requires between 10 and 100 relations per example. This is a significant gap. Current distributed representations have difficulty handling even one or two relations.
Even visual tasks, such as character recognition, are more compactly represented by a network of relationships and objects than by an array of pixels, which is why human visual systems compute edges (Marr 1983; Palmer 1999). Further, the results from adversarial training indicate that deep learning systems do not construct human-like intermediate representations (Goodfellow et al. 2015; see also target article). In contrast, there is evidence that a structured representation approach can provide human-like visual processing. For example, a model that combines analogy with visual processing of relational representations has achieved human-level performance on Raven's Progressive Matrices test (Lovett & Forbus 2017). Using analogy over relational representations may be a superior approach even for benchmark machine learning tasks. For example, on the link plausibility task, in which simple knowledge bases (Freebase, WordNet) are analyzed so that the plausibility of new queries can be estimated (e.g., Is Barack Obama Kenyan?), a combination of analogy and structured logistic regression achieved state-of-the-art performance, with orders of magnitude fewer training examples than distributed representation systems (Liang & Forbus 2015). Because structure mapping allows the use of relational representations, the system also provided explanations, the lack of which is a significant drawback of distributed representations.

Causality and qualitative models. Lake et al. focus on Bayesian techniques and Monte Carlo simulation as their alternative explanation for how human cognition works. We agree that statistics are important, but they are insufficient. Specifically, we argue that analogy provides exactly the sort of rapid learning and reasoning that human cognition exhibits. Analogy provides a means of transferring prior knowledge. For example, the Companion cognitive architecture can use rich relational representations and analogy to perform distant transfer. Learning games with a previously learned analogous game led to more rapid learning than learning without such an analog (Hinrichs & Forbus 2011). This and many other experiments suggest that analogy not only can explain human transfer learning, but also can provide new techniques for machine learning.

Our second major claim is that qualitative representations – not quantitative simulations – provide much of the material of our conceptual structure, especially for reasoning about causality (Forbus & Gentner 1997). Human intuitive knowledge concerns relationships such as "the higher the heat, the quicker the water will boil," not the equations of heat flow. Qualitative representations provide symbolic, relational representations of continuous properties and an account of causality organized around processes of change. They enable commonsense inferences to be made with little information, using qualitative mathematics. Decades of successful models have been built for many aspects of intuitive physics, and such models have also been used to ground scientific and engineering reasoning (Forbus 2011). Moreover, qualitative models can explain aspects of social reasoning, including blame assignment (Tomai & Forbus 2008) and moral decision making (Dehghani et al. 2008), suggesting that they are important in intuitive psychology as well.

We note two lines of qualitative reasoning results that are particularly challenging for simulation-based accounts. First, qualitative representations provide a natural way to express some aspects of natural language semantics, for example, "temperature depends on heat" (McFate & Forbus 2016). This has enabled Companions to learn causal models via reading natural language texts, thereby improving their performance in a complex strategy game (McFate et al. 2014). Second, qualitative representations combined with analogy have been used to model aspects of conceptual change. For example, using a series of sketches to depict motion, a Companion learns intuitive models of force. Further, it progresses from simple to complex models in an order that corresponds to the order found in children (Friedman et al. 2010). It is hard to see how a Monte Carlo simulation approach would capture either the semantics of language about processes or the findings of the conceptual change literature.

Although we differ from Lake et al. in our view of intuitive physics and the role of analogical processing, we agree that rapid computation over structured representations is a major feature of human cognition. Today's deep learning systems are interesting for certain applications, but we doubt that they are on a direct path to understanding human cognition.
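[Editorial illustration.] A toy sketch may help to fix ideas; it is far simpler than the SME model underlying the work cited above, and the relation names and domains are invented for this example. Aligning identical relations between a base and a target induces object correspondences, and unmatched base structure is projected as a candidate inference, as in the classic water-flow/heat-flow analogy.

def align(base, target):
    # Match assertions with the same predicate and arity, inducing
    # object-to-object correspondences (SME's structural-consistency
    # checks are omitted in this toy version).
    mapping = {}
    for b in base:
        for t in target:
            if b[0] == t[0] and len(b) == len(t):
                mapping.update(zip(b[1:], t[1:]))
    return mapping

def candidate_inferences(base, target, mapping):
    # Project base assertions into the target under the mapping; an
    # unmapped entity ("water") carries over as a placeholder, much as
    # SME introduces skolemized entities.
    projected = {(r[0],) + tuple(mapping.get(a, a) for a in r[1:]) for r in base}
    return projected - target

base = {("greater", "vessel", "pipe"),                  # pressure difference...
        ("flows_from_to", "water", "vessel", "pipe")}   # ...drives water flow
target = {("greater", "coffee", "ice_cube")}            # temperature difference

mapping = align(base, target)            # vessel -> coffee, pipe -> ice_cube
print(candidate_inferences(base, target, mapping))
# {('flows_from_to', 'water', 'coffee', 'ice_cube')}: "something flows
# from the coffee to the ice cube" -- the heat-flow inference.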
What can the brain teach us about building artificial intelligence?

doi:10.1017/S0140525X17000140, e265

Dileep George
Vicarious, Union City, CA 94587.
[email protected] www.vicarious.com

Abstract: Lake et al. offer a timely critique on the recent accomplishments in artificial intelligence from the vantage point of human intelligence and provide insightful suggestions about research directions for building more human-like intelligence. Because we agree with most of the points they raised, here we offer a few points that are complementary.

The fact that "airplanes do not flap their wings" is often offered as a reason for not looking to biology for artificial intelligence (AI) insights. This is ironic because the idea that flapping is not required to fly could easily have originated from observing eagles soaring on thermals. The comic strip in Figure 1 offers a humorous take on the current debate in AI. A flight researcher who does not take inspiration from birds defines an objective function for flight and ends up creating a catapult. Clearly, a catapult is an extremely useful invention. It can propel objects through the air, and in some cases, it can even be a better alternative to flying. Just as researchers who are interested in building "real flight" would be well advised to pay close attention to the differences between catapult flight and bird flight, researchers who are interested in building "human-like intelligence" or artificial general intelligence (AGI) would be well advised to pay attention to the differences between the recent successes of deep learning and human intelligence. We believe the target article delivers on that front, and we agree with many of its conclusions.

Better universal algorithms or more inductive biases? Learning and inference are instances of optimization algorithms. If we could derive a universal optimization algorithm that works well for all data, the learning and inference problems for building AGI would be solved as well. Researchers who work on assumption-free algorithms are pushing the frontier on this question. Exploiting inductive biases and the structure of the AI problem makes learning and inference more efficient. Our brains show remarkable abilities to perform a wide variety of tasks on data that look very different. What if all of these different tasks and data have underlying similarities? Our view is that biological evolution, by trial and error, figured out a set of inductive biases that work well for learning in this world, and the human brain's efficiency and robustness derive from these biases. Lake et al. note that many researchers hope to overcome the need for inductive biases by bringing biological evolution into the fold of the learning algorithms. We point out that biological evolution had the advantage of using building blocks (proteins, cells) that obeyed the laws of the physics of the world in which these organisms were evolving to excel. In this way, assumptions about the world were implicitly baked into the representations that evolution used. Trying to evolve intelligence without assumptions might therefore be a significantly harder problem than biological evolution. AGI has one existence proof – our brains. Biological evolution is not an existence proof for artificial universal intelligence.

At the same time, we think a research agenda for building AGI could be synergistic with the quest for better universal algorithms. Our strategy is to build systems that strongly exploit inductive biases, while keeping open the possibility that some of those assumptions can be relaxed by advances in optimization algorithms.
Figure 1 (George). A humorous take on the current debate in artificial intelligence.

What kind of generative model is the brain? Neuroscience can help, not just cognitive science. Lake et al. offered several compelling arguments for using cognitive science insights. In addition to cognitive science, neuroscience data can be examined to obtain clues about what kind of generative model the brain implements and how this model differs from models being developed in the AI community.

For instance, spatial lateral connections between oriented features are a predominant feature of the visual cortex and are known to play a role in enforcing contour continuity. However, lateral connections are largely ignored in current generative models (Lee 2015). Another example is the factorization of contours and surfaces. Evidence indicates that contours and surfaces are represented in a factored manner in the visual cortex (Zhou et al. 2000), potentially giving rise to the ability of humans to imagine and recognize objects with surface appearances that are not prototypical – like a blanket made of bananas or a banana made of blankets. Similarly, studies on top-down attention demonstrate the ability of the visual cortex to separate out objects even when they are highly overlapping and transparent (Cohen & Tong 2015). These are just a handful of examples from the vast repository of information on cortical representations and inference dynamics, all of which could be used to build AGI.

The conundrum of "human-level performance": Benchmarks for AGI. We emphasize the meaninglessness of "human-level performance" as reported in mainstream AI publications and then used as a yardstick to measure our progress toward AGI. Take the case of the DeepQ network playing "breakout" at a "human level" (Mnih et al. 2015). We found that even simple changes to the visual environment (as insignificant as changing the brightness) dramatically and adversely affect the performance of the algorithm, whereas humans are not affected by such perturbations at all. At this point, it should be well accepted that almost any narrowly defined task can be "solved" with brute force data and computation and that any use of "human-level" as a comparison should be reserved for benchmarks that adhere to the following principles: (1) learning from few examples, (2) generalizing to distributions that are different from the training set, and (3) generalizing to new queries (for generative models) and new tasks (in the case of agents interacting with an environment).

Message passing-based algorithms for probabilistic models. Although the article makes good arguments in favor of structured probabilistic models, it is surprising that the authors mentioned only Markov chain Monte Carlo (MCMC) as the primary tool for inference. Although MCMC has asymptotic guarantees, the speed of inference in many cortical areas is more consistent with message passing (MP)-like algorithms, which arrive at maximum a posteriori solutions using only local computations. Despite lacking theoretical guarantees, MP has been known to work well in many practical cases, and recently we showed that it can be used for learning of compositional features (Lázaro-Gredilla et al. 2016). There is growing evidence for the use of MP-like inference in cortical areas (Bastos et al. 2012; George & Hawkins 2009), and MP could offer a happy medium where inference is fast, as in neural networks, while retaining MCMC's capability for answering arbitrary queries on the model.
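[Editorial illustration.] The flavour of such local inference can be conveyed with a minimal sketch; the binary chain model and its potentials are invented for illustration, and this is ordinary max-product message passing rather than the Lázaro-Gredilla et al. system. A single forward sweep of local messages plus a backtrace recovers the maximum a posteriori assignment, with no iterative sampling.

unary = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]   # psi_i(x_i) for a 3-node chain
pair = [[0.9, 0.1], [0.1, 0.9]]                # psi(x_i, x_{i+1}): favour agreement

def map_chain(unary, pair):
    # Max-product message passing: every message is a purely local
    # computation over a neighbour's states.
    n = len(unary)
    msg = [[1.0, 1.0] for _ in range(n)]   # max-marginal messages into node i
    back = [[0, 0] for _ in range(n)]      # argmax pointers for the backtrace
    for i in range(1, n):
        for x in (0, 1):
            scores = [unary[i-1][xp] * msg[i-1][xp] * pair[xp][x] for xp in (0, 1)]
            back[i][x] = max((0, 1), key=lambda xp: scores[xp])
            msg[i][x] = max(scores)
    states = [0] * n
    states[-1] = max((0, 1), key=lambda x: unary[-1][x] * msg[-1][x])
    for i in range(n - 1, 0, -1):
        states[i-1] = back[i][states[i]]
    return states

print(map_chain(unary, pair))   # [1, 1, 1] for these potentials

On loopy graphs the same local updates lose their guarantees but often still work well in practice, which is the trade-off the commentary points to.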
Building brains that communicate like machines

doi:10.1017/S0140525X17000152, e266

Daniel Graham
Department of Psychology, Hobart & William Smith Colleges, Geneva, NY 14456.
[email protected] http://people.hws.edu/graham

Abstract: Reverse engineering human cognitive processes may improve artificial intelligence, but this approach implies we have little to learn regarding brains from human-engineered systems. On the contrary, engineered technologies of dynamic network communication have many features that highlight analogous, poorly understood, or ignored aspects of brain and cognitive function, and mechanisms fundamental to these technologies can be usefully investigated in brains.

Lake et al. cogently argue that artificial intelligence (AI) machines would benefit from more "reverse engineering" of the human brain and its cognitive systems. However, it may be useful to invert this logic and, in particular, to use basic principles of machine communication to provide a menu of analogies and, perhaps, mechanisms that could be investigated in human brains and cognition.

We should consider that one of the missing components in deep learning models of cognition – and of most large-scale models of brain and cognitive function – is an understanding of how signals are selectively routed to different destinations in brains (Graham 2014; Graham and Rockmore 2011).

Given that brain cells themselves are not motile enough to selectively deliver messages to their destination (unlike cells in the immune system, for example), there must be a routing protocol of some kind in neural systems to accomplish this. This protocol should be relatively fixed in a given species and lineage, and have the ability to be scaled up over development and evolution. Turning to machine communication as a model, each general technological strategy has its advantages and ideal operating conditions (grossly summarized here for brevity):

Circuit switched (traditional landline telephony): high throughput of dense real-time signals
Message switched (postal mail): multiplexed, verifiable, compact addresses
Packet switched (Internet): dynamic routing, sparse connectivity, fault tolerance, scalability

We should expect that brains adopt analogous – if not homologous – solutions when conditions require. For example, we would expect something like circuit switching in somatosensory and motor output systems, which tend to require dense, real-time communication. However, we would expect a dynamic, possibly packet-switched system in the visual system, given limited windows of attention and acuity and the need for spatial remapping, selectivity, and invariance (Olshausen et al. 1993; Poggio 1984; Wiskott 2006; Wiskott and von der Malsburg 1996).
There could be hybrid routing architectures at work in brains and several that act concurrently (consider by way of analogy that it was possible until recently for a single human communicator to use the three switching protocols described above simultaneously). Individual components of a given routing system could also be selectively employed in brains. For example, Fornito et al. (2016) proposed a mechanism of deflection routing (which is used to reroute signals around damaged or congested nodes) to explain changes in functional connectivity following focal lesions.

Nevertheless, functional demands in human cognitive systems appear to require a dynamic mechanism that could resemble a packet-switched system (Schlegel et al. 2015). As Lake et al. note, the abilities of brains to (1) grow and develop over time and (2) flexibly, creatively, and quickly adapt to new events are essential to their function. Packet switching as a general strategy may be more compatible with these requirements than alternative architectures.

In terms of growth, the number of Internet hosts – each of which can potentially communicate with any other within milliseconds – has increased without major disruption over a few decades, to surpass the number of neurons in the cortex of many primates including the macaque (Fasolo 2011). This growth has also been much faster than the growth of the message-switched U.S. Postal Service (Giambene 2005; U.S. Postal Service 2016). Cortical neurons, like Internet hosts, are separated by relatively short network distances, and have the potential for communication along many possible routes within milliseconds. Communication principles that allowed for the rapid rise and sustained development of the packet-switched Internet may provide insights relevant to understanding how evolution and development conspire to generate intelligent brains.

In terms of adapting quickly to new situations, Lake et al. point out that a fully trained artificial neural network generally cannot take on new or different tasks without substantial retraining and reconfiguration. Perhaps this is not so much a problem of computation, but rather one of routing: in neural networks, one commonly employs a fixed routing system, all-to-all connectivity between layers, and feedback only between adjacent layers. These features may make such systems well suited to learning a particular input space, but ill suited to flexible processing and efficient handling of new circumstances. Although a packet-switched routing protocol would not necessarily improve current deep learning systems, it may be better suited to modeling approaches that more closely approximate cortical networks' structure and function. Unlike most deep learning networks, the brain appears to largely show dynamic routing, sparse connectivity, and feedback among many hierarchical levels. Including such features in computational models may better approximate and explain biological function, which could in turn spawn better AI.

Progress in understanding routing in the brain is already being made through simulations of dynamic signal flow on brain-like networks and in studies of brains themselves. Mišić et al. (2014) have investigated how Markovian queuing networks (a form of message-switched architecture) with primate brain-like connectivity could take advantage of small-world and rich-club topologies. Complementing this work, Sizemore et al. (2016) have shown that the abundance of weakly interconnected brain regions suggests a prominent role for parallel processing, which would be well suited to dynamic routing. Using algebraic topology, Sizemore et al. (2016) provide evidence that human brains show loops of converging or diverging signal flow (see also Granger 2006). In terms of neurophysiology, Briggs and Usrey (2007) have shown that corticothalamic networks can pass signals in a loop in just 37 milliseconds. Such rapid feedback is consistent with the notion that corticothalamic signals could function like the "ack" (acknowledgment) system used on the Internet to ensure packet delivery (Graham 2014; Graham and Rockmore 2011).

In conclusion, it is suggested that an additional "core ingredient of human intelligence" is dynamic information routing of a kind that may mirror the packet-switched Internet, and cognitive scientists and computer engineers alike should be encouraged to investigate this possibility.
The importance of motivation and emotion for explaining human cognition

doi:10.1017/S0140525X17000164, e267

C. Dominik Güssa and Dietrich Dörnerb
aDepartment of Psychology, University of North Florida, Jacksonville, FL 32224; bTrimberg Research Academy TRAc, Otto-Friedrich Universität Bamberg, 96047 Bamberg, Germany.
[email protected] [email protected]
https://www.unf.edu/bio/N00174812
https://www.uni-bamberg.de/trac/senior-researchers/doerner

Abstract: Lake et al. discuss building blocks of human intelligence that are quite different from those of artificial intelligence. We argue that a theory of human intelligence has to incorporate human motivations and emotions. The interaction of motivation, emotion, and cognition is the real strength of human intelligence and distinguishes it from artificial intelligence.

Lake et al. applaud the advances made in artificial intelligence (AI), but argue that future research should focus on the most impressive form of intelligence, namely, natural/human intelligence. In brief, the authors argue that AI does not resemble human intelligence. The authors then discuss the building blocks of human intelligence, for example, developmental start-up software including intuitive physics and intuitive psychology, and learning as a process of model building based on compositionality and causality, and they stress that "people never start completely from scratch" (sect. 3.2, last para.).

We argue that a view of human intelligence that focuses solely on cognitive factors misses crucial aspects of human intelligence. In addition to cognition, a more complete view of human intelligence must incorporate motivation and emotion, a viewpoint already stated by Simon: "Since in actual human behavior motive and emotion are major influences on the course of cognitive behavior, a general theory of thinking and problem solving must incorporate such influences" (Simon 1967, p. 29; see also Dörner & Güss 2013).

Incorporating motivation (e.g., Maslow 1954; Sun 2016) in computational models of human intelligence can explain where goals come from. Namely, goals come from specific needs, for example, from existential needs such as hunger or pain avoidance; sexual needs; the social need for affiliation, to be together with other people; the need for certainty related to unpredictability of the environment; and the need for competence related to ineffective coping with problems (Dörner 2001; Dörner & Güss 2013). Motivation can explain why a certain plan has priority and why it is executed, or why a certain action is stopped. Lake et al. acknowledge the role of motivation in one short paragraph when they state: "There may also be an intrinsic drive to reduce uncertainty and construct models of the environment" (sect. 4.3.2, para. 4). This is right. However, what is almost more important is the need for competence, which drives people to explore new environments. This is also called diversive exploration (e.g., Berlyne 1966). Without diversive exploration, mental models could not grow, because people would not seek new experiences (i.e., seek uncertainty to reduce uncertainty afterward).

Human emotion is probably the biggest difference between people and AI machines. Incorporating emotion into computational models of human intelligence can explain some aspects that the authors discuss as "deep learning" and "intuitive psychology." Emotions are shortcuts. Emotions are the framework in which cognition happens (e.g., Bach 2009; Dörner 2001). For example, not reaching an important goal can make a person angry. Anger then characterizes a specific form of perception, planning, decision making, and behavior. Anger means high activation, quick and rough perception, little planning and deliberation, and making a quick choice. Emotions modulate human behavior; the how of the behavior is determined by the emotions.
In other situations, emotions can trigger certain cognitive processes. In some problem situations, for example, a person would get an "uneasy" feeling when all solution attempts do not result in a solution. This uneasiness can be the start of metacognition. The person will start reflecting on his or her own thinking: "What did I do wrong? What new solution could I try?" In this sense, human intelligence controls itself, reprogramming its own programs.

And what is the function of emotions? The function of emotions is to adjust behavior to the demands of the current situation. Perhaps emotions can partly explain why humans learn "rich models from sparse data" (sect. 4.3, para. 1), as the authors state. A child observing his or her father smiling and happy when watching soccer does not need many trials to come to the conclusion that soccer must be something important that brings joy.

In brief, a theory or a computational model of human intelligence that focuses solely on cognition is not a real theory of human intelligence. As the authors state, "Our machines need to confront the kinds of tasks that human learners do." This means going beyond the "simple" Atari game Frostbite. In Frostbite, the goal was well defined (build an igloo). The operations and obstacles were known (go over ice floes without falling in the water and without being hit by objects/animals). The more complex, dynamic, and "real" such tasks become – as has been studied in the field of Complex Problem Solving or Dynamic Decision Making (e.g., Funke 2010; Güss, Tuason, & Gerhard 2010) – the more human behavior will show motivational, cognitive, and emotional processes in their interaction. This interaction of motivation, cognition, and emotion is the real strength of human intelligence compared with artificial intelligence.
We believe learning research will be better off taking a domain
general approach wherein the startup software used when one
Building on prior knowledge without encounters a task as an experienced adult human learner is the
building it in experience and prior knowledge acquired through a domain
general learning process.
doi:10.1017/S0140525X17000176, e268 Most current deep learning models, however, do not build on
prior experience. For example, the network in Mnih et al.
Steven S. Hansen,a Andrew K. Lampinen,a Gaurav Suri,b and (2013) that learns Atari games was trained from scratch on each
James L. McClellanda new problem encountered. This is clearly not the same as
a
Psychology Department, Stanford University, Stanford, CA 94305; human learning, which builds cumulatively on prior learning.
b
Psychology Department, San Francisco State University, San Francisco, Humans learn complex skills in a domain after previously learning
CA 94132. simpler ones, gradually building structured knowledge as they
[email protected] [email protected] learn. In games like Chess or Go, human learners can receive
[email protected] [email protected] feedback not only on the outcome of an entire game – did the
http://www.suriradlab.com/ https://web.stanford.edu/group/pdplab/ learner succeed or fail? – but also on individual steps in an
action sequence. This sort of richer feedback can easily be incor-
Abstract: Lake et al. propose that people rely on “start-up software,” porated into neural networks, and doing so can enhance learning
“causal models,” and “intuitive theories” built using compositional (Gülçehre and Bengio 2016).
representations to learn new tasks more efficiently than some deep neural An important direction is to explore how humans learn from a
network models. We highlight the many drawbacks of a commitment to
compositional representations and describe our continuing effort to
rich ensemble of multiple, partially related tasks. The steps of a
explore how the ability to build on prior knowledge and to learn new sequential task can be seen as mutually supporting subtasks, and
tasks efficiently could arise through learning in deep neural networks. a skill, such as playing chess can be seen as a broad set of related
tasks beyond selecting moves: predicting the opponent’s moves,
Lake et al. have laid out a perspective that builds on earlier work explaining positions, and so on. One reason humans might be
within the structured/explicit probabilistic cognitive modeling able to learn from fewer games than a neural network trained on
framework. They have identified several ways in which humans playing chess as a single integrated task is that humans receive feed-
with existing domain knowledge can quickly acquire new back on many of these tasks throughout learning, and this both
domain knowledge and deploy their knowledge flexibly. Lake allows more feedback from a single experience (e.g., both an emo-
et al. also make the argument that the key to understanding tional reward for capturing a piece and an explanation of the tactic
these important human abilities is the use of “start-up software,” from a teacher) and constrains the representations that can emerge
“causal models,” and “intuitive theories” that rely on a composi- (they must support all of these related subtasks). Such constraints
tional knowledge representation of the kind advocated by, for amount to extracting shared principles that allow for accelerated
example, Fodor and Pylyshyn (1988). learning when encountering other tasks that use them. One
We agree that humans can often acquire new domain knowl- example is training a recurrent network on translation tasks
edge quickly and can often generalize this knowledge to new between multiple language pairs, which can lead to zero-shot (no
examples and use it in flexible ways. However, we believe that training necessary) generalization, to translation between unseen
human knowledge acquisition and generalization can be under- language pairs (Johnson et al. 2016). Just as neural networks can
stood without building in a commitment to domain-specific exhibit rulelike behavior without building in explicit rules, we
knowledge structures or compositional knowledge representation. believe that they may not require a compositional, explicitly sym-
We therefore expect that continuing our longstanding effort to bolic form of reasoning to produce human-like behavior.
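To make this notion of mutually constraining subtasks concrete, here is a minimal sketch of the shared-representation idea in Python/PyTorch. It is purely illustrative and not the architecture of any cited model: the network, feature sizes, and task heads are invented, and a real system would use far richer inputs.

    # Minimal multi-task sketch: one shared trunk, several task-specific
    # heads. All names and sizes are illustrative, not from a cited model.
    import torch
    import torch.nn as nn

    class SharedSkillNet(nn.Module):
        def __init__(self, n_features=64, n_moves=4096):
            super().__init__()
            # Shared representation, shaped by every subtask at once
            self.trunk = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU())
            self.move_head = nn.Linear(256, n_moves)  # select a move
            self.eval_head = nn.Linear(256, 1)        # evaluate the position

        def forward(self, x):
            h = self.trunk(x)
            return self.move_head(h), self.eval_head(h)

    net = SharedSkillNet()
    opt = torch.optim.Adam(net.parameters())

    def training_step(x, move_target, eval_target):
        move_logits, value = net(x)
        # Richer feedback: losses from two related subtasks jointly
        # constrain the representation that can emerge in the trunk
        loss = (nn.functional.cross_entropy(move_logits, move_target)
                + nn.functional.mse_loss(value.squeeze(-1), eval_target))
        opt.zero_grad()
        loss.backward()
        opt.step()

Because both heads read the same trunk, feedback on either subtask reshapes the features available to the other, a small-scale analogue of the shared principles described above.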
Indeed, recent work on meta-learning (or learning-to-learn) in deep learning models provides a base for making good on this claim (Bartunov and Vetrov 2016; Santoro et al. 2016; Vinyals et al. 2016). The appearance of rapid learning (e.g., one-shot classification) is explained as slow, gradient-based learning on a meta-problem (e.g., repeatedly solving one-shot classification problems drawn from a distribution). Although the meta-tasks used in these first attempts only roughly reflect the training environment that humans face (we probably do not face explicit one-shot classification problems that frequently), the same approach could be used with meta-tasks that are extremely common as a result of sociocultural conventions, such as “follow written instructions,” “incorporate comments from a teacher,” and “give a convincing explanation of your behavior.”
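The “slow learning on a meta-problem” can be summarized in a short training loop. The following Python/PyTorch sketch assumes a hypothetical model that conditions on a support set and a hypothetical sample_one_shot_episode() that draws episodes from a task distribution; it shows the episodic regime in general, not any specific cited system.

    # Schematic episodic meta-learning loop. 'model' and
    # 'sample_one_shot_episode' are hypothetical placeholders.
    import torch

    def meta_train(model, sample_one_shot_episode, n_meta_steps=100_000):
        opt = torch.optim.Adam(model.parameters())
        for _ in range(n_meta_steps):
            # One episode = a few labeled "support" examples plus "query"
            # items the model must classify after a single exposure.
            support_x, support_y, query_x, query_y = sample_one_shot_episode()
            logits = model(support_x, support_y, query_x)
            loss = torch.nn.functional.cross_entropy(logits, query_y)
            # Slow, gradient-based learning across many episodes is what
            # produces the appearance of rapid, one-shot learning within
            # any new episode.
            opt.zero_grad()
            loss.backward()
            opt.step()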
Fully addressing the challenges Lake et al. pose – rather than building in compositional knowledge structures that will ultimately prove limiting – is a long-term challenge for the science of learning. We expect meeting this challenge to take time, but that the time and effort will be well spent. We would be pleased if Lake et al. would join us in this effort. Their participation would help accelerate progress toward a fuller understanding of how advanced human cognitive abilities arise when humans are immersed in the richly structured learning environments that have arisen in human cultures and their educational systems.

Building machines that adapt and compute like brains

doi:10.1017/S0140525X17000188, e269

Nikolaus Kriegeskorte and Robert M. Mok
Medical Research Council Cognition and Brain Sciences Unit, Cambridge, CB2 7EF, United Kingdom.
[email protected] [email protected]

Abstract: Building machines that learn and think like humans is essential not only for cognitive science, but also for computational neuroscience, whose ultimate goal is to understand how cognition is implemented in biological brains. A new cognitive computational neuroscience should build cognitive-level and neural-level models, understand their relationships, and test both types of models with both brain and behavioral data.

Lake et al.’s timely article puts the recent exciting advances with neural network models in perspective, and usefully highlights the aspects of human learning and thinking that these models do not yet capture. Deep convolutional neural networks have conquered pattern recognition. They can rapidly recognize objects as humans can, and their internal representations are remarkably similar to those of the human ventral stream (Eickenberg et al. 2016; Güçlü & van Gerven 2015; Khaligh-Razavi & Kriegeskorte 2014; Yamins et al. 2014). However, even at a glance, we understand visual scenes much more deeply than current models. We bring complex knowledge and dynamic models of the world to bear on the sensory data. This enables us to infer past causes and future implications, with a focus on what matters to our behavioral success. How can we understand these processes mechanistically?

The top-down approach of cognitive science is one required ingredient. Human behavioral researchers have an important role in defining the key challenges for model engineering by introducing tasks where humans still outperform the best models. These tasks serve as benchmarks, enabling model builders to measure progress and compare competing approaches. Cognitive science introduced task-performing computational models of cognition. Task-performing models are also essential for neuroscience, whose theories cannot deliver explicit accounts of intelligence without them (Eliasmith & Trujillo 2014). The current constructive competition between modeling at the cognitive level and modeling at the neural level is inspiring and refreshing. We need both levels of description to understand, and to be able to invent, intelligent machines and computational theories of human intelligence.

Pattern recognition was a natural first step toward understanding human intelligence. This essential component mechanism has been conquered by taking inspiration from the brain. Machines could not do core object recognition (DiCarlo et al. 2012) until a few years ago (Krizhevsky et al. 2012). Brain-inspired neural networks gave us machines that can recognize objects robustly under natural viewing conditions. As we move toward higher cognitive functions, we might expect that it will continue to prove fruitful to think about cognition in the context of its implementation in the brain. To understand how humans learn and think, we need to understand how brains adapt and compute.

A neural network model may require more time to train than humans. This reflects the fact that current models learn from scratch. Cognitive models, like Bayesian program learning (Lake et al. 2015a), rely more strongly on built-in knowledge. Their inferences require realistically small amounts of data, but unrealistically large amounts of computation, and, as a result, their high-level feats of cognition do not always scale to complex real-world challenges. To explain human cognition, we must care about efficient implementation and scalability, in addition to the goals of computation. Studying the brain can help us understand the representations and dynamics that support the efficient implementation of cognition (e.g., Aitchison & Lengyel 2016).

The brain seamlessly merges bottom-up discriminative and top-down generative processes into a rapidly converging process of inference that combines the advantages of both: the rapidity of discriminative inference and the flexibility and precision of generative inference (Yildirim et al. 2015). The brain’s inference process appears to involve recurrent cycles of message passing at multiple scales, from local interactions within an area to long-range interactions between higher- and lower-level representations.

As long as major components of human intelligence are out of the reach of machines, we are obviously far from understanding the human brain and cognition. As more and more component tasks are conquered by machines, the question of whether they do it “like humans” will come to the fore. How should we define “human-like” learning and thinking? In cognitive science, the empirical support for models comes from behavioral data. A model must not only reach human levels of task performance, but also predict detailed patterns of behavioral responses (e.g., errors and reaction times on particular instances of a task). However, humans are biological organisms, and so “human-like” cognition should also involve the same brain representations and algorithms that the human brain employs. A good model should somehow match the brain’s dynamics of information processing.

Measuring the similarity of processing dynamics between a model and a brain has to rely on summary statistics of the activity and may be equally possible for neural and cognitive models. For neural network models, a direct comparison may seem more tractable. We might map the units of the model onto neurons in the brain. However, even two biological brains of the same species will have different numbers of neurons, and any given neuron may be idiosyncratically specialized, and may not have an exact match in the other brain. For either a neural or a cognitive model, we may find ways to compare the internal model representations to representations in brains (e.g., Kriegeskorte & Diedrichsen 2016; Kriegeskorte et al. 2008). For example, one could test whether the visual representation of characters in high-level visual regions reflects the similarity predicted by the generative model of character perception proposed by Lake et al. (2015a).
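One established way to carry out such a comparison is representational similarity analysis (Kriegeskorte et al. 2008): summarize each system by the dissimilarities among its activity patterns for the same stimuli, then compare the two geometries. A minimal sketch in Python (NumPy/SciPy), assuming the activation matrices are already in hand:

    # Minimal representational similarity analysis (RSA) sketch.
    # model_acts: (n_stimuli, n_units), brain_acts: (n_stimuli, n_voxels);
    # responses to the same stimuli, assumed given.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    def rdm(activations):
        # Representational dissimilarity matrix (condensed form):
        # 1 - Pearson correlation for every pair of stimuli.
        return pdist(activations, metric="correlation")

    def rsa_similarity(model_acts, brain_acts):
        # Rank-correlate the two geometries: high values mean the model
        # and the brain treat the same stimulus pairs as (dis)similar,
        # without requiring any unit-to-neuron mapping.
        rho, _ = spearmanr(rdm(model_acts), rdm(brain_acts))
        return rho

Because the comparison operates on dissimilarity structure rather than individual units, it applies equally to neural and cognitive models, as noted above.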
The current advances in artificial intelligence re-invigorate the interaction between cognitive science and computational neuroscience. We hope that the two can come together and combine their empirical and theoretical constraints, testing cognitive and neural models with brain and behavioral data. An integrated cognitive computational neuroscience might have a shot at the task that seemed impossible a few years ago: understanding how the brain works.
Will human-like machines make human-like mistakes?

doi:10.1017/S0140525X1700019X, e270

Evan J. Livesey, Micah B. Goldwater, and Ben Colagiuri
School of Psychology, The University of Sydney, NSW 2006, Australia.
[email protected] [email protected] [email protected]
http://sydney.edu.au/science/people/evan.livesey.php
http://sydney.edu.au/science/people/micah.goldwater.php
http://sydney.edu.au/science/people/ben.colagiuri.php

Abstract: Although we agree with Lake et al.’s central argument, there are numerous flaws in the way people use causal models. Our models are often incorrect, resistant to correction, and applied inappropriately to new situations. These deficiencies are pervasive and have real-world consequences. Developers of machines with similar capacities should proceed with caution.

Lake et al. present a compelling case for why causal model-building is a key component of human learning, and we agree that beliefs about causal relations need to be captured by any convincingly human-like approach to artificial intelligence (AI). Knowledge of physical relations between objects and psychological relations between agents brings huge advantages. It provides a wealth of transferable information that allows humans to quickly apprehend a new situation. As such, combining the computational power of deep neural networks with model-building capacities could indeed bring solutions to some of the world’s most pressing problems. However, as advantageous as causal model-building might be, it also brings problems that can lead to flawed learning and reasoning. We therefore ask, would making machines “human-like” in their development of causal models also make those systems flawed in human-like ways?

Applying a causal model, especially one based on intuitive understanding, is essentially a gamble. Even though we often feel like we understand the physical and psychological relations surrounding us, our causal knowledge is almost always incomplete and sometimes completely wrong (Rozenblit & Keil 2002). These errors may be an inevitable part of the learning process by which models are updated based on experience. However, there are many examples in which incorrect causal models persist, despite strong counterevidence. Take the supposed link between immunisation and autism. Despite the science and the author of the original vaccine-autism connection being widely and publicly discredited, many continue to believe that immunisation increases the risk of autism, and their refusal to immunise has decreased the population’s immunity to preventable diseases (Larson et al. 2011; Silverman & Hendrix 2015).

Failures to revise false causal models are far from rare. In fact, they seem to be an inherent part of human reasoning. Lewandowsky and colleagues (2012) identify numerous factors that increase resistance to belief revision, including several that are societal-level (e.g., biased exposure to information) or motivational (e.g., vested interest in retaining a false belief). Notwithstanding the significance of these factors (machines too can be influenced by biases in data availability and the motives of their human developers), it is noteworthy that people still show resistance to updating their beliefs even when these sources of bias are removed, especially when new information conflicts with the existing causal model (Taylor & Ahn 2012).

Flawed causal models can also be based on confusions that are less easily traced to specific falsehoods. Well-educated adults regularly confuse basic ontological categories (Chi et al. 1994), distinctions between mental, biological, and physical phenomena that are fundamental to our models of the world and typically acquired in childhood (Carey 2011). A common example is the belief that physical energy possesses psychological desires and intentions – a belief that even some physics students appear to endorse (Svedholm & Lindeman 2013). These errors affect both our causal beliefs and our choices. Ontological confusions have been linked to people’s acceptance of alternative medicine, potentially leading an individual to choose an ineffective treatment over evidence-based treatments, sometimes at extreme personal risk (Lindeman 2011).

Causal models, especially those that affect beliefs about treatment efficacy, can even influence physiological responses to medical treatments. In this case, known as the placebo effect, beliefs regarding a treatment can modulate the treatment response, positively or negatively, independently of whether a genuine treatment is delivered (Colagiuri et al. 2015). The placebo effect is caused by a combination of expectations driven by causal beliefs and associative learning mechanisms that are more analogous to the operations of simple neural networks. Associative learning algorithms, of the kind often used in neural networks, are surprisingly susceptible to illusory correlations, for example, when a treatment actually has no effect on a medical outcome (Matute et al. 2015). Successfully integrating two different mechanisms for knowledge generation (neural networks and causal models), when each individually may be prone to bias, is an interesting problem, not unlike the challenge of understanding the nature of human learning. Higher-level beliefs interact in numerous ways with basic learning and memory mechanisms, and the precise nature and consequences of these interactions remain unknown (Thorwart & Livesey 2016).
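The susceptibility of such algorithms to illusory correlations is easy to reproduce. The sketch below (plain Python/NumPy, with invented event frequencies) trains a Rescorla-Wagner-style delta rule on a “treatment” that has no effect on a frequent outcome; after a moderate amount of experience, the treatment weight is nonetheless clearly positive:

    # Delta-rule learner exposed to an ineffective treatment: recovery
    # is equally likely with or without it. Frequencies are invented.
    import numpy as np

    rng = np.random.default_rng(0)
    w = np.zeros(2)                  # weights: [treatment, context]
    lr = 0.05                        # learning rate
    for _ in range(60):              # a moderate amount of experience
        treated = rng.random() < 0.5
        recovered = rng.random() < 0.8       # high base rate, cue-independent
        x = np.array([1.0 if treated else 0.0, 1.0])  # context always present
        v = w @ x                             # predicted outcome
        w += lr * (float(recovered) - v) * x  # delta-rule update

    # w[0] is typically clearly positive here (roughly 0.2): an illusory
    # treatment-outcome association created by outcome density alone.
    print(w)

Only with far more training, as the context weight absorbs the outcome base rate, does the spurious treatment weight slowly decay, mirroring how frequent outcomes foster illusory causal beliefs in people.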
Even when humans hold an appropriate causal model, they often fail to use it. When facing a new problem, humans often erroneously draw upon models that share superficial properties with the current problem, rather than those that share key structural relations (Gick & Holyoak 1980). Even professional management consultants, whose job it is to use their prior experiences to help businesses solve novel problems, often fail to retrieve the most relevant prior experience to the new problem (Gentner et al. 2009). It is unclear whether an artificial system that possesses mental modelling capabilities would suffer the same limitations. On the one hand, they may be caused by human processing limitations. For example, effective model-based decision-making is associated with capacities for learning and transferring abstract rules (Don et al. 2016), and for cognitive control (Otto et al. 2015), which may potentially be far more powerful in future AI systems. On the other hand, the power of neural networks lies precisely in their ability to encode rich featural and contextual information. Given that experience with particular causal relations is likely to correlate with experience of more superficial features, a more powerful AI model generator may still suffer similar problems when faced with the difficult decision of which model to apply to a new situation.

Would human-like AI suffer human-like flaws, whereby recalcitrant causal models lead to persistence with poor solutions, or novel problems activate inappropriate causal models? Developers of AI systems should proceed with caution, as these properties of human causal modelling produce pervasive biases, and may be symptomatic of the use of mental models rather than the limitations on human cognition. Monitoring the degree to which AI systems show the same flaws as humans will be invaluable for shedding light on why human cognition is the way it is and, it is hoped, will offer some solutions to help us change our minds when we desperately need to.


Benefits of embodiment

doi:10.1017/S0140525X17000206, e271

Bruce James MacLennan
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996.
[email protected] http://web.eecs.utk.edu/~mclennan

Abstract: Physical competence is acquired through animals’ embodied interaction with their physical environments, and psychological competence is acquired through situated interaction with other agents. The acquired neural models essential to these competencies are implicit and permit more fluent and nuanced behavior than explicit models. The challenge is to understand how such models are acquired and used to control behavior.
The target article argues for the importance of “developmental start-up software” (sects. 4.1 and 5.1), but neglects the nature of that software and how it is acquired. The embodied interaction of an organism with its environment provides a foundation for its understanding of “intuitive physics” and physical causality. Animal nervous systems control their complex physical bodies in their complex physical environments in real time, and this competence is a consequence of innate developmental processes and, especially in more complex species, subsequent developmental processes that fine-tune neural control, such as prenatal and postnatal “motor babbling” (non-goal-directed motor activity) (Meltzoff & Moore 1997). Through these developmental processes, animals acquire a non-conceptual understanding of their bodies and physical environments, which provides a foundation for higher-order imaginative and conceptual physical understanding.

Animals acquire physical competence through interaction with their environments (both phylogenetic through evolution and ontogenetic through development), and robots can acquire physical competence similarly, for example, through motor babbling (Mahoor et al. 2016); this is one goal of epigenetic and developmental robotics (Lungarella et al. 2003). In principle, comparable competence can be acquired by simulated physical agents behaving in simulated physical environments, but it is difficult to develop sufficiently accurate physical simulations so that agents acquire genuine physical competence (i.e., competence in the real world, not some simulated world). It should be possible to transfer physical competence from one agent to others that are sufficiently similar physically, but the tight coupling of body and nervous system suggests that physical competence will remain tied to a “form of life.”
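The logic of motor babbling can be sketched in a few lines: issue non-goal-directed motor commands, observe their sensory consequences, and fit a forward model from that self-generated experience. The Python/NumPy toy below invents a one-dimensional “plant” standing in for the unknown body-environment dynamics; it illustrates the structure of the idea only, not any cited system:

    # Bare-bones motor babbling: random commands -> observed outcomes
    # -> fitted forward model. The 1-D "plant" is invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)

    def plant(u):
        # Unknown body/environment dynamics the agent must discover
        return 0.8 * u + 0.1 * np.sin(3.0 * u) + 0.01 * rng.standard_normal()

    # 1) Babble: non-goal-directed motor commands
    commands = rng.uniform(-1.0, 1.0, size=200)
    outcomes = np.array([plant(u) for u in commands])

    # 2) Fit a simple forward model from the self-generated experience
    forward_model = np.poly1d(np.polyfit(commands, outcomes, deg=5))

    # 3) The model now predicts the sensory consequences of untried
    #    commands, a non-conceptual grounding for later control
    print(forward_model(0.3), plant(0.3))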
Animals are said to be situated because cognition primarily serves behavior, and behavior is always contextual. For most animals, situatedness involves interaction with other animals; it conditions the goals, motivations, and other factors that are causative in an animal’s own behavior, and can be projected onto other agents, providing a foundation for “intuitive psychology.” Psychological competence is grounded in the fact that animals are situated physical agents with interests, desires, goals, fears, and so on. Therefore, they have a basis for non-conceptual understanding of other agents (through imagination, mental simulation, projection, mirror neurons, etc.). In particular, they can project their experience of psychological causality onto other animals. This psychological competence is acquired through phylogenetic and ontogenetic adaptation.

The problem hindering AI systems from acquiring psychological competence is that most artificial agents do not have interests, desires, goals, fears, and so on that they can project onto others or use as a basis for mental simulation. For example, computer vision systems do not “care” in any significant way about the images they process. Because we can be injured and die, because we can feel fear and pain, we perceive immediately (i.e., without the mediation of conceptual thought) the significance of a man being dragged by a horse, or a family fleeing a disaster (Lake et al., Fig. 6). Certainly, through artificial evolution and reinforcement learning, we can train artificial agents to interact competently with other (real or simulated) agents, but because they are a different form of life, it will be difficult to give them the same cares and concerns as we have and that are relevant to many of our practical applications.

The target article does not directly address the important distinction between explicit and implicit models. Explicit models are the sort scientists construct, generally in terms of symbolic (lexical-level) variables; we expect to be able to understand explicit models conceptually, to communicate them in language, and to reason about them discursively (including mathematically). Implicit models are the sort that neural networks construct, generally in terms of large numbers of sub-symbolic variables, densely interrelated. Implicit models often allow an approximate emergent symbolic description, but such descriptions typically capture only the largest effects and interrelationships implicit in the sub-symbolic model. Therefore, such descriptions may lack the subtlety and context sensitivity of implicit models, which is why it is difficult, if not impossible, to capture expert behavior in explicit rules (Dreyfus & Dreyfus 1986). Therefore, terms such as “intuitive physics,” “intuitive psychology,” and “theory of mind” are misleading because they connote explicit models, but implicit models (especially those acquired by virtue of embodiment and situatedness) are more likely to be relevant to the sorts of learning discussed in the target article. It is less misleading to refer to competencies, because humans and other animals can use their physical and psychological understanding to behave competently even in the absence of explicit models.

The target article shows the importance of hierarchical compositionality to the physical competence of humans and other animals (sect. 4.2.1); therefore, it is essential to understand how hierarchical structure is represented in implicit models. Recognizing the centrality of embodiment can help, for our bodies are hierarchically articulated and our physical environments are hierarchically structured. The motor affordances of our bodies provide a basis for non-conceptual understanding of the hierarchical structure of objects and actions. However, it is important to recognize that hierarchical decompositions need not be unique; they may be context dependent and subject to needs and interests, and a holistic behavior may admit multiple incompatible decompositions.

The target article points to the importance of simulation-based and imagistic inference (sect. 4.1.1). Therefore, we need to understand how they are implemented through implicit models. Fortunately, neural representations, such as topographic maps, permit analog transformations, which are better than symbolic digital computation for simulation-based and imagistic inference. The fact of neural implementation can reveal modes of information processing and control beyond the symbolic paradigm.

Connectionism consciously abandoned the explicit models of symbolic AI and cognitive science in favor of implicit, neural network models, which had a liberating effect on cognitive modeling, AI, and robotics. With 20-20 hindsight, we know that many of the successes of connectionism could have been achieved through existing statistical methods (e.g., Bayesian inference), without any reference to the brain, but they were not. Progress had been retarded by the desire for explicit, human-interpretable models, which connectionism abandoned in favor of neural plausibility. We are ill advised to ignore the brain again.


Understand the cogs to understand cognition

doi:10.1017/S0140525X17000218, e272

Adam H. Marblestone, Greg Wayne, and Konrad P. Kording
Synthetic Neurobiology Group, MIT Media Lab, Cambridge, MA 02474; DeepMind, London N1 9DR, UK; Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA 19104
[email protected] [email protected] [email protected]
http://www.adammarblestone.org/ www.kordinglab.com

Abstract: Lake et al. suggest that current AI systems lack the inductive biases that enable human learning. However, Lake et al.’s proposed biases may not directly map onto mechanisms in the developing brain. A convergence of fields may soon create a correspondence between biological neural circuits and optimization in structured architectures, allowing us to systematically dissect how brains learn.
The target article by Lake et al. beautifully highlights limitations of today’s artificial intelligence (AI) systems relative to the performance of human children and adults. Humans demonstrate uptake and generalization of concepts in the domains of intuitive physics and psychology, decompose the world into reusable parts, transfer knowledge across domains, and reason using models of the world. As Lake et al. emphasize, and as is a mathematical necessity (Ho & Pepyne 2002), humans are not generic, universal learning systems: they possess inductive biases that constrain and guide learning for species-typical tasks.

However, the target article’s characterization of these inductive biases largely overlooks how they may arise in the brain and how they could be engineered into artificial systems. Their particular choice of inductive biases, though supported by psychological research (see Blumberg [2005] for a critique), is in some ways arbitrary or idiosyncratic: It is unclear whether these capabilities are the key ones that enable human cognition, unclear whether these inductive biases correspond to separable “modules” in any sense, and, most importantly, unclear how these inductive biases could actually be built. For example, the cognitive level of description employed by Lake et al. gives little insight into whether the systems underlying intuitive psychology and physics comprise overlapping mechanisms. An alternative and plausible view holds that both systems may derive from an underlying ability to make sensory predictions, conditioned on the effects of actions, which could be bootstrapped through, for example, motor learning. With present methods and knowledge, it is anybody’s guess which of these possibilities holds true: an additional source of constraint and inspiration seems needed.

Lake et al. seem to view circuit and systems neuroscience as unable to provide strong constraints on the brain’s available computational mechanisms – perhaps in the same way that transistors place few meaningful constraints on the algorithms that may run on a laptop. However, the brain is not just a hardware level on which software runs. Every inductive bias is a part of the genetic and developmental makeup of the brain. Indeed, whereas neuroscience has not yet produced a sufficiently well-established computational description to decode the brain’s inductive biases, we believe that this will change soon. In particular, neuroscience may be getting close to establishing a more direct correspondence between neural circuitry and the optimization algorithms and structured architectures used in deep learning. For example, many inductive biases may be implemented through the precise choice of cost functions used in the optimization of the connectivity of a neuronal network. But to identify which cost function is actually being optimized in a cortical circuit, we must first know how the circuit performs optimization. Recent work is starting to shed light on this question (Guergiuev et al. 2016), and to do so, it has been forced to look deeply not only at neural circuits, but also even at how learning is implemented at the subcellular level. Similar opportunities hold for crossing thresholds in our understanding of the neural basis of other key components of machine learning agents, such as structured information routing, memory access, attention, hierarchical control, and decision making.

We argue that the study of evolutionarily conserved neural structures will provide a means to identify the brain’s true, fundamental inductive biases and how they actually arise. Specifically, we propose that optimization, architectural constraints, and “bootstrapped cost functions” might be the basis for the development of complex behavior (Marblestone et al. 2016). There are many potential mechanisms for gradient-based optimization in cortical circuits, and many ways in which the interaction of such mechanisms with multiple other systems could underlie diverse forms of structured learning like those hypothesized in Lake et al. Fundamental neural structures are likely tweaked and re-used to underpin different kinds of inductive biases across animal species, including humans. Within the lifetime of an animal, a developmentally orchestrated sequence of experience-dependent cost functions may provide not just a list of inductive biases, but a procedure for sequentially unfolding inductive biases within brain systems to produce a fully functional organism.

A goal for both AI and neuroscience should be to advance both fields to the point where they can have a useful conversation about the specifics. To do this, we need not only to build more human-like inductive biases into our machine learning systems, but also to understand the architectural primitives that are employed by the brain to set up these biases. This has not yet been possible because of the fragmentation and incompleteness of our neuroscience knowledge. For neuroscience to ask questions that directly inform the computational architecture, it must first cross more basic thresholds in understanding. To build a bridge with the intellectual frameworks used in machine learning, it must establish the neural underpinnings of optimization, cost functions, memory access, and information routing. Once such thresholds are crossed, we will be in a position – through a joint effort of neuroscience, cognitive science, and AI – to identify the brain’s actual inductive biases and how they integrate into a single developing system.


Social-motor experience and perception-action learning bring efficiency to machines

doi:10.1017/S0140525X1700022X, e273

Ludovic Marin and Ghiles Mostafaoui
EuroMov Laboratory, University of Montpellier, Montpellier, France; ETIS Laboratory, Cergy-Pontoise University, 95302 Cergy Pontoise, France.
[email protected] [email protected]
http://euromov.eu/team/ludovic-marin/

Abstract: Lake et al. proposed a way to build machines that learn as fast as people do. This can be possible only if machines follow the human processes: the perception-action loop. People perceive and act to understand new objects or to promote specific behavior to their partners. In return, the object/person provides information that induces another reaction, and so on.

The authors of the target article stated, “the interaction between representation and previous experience may be key to building machines that learn as fast as people do” (sect. 4.2.3, last para.). To design such machines, they should function as humans do. But a human acts and learns based on his or her social-MOTOR experience. Three main pieces of evidence can demonstrate our claim.

First, any learning or social interacting is based on social motor embodiment. In the field of human movement sciences, many pieces of evidence indicate that we are all influenced by the motor behavior of the one with whom we are interacting (e.g., Schmidt & Richardson 2008). The motor behavior directly expresses the state of mind of the partner (Marin et al. 2009). For example, if someone is shy, this state of mind will be directly embodied in her or his entire posture, facial expressions, gaze, and gestures. It is in the movement that we observe the state of mind of the other “interactant.” But when we are responding to that shy person, we are influenced in return by that behavior. Obviously we can modify intentionally our own motor behavior (to ease the interaction with him or her). But in most cases we are not aware of the alterations of our movements. For example, when an adult walks next to a child, they both unintentionally synchronize their stride length to each other (implying they both modify their locomotion to walk side-by-side). Another example in mental health disorders showed that an individual suffering from schizophrenia does not interact “motorly” the same way as a social phobic (Varlet et al. 2014). Yet both pathologies present motor impairment and social withdrawal. But what characterizes their motor differences is based on the state of mind of the patients.
In our example, the first patient presents attentional impairment, whereas the other suffers from social inhibition. If, however, a healthy participant is engaged in a social-motor synchronization task, both participants (the patient and the healthy subject) unintentionally adjust their moves (Varlet et al. 2014). This study demonstrates that unconscious communication is sustained even though the patients are suffering from social interaction disorders. We can then state that mostly low-level treatments of sensorimotor flows are involved in this process. Consequently, machines/robots should be embedded with computational models that tackle the very complex question of adapting to the human world using sensorimotor learning.

We claim that enactive approaches of this type will drastically reduce the complexity of future computational models. Methods of this type are indeed supported by recent advances in the human brain mirroring system and theories based on motor resonance (Meltzoff 2007). In this line of thinking, computational models have been built and used to improve human-robot interaction and communication, in particular through the notion of learning by imitation (Breazeal & Scassellati 2002; Lopes & Santos-Victor 2007). Furthermore, some studies embedded machines with computational models using an adequate action-perception loop and showed that some complex social competencies such as immediate imitation (present in early human development) could emerge through sensorimotor ambiguities, as proposed in Gaussier et al. (1998), Nagai et al. (2011), and Braud et al. (2014).

This kind of model allows future machines to better generalize their learning and to acquire new social skills. In other recent examples, using a very simple neural network providing minimal sensorimotor adaptation capabilities to the robot, unintentional motor coordination could emerge during an imitation game (of a simple gesture) with a human (Hasnain et al. 2012; 2013). An extension of this work demonstrated that a robot could quickly learn more complex gestures “online” and synchronize its behavior to the human partner based on the same sensorimotor approach (Ansermin et al. 2016).
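A toy model conveys how such unintentional coordination can emerge from minimal sensorimotor coupling alone. The Python/NumPy sketch below couples two phase oscillators (standing in for the rhythmic gestures of robot and human) with a weak Kuramoto-style term; all parameters are arbitrary, and this is not the architecture of Hasnain et al. (2012) or Ansermin et al. (2016):

    # Toy unintentional synchrony: two weakly coupled phase oscillators.
    # Parameters are arbitrary; an illustration, not a cited model.
    import numpy as np

    dt, steps = 0.01, 5000
    omega = np.array([2.0, 2.3])    # slightly different natural rhythms
    k = 0.5                         # weak sensorimotor coupling gain
    theta = np.array([0.0, 1.5])    # initial phases (robot, human)

    for _ in range(steps):
        # Each agent's rhythm is nudged by what it perceives of the
        # other; there is no explicit goal of synchronizing.
        coupling = k * np.sin(theta[::-1] - theta)
        theta = theta + dt * (omega + coupling)

    # Despite different preferred frequencies, the phase difference has
    # settled to a constant: the two rhythms are locked.
    print(theta[0] - theta[1])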
Second, even to learn (or understand) what a simple object is, people need to act on it (O’Regan 2011). For example, if we do not know what a “chair” is, we will understand its representation by sitting on it, touching it. The definition is then easy: A chair is an object on which we can sit, regardless of its precise shape. Now, if we try to define its representation before acting, it becomes very difficult to describe it. This requires determining the general shape, number of legs, with or without arms or wheels, texture, and so on. Hence, when programming a machine, this latter definition brings a high computational cost that drastically slows down the speed of the learning (and pushes away the idea of learning as fast as humans do). In that case, the machines/robots should be able to learn directly by acting and perceiving the consequences of their actions on the object/person.

Finally, from a more low-level aspect, even shape recognition is strongly connected to our motor experience. Viviani and Stucchi (1992) demonstrated that when they showed a participant a point light performing a perfect circle, as soon as this point slowed down at the upper and lower parts of this circle, the participant did not perceive the trajectory as a circle any longer, but as an ellipse. This perceptual mistake is explained by the fact that we perceive the shape of an object based on the way we draw it (in drawing a circle, we move with a constant speed, whereas in drawing an ellipse, we slow down at the two opposite extremities). Typically, handwriting learning (often cited by the authors) is based not only on learning visually the shape of the letters, but also mainly on global sensorimotor learning of perceiving (vision) and acting (writing, drawing). Once again, this example indicates that machines/robots should be able to understand an object or the reaction of a person based on how they have acted on that object/person.

Therefore, to design machines that learn as fast as humans, we need to make them able to (1) learn through a perception-action paradigm, (2) perceive and react to the movements of other agents or to the object on which they are acting, and (3) learn to understand what his or her or its actions mean.

ACKNOWLEDGMENT
This work was supported by the Dynamics of Interactions, Rhythmicity, Action and Communication (DIRAC), a project funded by the Agence Nationale de la Recherche (Grant ANR 13-ASTR-0018-01).


The argument for single-purpose robots

doi:10.1017/S0140525X17000231, e274

Daniel E. Moerman
University of Michigan—Dearborn, Ypsilanti, MI 48198.
[email protected] naeb.brit.org

Abstract: The argument by Lake et al. to create more human-like robots is, first, implausible and, second, undesirable. It seems implausible to me that a robot might have friends, fall in love, read Foucault, prefer Scotch to Bourbon, and so on. It seems undesirable because we already have 7 billion people on earth and don’t really need more.

This commentary addresses the issue of Human-Like Machines (HLMs), which Lake et al. would like to be able to do more than have “object recognition” and play “video games, and board games” (abstract). They would like a machine “to learn or think like a person” (sect. 1, para. 3). I argue that people do vastly more than this: they interact, communicate, share, and collaborate; they use their learning and thinking to “behave”; they experience complex emotions. I believe that these authors have a far too limited sense of what “human-like” behavior is. The kinds of behavior I have in mind include (but are certainly not limited to) these:

1. Drive with a friend in a stick shift car from LA to Vancouver, and on to Banff…
2. Where, using a fly he or she tied, with a fly rod he or she made, he or she should be able to catch a trout which…
3. He or she should be able to clean, cook, and share with a friend.
4. He or she should have a clear gender identity, clearly recognizing what gender he or she is, and understanding the differences between self and other genders. (Let’s decide our HLM was manufactured to be, and identifies as, “male.”)
5. He should be able to fall in love, get married, and reproduce. He might wish to vote; he should be able to pay taxes. I’m not certain if he could be a citizen.
6. He should be able to read Hop on Pop to his 4-year-old, helping her to get the idea of reading. He should be able to read it to her 200 times. He should be able to read and understand Foucault, Sahlins, Hinton, le Carré, Erdrich, Munro, and authors like them. He should enjoy reading. He should be able to write a book, like Hop on Pop, or like Wilder’s The Foundations of Mathematics.
7. He should be able to have irreconcilable differences with his spouse, get divorced, get depressed, get psychological counseling, get better, fall in love again, remarry, and enjoy his grandchildren. He should be able to detect by scent that the baby needs to have her diaper changed. Recent research indicates that the human nose can discriminate more than one trillion odors (Bushdid et al. 2014). Our HLM should at least recognize a million or so. He should be able to change a diaper and to comfort and calm a crying child. And make mac and cheese.
8. He should be able to go to college, get a B.A. in Anthropology, then a Ph.D., get an academic job, and succeed in teaching the complexities of kinship systems to 60 undergraduates.
9. He should be able to learn to play creditable tennis, squash, baseball, or soccer, and enjoy it into his seventies. He should be able to get a joke. (Two chemists go into a bar. The first says, “I’ll have an H2O.” The second says, “I’ll have an H2O too.” The second guy dies.) He should be able both to age and to die.
10. He should be able to know the differences between Scotch and Bourbon, and to develop a preference for one or the other, and enjoy it occasionally. Same for wine.

I’m human, and I can do, or have done, all those things (except die), which is precisely why I think this is a fool’s errand. I think it is a terrible idea to develop robots that are like humans. There are 7 billion humans on earth already. Why do we need fake humans when we have so many real ones? The robots we have now are (primarily) extremely useful single-function machines that can weld a car together in minutes, 300 a day, and never feel like, well, a robot, or a rivethead (Hamper 2008).

Even this sort of robot can cause lots of problems, as substantial unemployment in industry can be attributed to them. They tend to increase productivity and reduce the need for workers (Baily & Bosworth 2014). If that’s what single-purpose (welding) robots can do, imagine what an HLM could do. If you think it might not be a serious problem, read Philip K. Dick’s story, Do Androids Dream of Electric Sheep? (Dick 1968), or better yet, watch Ridley Scott’s film Blade Runner (Scott 2007) based on Dick’s story. The key issue in this film is that HLMs are indistinguishable from ordinary humans and are allowed legally to exist only as slaves. They don’t like it. Big trouble ensues. (Re number 6, above, our HLM should probably not enjoy Philip Dick or Blade Runner.)

What kinds of things should machines be able to do? Jobs inimical to the human condition. Imagine an assistant fireman which could run into a burning building and save the 4-year-old reading Dr. Seuss. There is work going on to develop robotic devices – referred to as exoskeletons – that can help people with profound spinal cord injuries to walk again (Brenner 2016). But this is only reasonable if the device helps the patient go where he wants to go, not where the robot wants to go. There is also work going on to develop robotic birds, or ornithopters, among them the “Nano Hummingbird” and the “SmartBird.” Both fly with flapping wings (Mackenzie 2012). The utility of these creatures is arguable; most of what they can do could probably be done with a $100 quad-copter drone. (Our HLM should be able to fly a quad-copter drone. I can.)

Google recently reported significant improvements in language translation as a result of the adoption of a neural-network approach (Lewis-Kraus 2016; Turovsky 2016). Many users report dramatic improvements in translations. (My own experience has been less positive.) This is a classic single-purpose “robot” that can help translators, but no one ought to rely on it alone.

In summary, it seems that even with the development of large neural-network style models, we are far from anything in Blade Runner. It will be a long time before we can have an HLM that can both display a patellar reflex and move the pieces in a chess game. And that, I think, is a very good thing.


Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning

doi:10.1017/S0140525X17000243, e275

Pierre-Yves Oudeyer
Inria and Ensta Paris-Tech, 33405 Talence, France.
[email protected] http://www.pyoudeyer.com

Abstract: Autonomous lifelong development and learning are fundamental capabilities of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives.

Deep learning (DL) approaches made great advances in artificial intelligence, but are still far from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very few data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are either superficially or not adequately discussed by Lake et al.

These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to specify manually a task-specific objective function for every new task, and learn through offline processing of large training databases. On the contrary, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources.

In the two last decades, the field of Developmental and Cognitive Robotics (Asada et al. 2009; Cangelosi and Schlesinger 2015), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational modeling of mechanisms of autonomous development and learning in human infants, and applied them to solve difficult artificial intelligence (AI) problems. These mechanisms include the interaction between several systems that guide active exploration in large and open environments: curiosity, intrinsically motivated reinforcement learning (Barto 2013; Oudeyer et al. 2007; Schmidhuber 1991) and goal exploration (Baranes and Oudeyer 2013), social learning and natural interaction (Chernova and Thomaz 2014; Vollmer et al. 2014), maturation (Oudeyer et al. 2013), and embodiment (Pfeifer et al. 2007). These mechanisms crucially complement processes of incremental online model building (Nguyen and Peters 2011), as well as inference and representation learning approaches discussed in the target article.

Intrinsic motivation, curiosity and free play. For example, models of how motivational systems allow children to choose which goals to pursue, or which objects or skills to practice in contexts of free play, and how this can affect the formation of developmental structures in lifelong learning have flourished in the last decade (Baldassarre and Mirolli 2013; Gottlieb et al. 2013). In-depth models of intrinsically motivated exploration, and their links with curiosity, information seeking, and the “child-as-a-scientist” hypothesis (see Gottlieb et al. [2013] for a review), have generated new formal frameworks and hypotheses to understand their structure and function. For example, it was shown that intrinsically motivated exploration, driven by maximization of learning progress (i.e., maximal improvement of predictive or control models of the world; see Oudeyer et al. [2007] and Schmidhuber [1991]), can self-organize long-term developmental structures, where skills are acquired in an order and with timing that share fundamental properties with human development (Oudeyer and Smith 2016). For example, the structure of early infant vocal development self-organizes spontaneously from such intrinsically motivated exploration, in interaction with the physical properties of the vocal systems (Moulin-Frier et al. 2014). New experimental paradigms in psychology and neuroscience were recently developed and support these hypotheses (Baranes et al. 2014; Kidd 2012).
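A minimal version of this learning-progress drive fits in a few lines. In the Python/NumPy sketch below, the two “activities” are invented toys (one learnable, one pure noise); the agent tracks how fast its prediction error drops on each and preferentially samples where it is improving. This conveys the core intuition of the cited models, not their actual implementations:

    # Minimal learning-progress curiosity (in the spirit of Oudeyer
    # et al. 2007). Activity 0 is learnable; activity 1 is pure noise.
    import numpy as np

    rng = np.random.default_rng(1)
    errors = [[], []]                # prediction-error history per activity
    w = np.zeros(2)                  # one trivial predictor per activity

    def learning_progress(hist, window=10):
        if len(hist) < 2 * window:
            return 1.0               # optimistic start: try everything
        old = np.mean(hist[-2 * window:-window])
        new = np.mean(hist[-window:])
        return max(old - new, 0.0)   # recent drop in prediction error

    for t in range(500):
        lp = np.array([learning_progress(h) for h in errors])
        probs = (lp + 1e-3) / (lp + 1e-3).sum()
        a = rng.choice(2, p=probs)   # pick where progress is fastest
        target = 0.7 if a == 0 else rng.random()  # noise is unlearnable
        errors[a].append(abs(target - w[a]))
        w[a] += 0.1 * (target - w[a])             # incremental model update

    # Practice concentrates on the learnable activity while it is being
    # mastered, then tails off once progress vanishes, an automatic
    # curriculum.
    print(len(errors[0]), len(errors[1]))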
These algorithms of intrinsic motivation are also highly efficient for multitask learning in high-dimensional spaces. In robotics, they allow efficient stochastic selection of parameterized experiments and goals, enabling incremental collection of data and learning of skill models, through automatic and online curriculum learning. Such active control of the growth of complexity enables robots with high-dimensional continuous action spaces to learn omnidirectional locomotion on slippery surfaces and versatile manipulation of soft objects (Baranes and Oudeyer 2013) or hierarchical control of objects through tool use (Forestier and Oudeyer 2016). Recent work in deep reinforcement learning has included some of these mechanisms to solve difficult reinforcement learning problems, with rare or deceptive rewards (Bellemare et al. 2016; Kulkarni et al. 2016), as learning multiple (auxiliary) tasks in addition to the target task simplifies the problem (Jaderberg et al. 2016). However, there are many unstudied synergies between models of intrinsic motivation in developmental robotics and deep reinforcement learning systems; for example, curiosity-driven selection of parameterized problems/goals (Baranes and Oudeyer 2013) and learning strategies (Lopes and Oudeyer 2012), and combinations between intrinsic motivation and social learning, for example, imitation learning (Nguyen and Oudeyer 2013), have not yet been integrated with deep learning.

Embodied self-organization. The key role of physical embodiment in human learning has also been extensively studied in robotics, and yet it is out of the picture in current deep learning research. The physics of bodies and their interaction with their environment can spontaneously generate structure guiding learning and exploration (Pfeifer and Bongard 2007). For example, mechanical legs reproducing essential properties of human leg morphology generate human-like gaits on mild slopes without any computation (Collins et al. 2005), showing the guiding role of morphology in infant learning of locomotion (Oudeyer 2016). Yamada et al. (2010) developed a series of models showing that hand-face touch behaviours in the foetus and hand looking in the infant self-organize through interaction of a non-uniform physical distribution of proprioceptive sensors across the body with basic neural plasticity loops. Work on low-level muscle synergies also showed how low-level sensorimotor constraints could simplify learning (Flash and Hochner 2005).

Human learning as a complex dynamical system. Deep learning architectures often focus on inference and optimization. Although these are essential, developmental sciences suggested many times that learning occurs through complex dynamical interaction among systems of inference, memory, attention, motivation, low-level sensorimotor loops, embodiment, and social interaction. Although some of these ingredients are part of current DL research (e.g., attention and memory), the integration of other key ingredients of autonomous learning and development opens stimulating perspectives for scaling up to human learning.

Human-like machines: Transparency and comprehensibility

doi:10.1017/S0140525X17000255, e276

Piotr M. Patrzyk, Daniela Link, and Julian N. Marewski
Faculty of Business and Economics, University of Lausanne, Quartier UNIL-Dorigny, Internef, CH-1015 Lausanne, Switzerland
[email protected] [email protected] [email protected]

Abstract: Artificial intelligence algorithms seek inspiration from human cognitive systems in areas where humans outperform machines. But on what level should algorithms try to approximate human cognition? We argue that human-like machines should be designed to make decisions in transparent and comprehensible ways, which can be achieved by accurately mirroring human cognitive processes.

How to build human-like machines? We agree with the authors' assertion that "reverse engineering human intelligence can usefully inform artificial intelligence and machine learning" (sect. 1.1, para. 3), and in this commentary we offer some suggestions concerning the direction of future developments. Specifically, we posit that human-like machines should not only be built to match humans in performance, but also to be able to make decisions that are both transparent and comprehensible to humans.

First, we argue that human-like machines need to decide and act in transparent ways, such that humans can readily understand how their decisions are made (see Arnold & Scheutz 2016; Indurkhya & Misztal-Radecka 2016; Mittelstadt et al. 2016). Behavior of artificial agents should be predictable, and people interacting with them ought to be in a position that allows them to intuitively grasp how those machines decide and act the way they do (Malle & Scheutz 2014). This poses a unique challenge for designing algorithms.

In current neural networks, there is typically no intuitive explanation for why a network reached a particular decision given received inputs (Burrell 2016). Such networks represent statistical pattern recognition approaches that lack the ability to capture agent-specific information. Lake et al. acknowledge this problem and call for structured cognitive representations, which are required for classifying social situations. Specifically, the authors' proposal of an "intuitive psychology" is grounded in the naïve utility calculus framework (Jara-Ettinger et al. 2016). According to this argument, algorithms should attempt to build a causal understanding of observed situations by creating representations of agents who seek rewards and avoid costs in a rational way.
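To make this inference pattern concrete, here is a toy sketch (our own illustration of the naïve utility calculus idea, with hypothetical goals and numbers, not code from Jara-Ettinger et al. [2016]): assuming the agent trades off rewards against costs in a softmax-rational way, an observer can invert that assumption and infer which rewards best explain an observed choice.

```python
import math

# Inverse cost-benefit inference (illustrative sketch; all values made up).
# An agent took the costly path to the apple tree instead of the cheap
# path to the orange tree. What does that choice reveal about its rewards?

costs = {"apple": 5.0, "orange": 1.0}   # effort of reaching each goal
observed_choice = "apple"
BETA = 1.0                               # softmax rationality parameter

def choice_prob(choice, rewards):
    """P(choice | rewards) for a softmax-rational, cost-sensitive agent."""
    utils = {g: rewards[g] - costs[g] for g in costs}
    z = sum(math.exp(BETA * u) for u in utils.values())
    return math.exp(BETA * utils[choice]) / z

# Uniform prior over a small grid of candidate reward values per goal.
grid = [0.0, 2.5, 5.0, 7.5, 10.0]
posterior = {(ra, ro): choice_prob(observed_choice,
                                   {"apple": ra, "orange": ro})
             for ra in grid for ro in grid}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

mean_apple = sum(ra * p for (ra, _), p in posterior.items())
mean_orange = sum(ro * p for (_, ro), p in posterior.items())
print(f"inferred rewards: apple {mean_apple:.2f}, orange {mean_orange:.2f}")
# The costly choice pulls the inferred apple reward well above the orange one.
```

This is exactly the kind of inference Patrzyk and colleagues examine next: it recovers goals and preferences, but by itself says nothing about roles or obligations.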
of morphology in infant learning of locomotion (Oudeyer 2016). Putting aside extreme examples (e.g., killer robots and autono-
Yamada et al. (2010) developed a series of models showing that mous vehicles), let us look at the more ordinary artificial intelli-
hand-face touch behaviours in the foetus and hand looking in gence task of scene understanding. Cost-benefit–based
the infant self-organize through interaction of a non-uniform inferences about situations such as the one depicted in the left-
physical distribution of proprioceptive sensors across the body most picture in Figure 6 of Lake et al. will likely conclude that
with basic neural plasticity loops. Work on low-level muscle syner- one agent has a desire to kill the other, and that he or she
gies also showed how low-level sensorimotor constraints could values higher the state of the other being dead than alive.
simplify learning (Flash and Hochner 2005). Although we do not argue this is incorrect, a human-like classifi-
Human learning as a complex dynamical system. Deep learning cation of such a scene would rather reach the conclusion that
architectures often focus on inference and optimization. Although the scene depicts either a legal execution or a murder. The
these are essential, developmental sciences suggested many times returned alternative depends on the viewer’s inferences about
that learning occurs through complex dynamical interaction agent-specific characteristics. Making such inferences requires
among systems of inference, memory, attention, motivation, going beyond the attribution of simple goals – one needs to
low-level sensorimotor loops, embodiment, and social interaction. make assumptions about the roles and obligations of different
Although some of these ingredients are part of current DL agents. In the discussed example, although both a sheriff and a
research, (e.g., attention and memory), the integration of other contract killer would have the same goal to end another
key ingredients of autonomous learning and development opens person’s life, the difference in their identity would change the
stimulating perspectives for scaling up to human learning. human interpretation in a significant way.
We welcome the applicability of naïve utility calculus for infer-
ring simple information concerning agent-specific variables, such
as goals and competence level. At the same time, however, we
point out some caveats inherent to this approach. Humans inter-
acting with the system will likely expect a justification of why it has
Human-like machines: Transparency and picked one interpretation rather than another, and algorithm
comprehensibility designers might want to take this into consideration.
This leads us to our second point. Models of cognition can come
doi:10.1017/S0140525X17000255, e276 in at least two flavors: (1) As-if models, which only aspire to
achieve human-like performance on a specific task (e.g., classify-
Piotr M. Patrzyk, Daniela Link, and Julian N. Marewski
ing images), and (2) process models, which seek both to achieve
Faculty of Business and Economics, University of Lausanne, Quartier
human-like performance and to accurately reproduce the cogni-
UNIL-Dorigny, Internef, CH-1015 Lausanne, Switzerland
tive operations humans actually perform (classifying images by
[email protected] [email protected]
combining pieces of information in a way humans do). We
[email protected]
believe that the task of creating human-like machines ought to
Abstract: Artificial intelligence algorithms seek inspiration from human be grounded in existing process models of cognition. Indeed,
cognitive systems in areas where humans outperform machines. But on investigating human information processing is helpful for ensuring
what level should algorithms try to approximate human cognition? We that generated decisions are comprehensible (i.e., that they follow
argue that human-like machines should be designed to make decisions human reasoning patterns).

Why is it important that machine decision mechanisms, in addition to being transparent, actually mirror human cognitive processes in a comprehensible way? In the social world, people often judge agents not only according to the agents' final decisions, but also according to the process by which they have arrived at these (e.g., Hoffman et al. 2015). It has been argued that the process of human decision making does not typically involve rational utility maximization (e.g., Hertwig & Herzog 2009). This, in turn, influences how we expect other people to make decisions (Bennis et al. 2010). To the extent that one cares about the social applications of algorithms and their interactions with people, considerations about transparency and comprehensibility of decisions become critical.

Although as-if models relying on cost-benefit analysis might be reasonably transparent and comprehensible, for example, when problems are simple and do not involve moral considerations, this might not always be the case. Algorithm designers need to ensure that the underlying process will be acceptable to the human observer. What research can be drawn upon to help build transparent and comprehensible mechanisms?

We argue that one source of inspiration might be the research on fast-and-frugal heuristics (Gigerenzer & Gaissmaier 2011). Simple strategies such as fast-and-frugal trees (e.g., Hafenbrädl et al. 2016) might be well suited to providing justifications for decisions made in social situations. Heuristics not only are meant to capture ecologically rational human decision mechanisms (see Todd & Gigerenzer 2007), but also are transparent and comprehensible (see Gigerenzer 2001). Indeed, these heuristics possess a clear structure composed of simple if-then rules specifying (1) how information is searched within the search space, (2) when information search is stopped, and (3) how the final decision is made based upon the information acquired (Gigerenzer & Gaissmaier 2011).
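To illustrate that three-part structure, here is a minimal fast-and-frugal tree in code (a hypothetical checkpoint-style example of our own construction, not the tree from Keller & Katsikopoulos [2016]): cues are checked one at a time in a fixed order, any matching cue exits immediately with a decision, and the trace of cues checked doubles as a human-readable justification.

```python
# Minimal fast-and-frugal tree (illustrative sketch; the cues, their
# order, and the decisions are all hypothetical). Each rule is tried in
# turn: if the cue takes its exit value, search stops and a decision is made.

FFT_RULES = [
    # (cue name,           exit value, decision on exit)
    ("slowed_on_request",  False,      "hostile"),   # ignored the signal
    ("known_vehicle",      True,       "friendly"),  # recognized vehicle
    ("erratic_driving",    True,       "hostile"),
]
DEFAULT_DECISION = "friendly"  # reached only if no exit cue fires

def fft_classify(case):
    """Return (decision, justification) via limited, ordered cue search."""
    checked = []
    for cue, exit_value, decision in FFT_RULES:
        checked.append(cue)                 # (1) fixed search order
        if case[cue] == exit_value:         # (2) stopping rule
            return decision, f"{cue}={case[cue]} (checked: {checked})"
    return DEFAULT_DECISION, f"no exit cue fired (checked: {checked})"  # (3)

cooperative = {"slowed_on_request": True,
               "known_vehicle": False, "erratic_driving": False}
ignoring = {"slowed_on_request": False,
            "known_vehicle": True, "erratic_driving": False}
print(fft_classify(cooperative))  # falls through to the default branch
print(fft_classify(ignoring))     # exits on the first cue after checking
                                  # only one cue: the tree is deliberately
                                  # non-compensatory
```

The stopping rule is what makes the tree frugal, and the recorded search trace is what makes its decision comprehensible.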
These simple decision rules have been used to model and aid human decisions in numerous tasks with possible moral implications, for example, in medical diagnosis (Hafenbrädl et al. 2016) or classification of oncoming traffic at military checkpoints as hostile or friendly (Keller & Katsikopoulos 2016). We propose that the same heuristic principles might be useful to engineer autonomous agents that behave in a human-like way.

ACKNOWLEDGMENTS
D.L. and J.N.M. acknowledge the support received from the Swiss National Science Foundation (Grants 144413 and 146702).

Intelligent machines and human minds

doi:10.1017/S0140525X17000267, e277

Elizabeth S. Spelkea and Joseph A. Blassb
aDepartment of Psychology, Harvard University, Cambridge, MA 02138; bDepartment of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208.
[email protected] [email protected]
https://software.rc.fas.harvard.edu/lds/research/spelke/elizabeth-spelke/
http://qrg.northwestern.edu/people/Blass

Abstract: The search for a deep, multileveled understanding of human intelligence is perhaps the grand challenge for 21st-century science, with broad implications for technology. The project of building machines that think like humans is central to meeting this challenge and critical to efforts to craft new technologies for human benefit.

A century of research on human brains and minds makes three things clear. First, human cognition can be understood only if it is studied at multiple levels, from neurons to concepts to computations (Marr 1982/2010). Second, human and animal brains/minds are highly similar. Indeed, most of what we have discovered about our own capacities comes from multileveled studies of other animals (e.g., Hubel & Wiesel, 1959), suggesting that an understanding of human cognition is achievable. Third, the project of understanding human cognition has a long way to go. We have learned a lot about what we know when and what our brains are made of, but not how or why we know, think, and learn as we do.

Research on cognition in human infancy provides a case in point. Infants represent key geometric properties of the navigable layout, track objects as solid, continuously movable bodies, and endow others with goals and causal powers in ways that are highly similar to those of other inexperienced animals (e.g., Spelke & Lee 2012). These abilities are not shaped by encounters with the postnatal environment: precocial and controlled-reared animals exhibit them the first time they move through a navigable space (e.g., Chiandetti et al. 2014; Wills et al. 2010), track an object over occlusion (e.g., Regolin et al. 1995), or encounter another animal (e.g., Mascalzoni et al. 2010). Moreover, the basic ways that infants and animals understand space, objects, and goal-directed action remain central to our intuitive thinking as adults (e.g., Doeller & Burgess 2008) and to the brain systems that support it (e.g., Doeller et al. 2008; 2010).

None of these findings should be surprising or controversial. Human cognitive and neural architecture is unlikely to differ radically from that of other animals, because evolution proceeds by modifying what is already present. Abilities to represent space, objects, and other agents are unlikely to be entirely learned, because most animals need to get some problems right the first time they arise, including finding their way home, distinguishing a supporting surface from a teetering rock, avoiding predators, and staying with their group. And innate knowledge is unlikely to be overturned by later learning, because core knowledge captures fundamental properties of space, objects, and agency and because learning depends on prior knowledge.

Despite these findings, we do not know how human knowledge originates and grows, and a wealth of approaches to this question are rightly being pursued. One class of models, however, cannot plausibly explain the first steps of learning and development in any animal: deep learning systems whose internal structure is determined by analyzing massive amounts of data beyond any human scale. Human learning is fast and effective, in part, because it builds on cognitive and neural systems by which we understand the world throughout our lives. That's one reason why the effort described by Lake et al., implemented in computational models and tested against the judgments of human adults, is important to the grand challenge of achieving a deep understanding of human intelligence. We think the biggest advances from this work are still to come, through research that crafts and tests such models in systems that begin with human core knowledge and then learn, as young children do, to map their surroundings, develop a taxonomy of object kinds, and reason about others' mental states.

Computational models of infant thinking and learning may foster efforts to build smart machines that are not only better at reasoning, but also better for us. Because human infants are the best learners on the planet and instantiate human cognition in its simplest natural state, a computational model of infants' thinking and learning could guide the construction of machines that are more intelligent than any existing ones. But equally importantly, and sometimes left out of the conversation, a better understanding of our own minds is critical to building information systems for human benefit. Whether or not such systems are designed to learn and think as we do, such an understanding will help engineers build machines that will best foster our own thinking and learning.

To take just one example from current technology that is already ubiquitous, consider mobile, GPS-guided navigation systems. These systems can choose the most efficient route to a destination based on information not accessible to the user, allowing users to get around in novel environments. A person with a GPS-enabled cell phone never needs to know where he or she is or how the environment is structured. Is such a device good for us? A wealth of research indicates that the systems that
guide active, independent navigation in humans and animals from the moment that they first begin to locomote independently are broadly involved in learning and memory (e.g., Squire 1992). Moreover, the brain activity observed during active navigation diminishes when the same trajectory is followed passively (O'Keefe & Nadel 1978). How are these systems affected by the use of devices that do our navigating for us? If the cognitive and brain sciences could answer such questions in advance, researchers could design intelligent devices that both eliminate unnecessary cognitive burdens and provide needed cognitive exercise. Without a deeper understanding of how and why we learn and remember what we do, however, the designers of current technologies are working in the dark, even when they design devices to aid navigation, one of our best-understood cognitive functions (e.g., O'Keefe 2014; Moser et al. 2008).

Working in the dark posed less of a problem in past centuries. When each new tool that humans invented had limited function, and its use spread slowly, tools could be evaluated and modified by trial and error, without benefit of scientific insights into the workings of our minds. Today's tools, however, have multiple functions, whose workings are opaque to the end user. Technological progress is dizzyingly rapid, and even small advances bring sweeping, worldwide changes to people's lives. To design future machines for human benefit, researchers in all of the information technologies need to be able to foresee the effects that their inventions will have on us. And as Lake et al. observe, such foresight comes only with understanding. A great promise of human-inspired artificial intelligence, beyond building smarter machines, is to join neuroscience and cognitive psychology in meeting the grand challenge of understanding the nature and development of human minds.

The fork in the road

doi:10.1017/S0140525X17000279, e278

Robert J. Sternberg
Department of Human Development, College of Human Ecology, Cornell University, Ithaca, NY 14853.
[email protected] www.robertjsternberg.com

Abstract: Machines that learn and think like people should simulate how people really think in their everyday lives. The field of artificial intelligence originally traveled down two roads, one of which emphasized abstract, idealized, rational thinking and the other, which emphasized the emotionally charged and motivationally complex situations in which people often find themselves. The roads should have converged but never did. That's too bad.

Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
—Robert Frost, The Road Not Taken

When you come to a fork in the road, take it.
—Yogi Berra

Lake and his colleagues have chosen to build "machines that learn and think like people." I beg to differ. Or perhaps it is a matter of what one means by learning and thinking like people. Permit me to explain. Early in the history of artificial intelligence (AI) and simulation research, investigators began following two different roads. The roads might potentially have converged, but it has become more and more apparent from recent events that they have actually diverged.

One road was initiated by pioneers like Newell et al. (1957), Winograd (1972), Minsky and Papert (1987), Minsky (2003), and Feigenbaum and Feldman (1995). This road was based on understanding people's competencies in learning and thinking. Investigators taking this road studied causal reasoning, game playing, language acquisition, intuitive physics, and people's understanding of block worlds. Today, Anderson's ACT-R perhaps provides the most comprehensive simulation model (Anderson et al. 2004).

A second road was taken by pioneers like Colby (1975) with PARRY, a computer program simulating a paranoid, Abelson and Carroll (1965) with their True Believer program, and Weizenbaum (1966) with his ELIZA non-directive psychotherapy program. The idea in this research was to understand people's often suboptimal performances in learning and thinking. These programs recognized that people are often emotional, a-rational, and function at levels well below their capabilities.

Many of these ideas have been formalized in recent psychological research. For example, Stanovich (2009) has shown that rationality and intelligence are largely distinct. Mayer and Salovey (1993) have shown the importance of emotional intelligence to people's thinking, and Sternberg (1997) has argued both for the importance of practical intelligence and for its relative independence from analytical or more academic aspects of intelligence.

The two roads of AI/simulation research might have converged with comprehensive models that comfortably incorporate aspects of both optimal and distinctly suboptimal performance. They haven't. At the time, Abelson, Colby, and others worked on their models of what was at best a-rational, and at worst wholly irrational, thinking. The work seemed a bit quirky and off the beaten track – perhaps a road not worth following very far. That was then.

The 2016 presidential election has upended any assumption that everyday people think along the lines that Lake and his colleagues have pursued. Whether one is a Republican or a Democrat, it would be hard to accept this election process as representing anything other than seriously deficient and even defective thinking. The terms learning and thinking seem almost too complimentary to describe what went on. To some people the 2016 election was a frightening portent of a dystopia to come.

The first road, that of Lake et al., is of human cognition divorced from raw emotions and from the often self-serving motivations and illogic that characterize much of people's everyday thinking. On this view, people are more or less rational "machines." One might think that it is only stupid people (Sternberg 2002; 2004) who think and act foolishly. But smart people are as susceptible to foolish thinking as are not so smart people, or even more susceptible, because they do not realize they can think and act foolishly.

The United States, and indeed the world, seems to be entering a new and uncharted era of populism and appeals by politicians not to people's intellects, but to their basest emotions. Unless our models of learning and thinking help us understand how those appeals can succeed, and how we can counter them and help people become wiser (Sternberg & Jordan 2005), the models we create will be academic, incomplete, and, at worst, wrong-headed. The field came to a fork in the road and took it, but to where?

Avoiding frostbite: It helps to learn from others

doi:10.1017/S0140525X17000280, e279

Michael Henry Tessler, Noah D. Goodman, and Michael C. Frank
Department of Psychology, Stanford University, Stanford, CA 94305.
[email protected] [email protected] [email protected]
stanford.edu/~mtessler/ noahgoodman.net stanford.edu/~mcfrank/

Abstract: Machines that learn and think like people must be able to learn from others. Social learning speeds up the learning process and – in combination with language – is a gateway to abstract and unobservable information. Social learning also facilitates the accumulation of
knowledge across generations, helping people and artificial intelligences learn things that no individual could learn in a lifetime.

Causality, compositionality, and learning-to-learn – the future goals for artificial intelligence articulated by Lake et al. – are central for human learning. But these abilities alone would not be enough to avoid frostbite on King William Island in the Arctic Archipelago. You need to know how to hunt seals, make skin clothing, and manage dog sleds, and these skills are not easy to acquire from the environment alone. But if the Netsilik Inuit people taught them to you, your chances of surviving a winter would be dramatically improved (Lambert 2011). Similar to a human explorer, an artificial intelligence (AI) learning to play video games like Frostbite should take advantage of the rich knowledge available from other people. Access to this knowledge requires the capacity for social learning, both a critical prerequisite for language use and a gateway in itself to cumulative cultural knowledge.

Learning from other people helps you learn with fewer data. In particular, humans learn effectively even from "small data" because the social context surrounding the data is itself informative. Dramatically different inferences can result from what is ostensibly the same data in distinct social contexts, or even with alternative assumptions about the same context (Shafto et al. 2012). The flexibility of the social inference machinery in humans turns small signals into weighty observations: Even for young children, ambiguous word-learning events become informative through social reasoning (Frank & Goodman 2014), non-obvious causal action sequences become "the way you do it" when presented pedagogically (Buchsbaum et al. 2011), and complex machines can become single-function tools when a learner is taught just one function (Bonawitz et al. 2011).
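As one concrete flavor of how the assumed social context reshapes inference, consider the following toy sketch (our own simplified rendering of the pedagogical-sampling idea in Shafto et al. [2012], with a made-up hypothesis space): the very same example licenses a much stronger inference when the learner assumes a helpful teacher chose it than when it is assumed to be randomly sampled.

```python
# Pedagogical vs. random sampling (illustrative sketch; the domain and
# hypotheses are made up). A learner sees the single example "1" and asks
# whether the underlying concept is "narrow" ({1,2}) or "broad" ({1..4}).

DOMAIN = [1, 2, 3, 4]
HYPOTHESES = {"narrow": {1, 2}, "broad": {1, 2, 3, 4}}
PRIOR = {"narrow": 0.5, "broad": 0.5}

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def learner_posterior(likelihood, example):
    """P(h | d) under a given assumption about how d was sampled."""
    return normalize({h: likelihood(example, h) * PRIOR[h]
                      for h in HYPOTHESES})

def random_lik(d, h):
    # Weak sampling: a consistent example drawn uniformly from the concept.
    return 1 / len(HYPOTHESES[h]) if d in HYPOTHESES[h] else 0.0

# Pedagogical sampling: the teacher picks examples in proportion to how
# strongly they would sway the learner; solve the recursion by iteration.
teacher = {h: {d: random_lik(d, h) for d in DOMAIN} for h in HYPOTHESES}
for _ in range(50):
    post = {d: learner_posterior(lambda dd, hh: teacher[hh][dd], d)
            for d in DOMAIN}
    teacher = {h: normalize({d: post[d][h] for d in DOMAIN})
               for h in HYPOTHESES}

print("random sampling:     ", learner_posterior(random_lik, 1))
print("pedagogical sampling:", learner_posterior(
    lambda d, h: teacher[h][d], 1))
# Random sampling favors "narrow" only via the size principle (about 2:1);
# under the teaching assumption the inference is near-decisive (roughly
# 99:1 here), because a teacher of "broad" would have shown 3 or 4 instead.
```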
Learning from others comes in many forms. An expert may tolerate onlookers, a demonstrator may slow down when completing a particularly challenging part of the task, and a teacher may actively provide pedagogical examples and describe them with language (Csibra & Gergely 2009; Kline 2015). Informative demonstrations may be particularly useful for procedural learning (e.g., hunting seals, learning to play Frostbite). Language, however, is uniquely powerful in its ability to convey information that is abstract or difficult to observe, or information that otherwise does not have a way of being safely acquired, such as learning that certain plants are poisonous or how to avoid frostbite (Gelman 2009). Studying social learning is an important part of studying language learning (Goodman & Frank 2016); both should be top priorities for making AIs learn like people.

Focusing on Lake et al.'s key example, you can even learn the game Frostbite with fewer data when you learn it from other people. We recruited 20 participants from Amazon's Mechanical Turk to play Frostbite for 5 minutes. Half of the participants were given written instructions about the abstract content of the game, adapted directly from the caption of Figure 2 in the target article. The other half were not given this information. (Everybody was told that you move the agent using the arrow keys.) Learners who were told about the abstract structure of the game learned to play the game more quickly and achieved higher overall scores (M = 2440) than the group without written instructions (M = 1333) (Figure 1). The highest score for those without linguistic instructions was 3530 points, achieved after about 4 minutes of play. By comparison, the highest score achieved with linguistic instructions was 7720 points, achieved after 2 minutes of play. Indeed, another group (including some authors of the target article) recently found a similar pattern of increased performance in Frostbite as a result of social guidance (Tsividis et al. 2017).

[Figure 1 (Tessler et al.). Score trajectories for players in the game Frostbite over time. The two panels depict results with and without instructions on the abstract structure of the game.]

Learning from others also does more than simply "speed up" learning about the world. Human knowledge seems to accumulate across generations, hence permitting progeny to learn in one lifetime what no generation before them could learn (Boyd et al. 2011; Tomasello 1999). We hypothesize that language – and particularly its flexibility to refer to abstract concepts – is key to faithful transmission of knowledge, between individuals and through generations. Human intelligence is so difficult to match because we stand on the shoulders of giants. AIs need to "ratchet" up their own learning, by communicating knowledge efficiently within and across generations. Rather than be subject to a top-down hive mind, intelligent agents should retain their individual intellectual autonomy, and innovate new solutions to problems based on their own experience and what they have learned from others. The important discoveries of a single AI could then be shared, and we believe language is the key to this kind of cultural transmission. Cultural knowledge could then accumulate within both AI and human networks.

In sum, learning from other people should be a high priority for AI researchers. Lake et al. hope to set priorities for future research in AI, but fail to acknowledge the importance of learning from language and social cognition. This is a mistake: The more complex the task is, the more learning to perform like a human involves learning from other people.

Crossmodal lifelong learning in hybrid neural embodied architectures

doi:10.1017/S0140525X17000292, e280

Stefan Wermter, Sascha Griffiths, and Stefan Heinrich
Knowledge Technology Group, Department of Informatics, Universität Hamburg, Hamburg, Germany.
[email protected] [email protected] [email protected]
https://www.informatik.uni-hamburg.de/~wermter/
https://www.informatik.uni-hamburg.de/~griffiths/
https://www.informatik.uni-hamburg.de/~heinrich/

Abstract: Lake et al. point out that grounding learning in general principles of embodied perception and social cognition is the next step in advancing artificial intelligent machines. We suggest it is necessary to go further and consider lifelong learning, which includes developmental learning, focused on embodiment as applied in developmental robotics and neurorobotics, and crossmodal learning that facilitates integrating multiple senses.

Artificial intelligence has recently been seen as successful in a number of domains, such as playing chess or Go, recognising handwritten characters, or describing visual scenes in natural language. Lake et al. discuss these kinds of breakthroughs as a big step for artificial intelligence, but raise the question of how we can build machines that learn like people. We can find an indication in a survey of mind perception (Gray et al. 2007), which is the "amount of mind" people are willing to attribute to others. Participants judged machines to be high on agency but low on experience. We attribute this to the fact that computers are trained on individual tasks, often involving a single modality such as vision or speech, or a single context such as classifying traffic signs, as opposed to interpreting spoken and gestured utterances. In contrast, for people, the "world" essentially appears as a multimodal stream of stimuli, which unfold over time. Therefore, we suggest that the next paradigm shift in intelligent machines will have to include processing the "world" through lifelong and crossmodal learning. This is important because people develop problem-solving capabilities, including language processing, over their life span and via interaction with the environment and other people (Elman 1993; Christiansen and Chater 2016). In addition, the learning is embodied, as developing infants have a body-rational view of the world, but also seem to apply general problem-solving strategies to a wide range of quite different tasks (Cangelosi and Schlesinger 2015).

Hence, we argue that the proposed principles or "start-up software" are coupled tightly with general learning mechanisms in the brain. We argue that these conditions inherently enable the development of distributed representations of knowledge. For example, in our research, we found that architectural mechanisms, like different timings in the information processing in the cortex, foster compositionality that in turn enables both the development of more complex body actions and the development of language competence from primitives (Heinrich 2016). These kinds of distributed representations are coherent with the cognitive science on embodied cognition. Lakoff and Johnson (2003), for example, argue that people describe personal relationships in terms of the physical sensation of temperature. The transfer from one domain to the other is plausible, as an embrace or handshake between friends or family members, for example, will cause a warm sensation for the participants. These kinds of temperature-exchanging actions are supposed to be signs of people's positive feelings towards each other (Hall 1966). The connection between temperature sensation and social relatedness is argued to reflect neural "bindings" (Gallese and Lakoff 2005). The domain knowledge that is used later in life can be derived from the primitives that are encountered early in childhood, for example, in interactions between infants and parents, and is referred to as intermodal synchrony (Rohlfing and Nomikou 2014). As a further example, our own research shows that learning, which is based on crossmodal integration, like the integration of real sensory perception on low and on intermediate levels (as suggested for the superior colliculus in the brain), can enable both super-additivity and dominance of certain modalities based on the tasks (Bauer et al. 2015).

In developing machines, approaches such as transfer learning and zero-shot learning are receiving increasing attention, but are often restricted to transfers from domain to domain or from modality to modality. In the domain case, this can take the form of a horizontal transfer, in which a concept in one domain is learned and then transferred to another domain within the same modality. For example, it is possible to learn about affect in speech and to transfer that model of affect to music (Coutinho et al. 2014). In the modality case, one can vertically transfer concepts from one modality to another. This could be a learning process in which language knowledge is transferred to the visual domain (Laptev 2008; Donahue 2015). However, based on the previous crossmodal integration in people, we must look into combinations of both, such that transferring between domains is not merely switching between two modalities, but integrating into both. Therefore, machines must exploit the representations that form when integrating multiple modalities, which are richer than the sum of the parts. Recent initial examples include (1) understanding continuous counting expressed in spoken numbers from learned spatial differences in gestural motor grounding (Ruciński 2014) and (2) classifying affective states' audiovisual emotion expressions via music, speech, facial expressions, and motion (Barros and Wermter 2016).

Freeing learning from modalities and domains in favour of distributed representations, and reusing learned representations in the next individual learning tasks, will enable a larger view of learning to learn. Having underlying hybrid neural embodied architectures (Wermter et al. 2005) will support horizontal and vertical transfer and integration. This is the "true experience" machines need to learn and think like people. All in all, Lake et al. stress the important point of grounding learning in general principles of embodied perception and social cognition. Yet, we suggest it is still necessary to go a step further and consider lifelong learning, which includes developmental learning, focused on embodiment as applied in developmental robotics and neurorobotics, and crossmodal learning, which facilitates the integration of multiple senses.

Authors' Response

Ingredients of intelligence: From classic debates to an engineering roadmap

doi:10.1017/S0140525X17001224, e281

Brenden M. Lake,a Tomer D. Ullman,b,c Joshua B. Tenenbaum,b,c and Samuel J. Gershmanc,d
aDepartment of Psychology and Center for Data Science, New York University, New York, NY 10011; bDepartment of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139; cThe Center for Brains Minds and Machines, Cambridge, MA 02139; dDepartment of Psychology and Center For Brain Science, Harvard University, Cambridge, MA 02138
[email protected] [email protected] [email protected] [email protected]
http://cims.nyu.edu/~brenden/ http://www.mit.edu/~tomeru/
http://web.mit.edu/cocosci/josh.html http://gershmanlab.webfactional.com/index.html

Abstract: We were encouraged by the broad enthusiasm for building machines that learn and think in more human-like ways. Many commentators saw our set of key ingredients as helpful, but there was disagreement regarding the origin and structure of those ingredients. Our response covers three main dimensions of this disagreement: nature versus nurture, coherent theories versus theory fragments, and symbolic versus sub-symbolic representations. These dimensions align with classic debates in artificial intelligence and cognitive science, although, rather than embracing these debates, we emphasize ways of moving beyond them. Several commentators saw our set of key ingredients as incomplete and offered a wide range of
additions. We agree that these additional ingredients are important in the long run and discuss prospects for incorporating them. Finally, we consider some of the ethical questions raised regarding the research program as a whole.

R1. Summary

We were pleased to see so many thoughtful commentaries and critiques in response to our target article. The project of "building machines that learn and think like people" will require input and insight from a broad range of disciplines, and it was encouraging that we received responses from experts in artificial intelligence (AI), machine learning, cognitive psychology, cognitive development, social psychology, philosophy, robotics, and neuroscience. As to be expected, there were many differences in perspective and approach, but before turning to those disagreements we think it is worth starting with several main points of agreement.

First, we were encouraged to see broad enthusiasm for the general enterprise and the opportunities it would bring. Like us, many researchers have been inspired by recent AI advances to seek a better computational understanding of human intelligence, and see this project's potential for driving new breakthroughs in building more human-like intelligence in machines. There were notable exceptions: A few respondents focused more on the potential risks and harms of this effort, or questioned its whole foundations or motivations. We return to these issues at the end of this response.

Most commenters also agreed that despite rapid progress in AI technologies over the last few years, machine systems are still not close to achieving human-like learning and thought. It is not merely a matter of scaling up current systems with more processors and bigger data sets. Fundamental ingredients of human cognition are missing, and fundamental innovations must be made to incorporate these ingredients into any kind of general-purpose, human-like AI.

Our target article articulated one vision for making progress toward this goal. We argued that human-like intelligence will come from machines that build models of the world – models that support explanation and understanding, prediction and planning, and flexible generalization for an open-ended array of new tasks – rather than machines that merely perform pattern recognition to optimize performance in a previously specified task or set of tasks.

We outlined a set of key cognitive ingredients that could support this approach, which are missing from many current AI systems (especially those based on deep learning), but could add great value: the "developmental start-up software" of intuitive physics and intuitive psychology, and mechanisms for rapid model learning based on the principles of compositionality, causality, and learning-to-learn (along with complementary mechanisms for efficient inference and planning with these models). We were gratified to read that many commentators found these suggested cognitive ingredients useful: "We agree … on their list of 'key ingredients' for building human-like intelligence" (Botvinick, Barrett, Battaglia, de Freitas, Kumaran, Leibo, Lillicrap, Modayil, Mohamed, Rabinowitz, Rezende, Santoro, Schaul, Summerfield, Wayne, Weber, Wierstra, Legg, & Hassabis [Botvinick et al.], abstract); "We entirely agree with the central thrust of the article" (Davis & Marcus, para. 1); "Causality, compositionality, and learning-to-learn … are central for human learning" (Tessler, Goodman, & Frank [Tessler et al.], para. 1); "Their ideas of 'start-up software' and tools for rapid model learning … help pinpoint the sources of general, flexible intelligence" (Dennett & Lambert, para. 1).

This is not to say that there was universal agreement about our suggested ingredients. Our list was carefully chosen but not meant to be complete, and many commentators offered additional suggestions: emotion (Clark; Güss & Dörner), embodiment and action (Baldassarre, Santucci, Cartoni, & Caligiore [Baldassarre et al.]; MacLennan; Marin & Mostafaoui; Oudeyer; Wermter, Griffiths, & Heinrich [Wermter et al.]), social and cultural learning (Clegg & Corriveau; Dennett & Lambert; Tessler et al.; Marin & Mostafaoui), and open-ended learning through intrinsic motivation (Baldassarre et al.; Güss & Dörner; Oudeyer; Wermter et al.). We appreciate these suggested additions, which help paint a richer and more complete picture of the mind and the ingredients of human intelligence. We discuss prospects for incorporating them into human-like AI systems in Section 5.

The main dimensions of disagreement in the commentaries revolved around how to implement our suggested ingredients in building AI: To what extent should they be explicitly built in, versus expected to emerge? What is their real content? How integrated or fragmented is the mind's internal structure? And what form do they take? How are these capacities represented in the mind or instantiated in the brain, and what kinds of algorithms or data structures should we be looking to in building an AI system?

Perhaps unsurprisingly, these dimensions tended to align with classic debates in cognitive science and AI, and we found ourselves being critiqued from all sides. The first dimension is essentially the nature versus nurture debate (Section 2), and we were charged with advocating both for too much nature (Botvinick et al.; Clegg & Corriveau; Cooper) and too little (Spelke & Blass). The second dimension relates to whether human mental models are better characterized in terms of coherent theories versus theory fragments (Section 3): We were criticized for positing theory-forming systems that were too strong (Chater & Oaksford; Davis & Marcus; Livesey, Goldwater, & Colagiuri [Livesey et al.]), but also too weak (Dennett & Lambert). The third dimension concerns symbolic versus sub-symbolic representations (Section 4): To some commenters our proposal felt too allied with symbolic cognitive architectures (Çağlar & Hanson; Hansen, Lampinen, Suri, & McClelland [Hansen et al.]; MacLennan). To others, we did not embrace symbols deeply enough (Forbus & Gentner).

Some who saw our article through the lens of these classic debates experienced a troubling sense of déjà vu. It was "Back to the Future: The Return of Cognitive Functionalism" for Çağlar & Hanson. For Cooper, it appeared "that cognitive science has advanced little in the last 30 years with respect to the underlying debates." We felt differently. We took this broad spectrum of reactions from commentators (who also, by and large, felt they agreed with our main points) as a sign that our field collectively might be looking to break out from these debates – to move in new directions that are not so easily classified as
just more of the same. It is understandable that many commentators would see our argument through the lens of these well-known and entrenched lines of argument, perhaps because we, as individuals, have contributed to them in previous publications. However, we wrote this target article, in part, because we felt it was vital to redefine this decades-long discussion in light of the recent progress in AI and machine learning.

Recent AI successes, on the one hand, make us optimistic about the project of building machines that learn and think like people. Working toward this goal seems much more plausible to many people than it did just a few years ago. At the same time, recent AI successes, when viewed from the perspective of a cognitive scientist, also highlight the gaps between machine and human intelligence. Our target article begins from this contrast: Whereas the driving force behind most of today's machine learning systems is sophisticated pattern recognition, scaled up to increasingly large and diverse data sets, the most impressive feats of human learning are better understood in terms of model building, often with much more limited data. We take the goal of building machines that can build models of the world as richly, as flexibly, and as quickly as humans can, as a worthy target for the next phase of AI research. Our target article lays out some of the key ingredients of human cognition that could serve as a basis for making progress toward that goal.

We explicitly tried to avoid framing these suggestions in terms of classic lines of argument that neural network researchers and other cognitive scientists have engaged in, to encourage more building and less arguing. With regards to nature versus nurture (sect. 2 of this article), we tried our best to describe these ingredients in a way that was "agnostic with regards to [their] origins" (target article, sect. 4, para. 2), but instead focused on their engineering value. We made this choice, not because we do not have views on the matter, but because we see the role of the ingredients as more important than their origins, for the next phase of AI research and the dialog between scientists and engineers. Whether learned, innate, or enriched, the fact that these ingredients are active so early in development is a signal of their importance. They are present long before a person learns a new handwritten letter in the Character Challenge, or learns to play a new video game in the Frostbite Challenge (target article, sects. 3.1 and 3.2). AI systems could similarly benefit from utilizing these ingredients. With regards to symbolic versus sub-symbolic modeling (sect. 4 of this article), we think the ingredients could take either form, and they could potentially be added to symbolic architectures, sub-symbolic architectures, or hybrid architectures that transcend the dichotomy. Similarly, the model-building activities we describe could potentially be implemented in a diverse range of architectures, including deep learning. Regardless of implementation, demonstrations such as the Characters and Frostbite challenges show that people can rapidly build models of the world, and then flexibly reconfigure these models for new tasks without having to retrain. We see this as an ambitious target for AI that can be pursued in a variety of ways, and will have many practical applications (target article, sect. 6.2).

The rest of our response is organized as follows: The next three sections cover in detail the main dimensions of debate regarding the origin and structure of our ingredients: nature versus nurture (sect. 2), coherent theories versus theory fragments (sect. 3), and symbolic versus sub-symbolic representations (sect. 4). Additional ingredients suggested by the commentators are covered in Section 5. We discuss insights from neuroscience and the brain in Section 6. We end by discussing the societal risks and benefits of building machines that learn and think like people, in the light of the ethical issues raised by some commentators (sect. 7).

R2. Nature versus nurture

As mentioned, our target article did not intend to take a strong stance on "nature versus nurture" or "designing versus learning" for how our proposed ingredients should come to be incorporated into more human-like AI systems. We believe this question is important, but we placed our focus elsewhere in the target article. The main thesis is that a set of ingredients – each with deep roots in cognitive science – would be powerful additions to AI systems in whichever way a researcher chooses to include them. Whether the ingredients are learned, built in, or enriched through learning, we see them as a primary goal to strive for when building the next generation of AI systems. There are multiple possible paths for developing AI systems with these ingredients, and we expect individual researchers will vary in the paths they choose for pursuing these goals.

Understandably, many of the commentators linked their views on the biological origin of our cognitive principles to their strategy for developing AI systems with these principles. In contrast to the target article and its agnostic stance, some commentators took a stronger nativist stance, arguing that aspects of intuitive physics, intuitive psychology, and causality are innate, and it would be valuable to develop AI systems that "begin with human core knowledge" (Spelke & Blass, para. 4). Other commentators took a stronger nurture stance, arguing that the goal should be to learn these core ingredients rather than build systems that start with them (Botvinick et al.; Cooper). Relatedly, many commentators pointed out additional nurture-based factors that are important for human-like learning, such as social and cultural forms of learning (Clegg & Corriveau; Dennett & Lambert; Marin & Mostafaoui; Tessler et al.). In the section that follows, we respond to the different suggestions regarding the origin of the key ingredients, leaving the discussion of additional ingredients, such as social learning, for Section 5.

The response from researchers at Google DeepMind (Botvinick et al.) is of particular interest because our target article draws on aspects of their recent work. We offered their work as examples of recent accomplishments in AI (e.g., Graves et al. 2016; Mnih et al. 2015; Silver et al. 2016). At the same time, we highlighted ways that their systems do not learn or think like people (e.g., the Frostbite Challenge), but could potentially be improved by aiming for this target and by incorporating additional cognitive ingredients. Botvinick et al.'s response suggests that there are substantial areas of agreement. In particular, they see the five principles as "a powerful set of target goals for AI research" (para. 1), suggesting similar visions of what future accomplishments in AI will look like, and what the required building blocks are for getting there.
Botvinick et al. strongly emphasized an additional principle: Machines should learn for themselves with minimal hand engineering from their human designers. We agree this is a valuable principle to guide researchers seeking to build learning-based general AI systems, as DeepMind aims to. To the extent that this principle is related to our principle of "learning-to-learn," we also endorse it in building machines that learn and think like people. Children are born capable of learning for themselves everything they will ultimately learn, without the need for an engineer to tweak their representations or algorithms along the way.

However, it is not clear that the goals of building general AI systems and building machines that learn like people always converge, and the best design approach might be correspondingly different. Human beings (and other animals) may be born genetically programmed with mechanisms that effectively amount to highly engineered cognitive representations or algorithms – mechanisms that enable their subsequent learning and learning-to-learn abilities. Some AI designers may want to emulate this approach, whereas others may not.

The differences between our views may also reflect a difference in how we prioritize a set of shared principles and how much power we attribute to learning-to-learn mechanisms. Botvinick et al. suggest – but do not state explicitly – that they prioritize learning with minimal engineering above the other principles (and, thus, maximize the role of learning-to-learn). Under this strategy, the goal is to develop systems with our other key ingredients (compositionality, causality, intuitive physics, and intuitive psychology), insofar as they can be learned from scratch without engineering them. In the short term, this approach rests heavily on the power of learning-to-learn mechanisms to construct these other aspects of an intelligent system. In cases where this strategy is not feasible, Botvinick et al. state that their approach also licenses them to build in ingredients too, but (we assume) with a strong preference for learning the ingredients wherever possible.

Although these distinctions may seem subtle, they can have important consequences for research strategy and outcome. Compare DeepMind's work on the Deep Q-Network (Mnih et al. 2015) to the theory learning approach our target article advocated for tackling the Frostbite Challenge, or their work on one-shot learning in deep neural networks (Rezende et al. 2016; Santoro et al. 2016; Vinyals et al. 2016) and our work on Bayesian Program Learning (Lake et al. 2015a). DeepMind's approaches to these problems clearly learn with less initial structure than we advocate for, and also clearly have yet to approach the speed, flexibility, and richness of human learning, even in these constrained domains.

We sympathize with DeepMind's goals and believe their approach should be pursued vigorously, along with related suggestions by Cooper and Hansen et al. However, we are not sure how realistic it is to pursue all of our key cognitive ingredients as emergent phenomena (see related discussion in sect. 5 of the target article) […] & Miikkulainen 2002). This approach may be characterized as "building machines that evolve to learn and think like people," in that such an extensive search would presumably include aspects of both phylogeny and ontogeny.

As discussed in Section 4.1 of the target article, children have a foundational understanding of physics (objects, substances, and their dynamics) and psychology (agents and their goals) early in development. Whether innate, enriched, or rapidly learned, it seems unlikely that these ingredients arise purely in ontogeny from an extensive structural search over a large space of cognitive architectures, with no initial bias toward building these kinds of structures. In contrast, our preferred approach is to explore both powerful learning algorithms and starting ingredients together.

Over the last decade, this approach has led us to the key ingredients that are the topic of the target article (e.g., Baker et al. 2009; 2017; Battaglia et al. 2013; Goodman et al. 2008; Kemp et al. 2007; Lake et al. 2015a; Ullman et al. 2009); we did not start with these principles as dogma. After discovering which representations, learning algorithms, and inference mechanisms appear especially powerful in combination with each other, it is easier to investigate their origins and generalize them so they apply more broadly. Examples of this strategy from our work include the grammar-based framework for discovering structural forms in data (Kemp & Tenenbaum 2008), and a more emergent approach for implicitly learning some of the same forms (Lake et al. 2016), as well as models of causal reasoning and learning built on the theory of causal Bayesian networks (Goodman et al. 2011; Griffiths & Tenenbaum 2005, 2009). This strategy has allowed us to initially consider a wider spectrum of models, without a priori rejecting those that do not learn everything from scratch. Once an ingredient is established as important, it provides important guidance for additional research on how it might be learned.
learning the ingredients wherever possible. We have pursued this strategy primarily through struc-
Although these distinctions may seem subtle, they can tured probabilistic modeling, but we believe it can be fruit-
have important consequences for research strategy and fully pursued using neural networks as well. As Botvinick
outcome. Compare DeepMind’s work on the Deep Q- et al. point out, this strategy would not feel out of place in
Network (Mnih et al. 2015) to the theory learning approach contemporary deep learning research. Convolutional
our target article advocated for tackling the Frostbite Chal- neural networks build in a form of translation invariance
lenge, or their work on one-shot learning in deep neural that proved to be highly useful for object recognition (Kriz-
networks (Rezende et al. 2016; Santoro et al. 2016; hevsky et al. 2012; LeCun et al. 1989), and more recent
Vinyals et al. 2016) and our work on Bayesian Program work has explored building various forms of compositional-
Learning (Lake et al. 2015a). DeepMind’s approaches to ity into neural networks (e.g., Eslami et al. 2016; Reed & de
these problems clearly learn with less initial structure Freitas 2016). Increasingly, we are seeing more examples of
than we advocate for, and also clearly have yet to approach integrating neural networks with lower-level building
the speed, flexibility, and richness of human learning, even blocks from classic psychology and computer science (see
in these constrained domains. sect. 6 of target article): selective attention (Bahdanau
We sympathize with DeepMind’s goals and believe et al. 2015; Mnih et al. 2014; Xu et al. 2015), augmented
their approach should be pursued vigorously, along with working memory (Graves et al. 2014; Grefenstette et al.
related suggestions by Cooper and Hansen et al. 2015; Sukhbaatar et al. 2015; Weston et al. 2015b), and
However, we are not sure how realistic it is to pursue all experience replay (McClelland et al. 1995; Mnih et al.
of our key cognitive ingredients as emergent phenomena 2015). AlphaGo has an explicit model of the game of Go
(see related discussion in sect. 5 of the target article), and builds in a wide range of high level and game-specific
using the learning-to-learn mechanisms currently on features, including how many stones a move captures,
offer in the deep learning landscape. Genuine intuitive how many turns since a move was played, the number of
physics, intuitive psychology, and compositionality, are liberties, and whether a ladder will be successful or not
unlikely to emerge from gradient-based learning in a rela- (Silver et al. 2016). If researchers are willing to include
tively generic neural network. Instead, a far more expen- these types of representations and ingredients, we hope
sive evolutionary-style search over discrete architectural they will also consider our higher level cognitive
variants may be required (e.g., Real et al. 2017; Stanley ingredients.
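To give a concrete flavor of this “build it in, then learn” strategy, the following minimal sketch (in Python, using PyTorch) stacks hand-specified feature planes with raw input channels before any learning takes place. The three engineered planes and the tiny network are our illustrative assumptions, not the actual AlphaGo architecture of Silver et al. (2016):

    # A minimal sketch, not AlphaGo: hand-engineered, game-specific feature
    # planes (assumed here: liberties, capture counts, ladder flags) are fed
    # to a learned policy network alongside the raw board.
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        def __init__(self, board_size=19, n_engineered=3):
            super().__init__()
            # 1 raw stone plane + n_engineered hand-built planes as channels
            self.conv = nn.Sequential(
                nn.Conv2d(1 + n_engineered, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(32, 1, kernel_size=1),  # one logit per board point
            )

        def forward(self, stones, engineered):
            x = torch.cat([stones, engineered], dim=1)  # build in, then learn
            return self.conv(x).flatten(1)              # move logits

    net = PolicyNet()
    stones = torch.zeros(1, 1, 19, 19)      # raw board occupancy
    engineered = torch.zeros(1, 3, 19, 19)  # hand-engineered planes (assumed)
    print(net(stones, engineered).shape)    # torch.Size([1, 361])

The design choice at issue is simply where the input channels come from: nothing in the learning machinery changes when some of them are supplied by hand-specified structure rather than learned from scratch.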

It is easy to miss fruitful alternative representations by considering only models with minimal assumptions, especially in cases where the principles and representations have strong empirical backing (as is the case with our suggested principles). In fact, Botvinick et al. acknowledge that intuitive physics and psychology may be exceptions to their general philosophy, and could be usefully built in, given their breadth of empirical support. We were gratified to see this, and we hope it is clear to them and like-minded AI researchers that our recommendations are to consider building in only a relatively small set of core ingredients that have this level of support and scope. Moreover, a purely tabula rasa strategy can lead to models that require unrealistic amounts of training experience, and then struggle to generalize flexibly to new tasks without retraining. We believe that has been the case so far for deep learning approaches to the Characters and Frostbite challenges.

R3. Coherent theories versus theory fragments

Beyond the question of where our core ingredients come from, there is the question of their content and structure. In our article, we argued for theory-like systems of knowledge and causally structured representations, in particular (but not limited to) early-emerging intuitive physics and intuitive psychology. This view builds on extensive empirical research showing how young infants organize the world according to general principles that allow them to generalize across varied scenarios (Spelke 2003; Spelke & Kinzler 2007), and on theoretical and empirical research applied to children and adults that sees human knowledge in different domains as explained by theory-like structures (Carey 2009; Gopnik et al. 2004; Murphy & Medin 1985; Schulz 2012b; Wellman & Gelman 1992; 1998).

Commentators were split over how rich and how theory-like (or causal) these representations really are in the human mind, and what that implies for building human-like AI. Dennett & Lambert see our view of theories as too limited – useful for describing cognitive processes shared with animals, but falling short of many distinctively human ways of learning and thought. On the other hand, several commentators saw our proposal as too rich for much of human knowledge. Chater & Oaksford argue by analogy to case law that “knowledge has the form of a loosely inter-linked history of reusable fragments” (para. 6) rather than a coherent framework. They stress that mental models are often shallow, and Livesey et al. add that people’s causal models are not only shallow, but also often wrong, and resistant to change (such as the belief that vaccines cause autism). Davis & Marcus similarly suggest that the models we propose for the core ingredients are too causal, too complete, and too narrow to capture all of cognitive reasoning: Telling cats from dogs does not require understanding their underlying biological generative process; telling that a tower will fall does not require specifying in detail all of the forces and masses at play along the trajectory; and telling that someone is going to call someone does not require understanding whom they are calling or why.

In our target article, although we emphasized a view of cognition as model building, we also argued that pattern recognition can be valuable and even essential – in particular, for enabling efficient inference, prediction, and learning in rich causal theories. We suggested that different behaviors might be best explained by one, or the other, or both. For example, identifying the presence in a scene of an object that we call a “fridge” may indeed be driven by pattern recognition. But representing that object as a heavy, rigid, inanimate entity, and the corresponding predictions and plans that representation allows, is likely driven by more general abstract knowledge about physics and objects, whose core elements are not tied down to extensive patterns of experience with particular categories of objects. We could just as well form this representation upon our first encounter with a fridge, without knowing what it is called or knowing anything about the category of artifacts that it is one instance of.

On the issue of rich versus shallow representations, whereas intuitive theories of physics and psychology may be rich in the range of generalizable inferences they support, these and other intuitive theories are shallow in another sense; they are far more shallow than the type of formal theories scientists aim to develop, at the level of base reality and mechanism. From the point of view of a physicist, a game engine representation of a tower of blocks falling down is definitely not, as Davis & Marcus describe it, a “physically precise description of the situation” (para. 4). A game engine representation is a simplification of the physical world; it does not go down to the molecular or atomic level, and it does not give predictions at the level of a nanosecond. It represents objects with simplified bounding shapes, and it can give coarse predictions for coarse time-steps. Also, although real physics engines are useful analogies for a mental representation, they are not one and the same, and finding the level of granularity of the mental physics engine (if it exists) is an empirical question. To the point about intuitive psychology, theories that support reasoning about agents and goals do not need to specify all of the moving mental or neural parts involved in planning to make useful predictions and explanations about what an agent might do in a given situation.
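As a deliberately toy illustration of this kind of coarse, probabilistic simulation, the sketch below (in the spirit of Battaglia et al. 2013, but not their actual model) represents a tower as noisy one-dimensional block positions and reports the fraction of sampled worlds in which a crude center-of-mass check topples the tower; the noise level and the stability check are our own assumptions:

    # A minimal sketch of probabilistic simulation over a simplified world:
    # unit-width blocks, perceptual noise on inferred positions, and a coarse
    # stability check instead of molecular-level physics.
    import random

    def topples(xs, half_width=0.5):
        """The centre of mass of the blocks above each block must stay
        over that block's top surface, or the tower falls."""
        for i in range(len(xs) - 1):
            above = xs[i + 1:]
            com = sum(above) / len(above)
            if abs(com - xs[i]) > half_width:
                return True
        return False

    def p_fall(observed_xs, noise_sd=0.1, n_samples=1000):
        falls = sum(topples([x + random.gauss(0, noise_sd) for x in observed_xs])
                    for _ in range(n_samples))
        return falls / n_samples

    # Three stacked blocks with an offset top block: a graded, coarse judgment.
    print(p_fall([0.0, 0.1, 0.45]))

The point of the sketch is that graded, human-like judgments (“probably stable,” “likely to fall”) fall out of running a drastically simplified world model under perceptual uncertainty, with no nanosecond-level precision anywhere in the loop.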


Returning to the need for multiple types of models, and to the example of the fridge, Chater & Oaksford point to a significant type of reasoning not captured by either recognizing an image of a fridge or reasoning about its physical behavior as a heavy, inert object. Rather, they consider the shallow and sketchy understanding of how a fridge stays cold. Chater & Oaksford use such examples to reason that, in general, reasoning is done by reference to exemplars. They place stored, fragmented exemplars in the stead of wide-scope and deep theories. However, we suggest that even the shallow understanding of the operation of a fridge may best be phrased in the language of a causal, generative model, albeit a shallow or incomplete one. That is, even in cases in which we make use of previously stored examples, these examples are probably best represented by a causal structure, rather than by external or superficial features. To use Chater & Oaksford’s analogy, deciding which precedent holds in a new case relies on the nature of the offense and the constraining circumstances, not the surname of the plaintiff. In the same way that two letters are considered similar not because of a pixel-difference measure, but because of the similar strokes that created them, exemplar-based reasoning would rely on the structural similarity of causal models of a new example and stored fragments (Medin & Ortony 1989). An interesting hypothesis is that shallow causal models or mini-theories could be filling their gaps with more general, data-driven statistical machinery, such as a causal model with some of its latent variables generated by neural networks. Another possibility is that some mini-causal theories are generated ad hoc and on the fly (Schulz 2012a), and so it should not be surprising that they are sometimes ill-specified and come into conflict with one another.
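A minimal sketch of what such a hybrid might look like, under our own illustrative assumptions: the causal skeleton of how a fridge stays cold is hand-specified, while the poorly understood link between dial setting and cooling power is filled in by a small neural network (the names, equation, and constants below are invented for illustration):

    # A shallow causal model with a learned gap-filler: the graph
    # dial -> cooling power -> temperature is hand-specified; the
    # dial-to-power mapping is a neural network standing in for the
    # fragmentary part of the mini-theory.
    import torch
    import torch.nn as nn

    cooling_net = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))

    def fridge_model(dial, ambient=20.0):
        power = cooling_net(dial)            # data-driven latent variable
        temperature = ambient - 5.0 * power  # hand-specified causal equation
        return temperature

    dial = torch.tensor([[0.7]])
    print(fridge_model(dial))  # prediction flows through the causal graph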
Unlike more general and early-developing core theories such as intuitive physics and intuitive psychology, these mini-theory fragments may rely on later-developing language faculties (Carey 2009). More generally, the early forms of core theories such as intuitive physics and psychology may be, as Dennett & Lambert put it, “[bootstrapped] into reflective comprehension” (para. 3). Similar points have been made in the past by Carey (2009) and Spelke (Spelke 2003; Spelke & Kinzler 2007), among others, regarding the role of later-developing language in using domain-specific and core knowledge concepts to extend intuitive theories as well as to build formal theories. The principles of core knowledge, by themselves, are not meant to fully capture the formal or qualitative physical understanding of electricity, heat, light, and sound (distinguishing it from the qualitative reasoning that Forbus & Gentner discuss). But, if later-developing aspects of physical understanding are built on these early foundations, that may be one source of the ontological confusion and messiness that permeates our later intuitive theories as well as people’s attempts to understand formal theories in intuitive terms (Livesey et al.). Electrons are not little colliding ping-pong balls; enzymes are not trying to achieve an aspiration. But our parsing of the world into regular-bounded objects and intentional agents produces these category errors, because of the core role objects and agents play in cognition.

R4. Symbolic versus sub-symbolic representations

Beyond the richness and depth of our intuitive theories, the nature of representation was hotly contested in other ways, for the purposes of both cognitive modeling and developing more human-like AI. A salient division in the commentaries was between advocates of “symbolic” versus “sub-symbolic” representations, or relatedly, those who viewed our work through the lens of “explicit” versus “implicit” representations or “rules” versus “associations.” Several commentators thought our proposal relied too much on symbolic representations, especially because sub-symbolic distributed representations have helped facilitate much recent progress in machine learning (Çağlar & Hanson; Hansen et al.; MacLennan). Other commentators argued that human intelligence rests on more powerful forms of symbolic representation and reasoning than our article emphasized, such as abstract relational representations and analogical comparison (Forbus & Gentner).

This is a deeply entrenched debate in cognitive science and AI – one that some of us have directly debated in past articles (along with some of the commentators [e.g., Griffiths et al. 2010; McClelland et al. 2010]), and we are not surprised to see it resurfacing here. Although we believe that this is still an interesting debate, we also see that recent work in AI and computational modeling of human cognition has begun to move beyond it, in ways that could be valuable.

To this end, we suggested that pattern recognition versus model building – and the ability to rapidly acquire new models and then reconfigure these models for new tasks without having to retrain – is a useful way to view the wide gap between human and machine intelligence. Implementing AI systems with our key ingredients would be a promising route for beginning to bridge this gap. Although our proposal is not entirely orthogonal to the symbolic versus sub-symbolic debate, we do see it as importantly different. Genuine model-building capabilities could be implemented in fully symbolic architectures or in a range of architectures that combine minimal symbolic components (e.g., objects, relations, agents, goals) with compositionality and sub-symbolic representation.

These ingredients could also be implemented in an architecture that does not appear to have symbols in any conventional sense – one that advocates of sub-symbolic approaches might even call non-symbolic – although we expect that advocates of symbolic approaches would point to computational states, which are effectively functioning as symbols. We do not claim to be breaking any new ground with these possibilities; the theoretical landscape has been well explored in philosophy of mind. We merely want to point out that our set of key ingredients is not something that should trouble people who feel that symbols are problematic. On the contrary, we hope this path can help bridge the gap between those who see symbols as essential and those who find them mysterious or elusive.

Of our suggested ingredients, compositionality is arguably the most closely associated with strongly symbolic architectures. In relation to the above points, it is especially instructive to discuss how close this association has to be, and how much compositionality could be achievable within approaches to building intelligent machines that might not traditionally be seen as symbolic.

Hansen et al. argue that there are inherent limitations to “symbolic compositionality” that deep neural networks help overcome. Although we have found traditional symbolic forms of compositionality to be useful in our work, especially in interaction with other key cognitive ingredients such as causality and learning-to-learn (e.g., Goodman et al. 2011; 2015; Lake et al. 2015a), there may be other forms of compositionality that are useful for learning and thinking like humans, and easier to incorporate into neural networks. For example, neural networks designed to understand scenes with multiple objects (see also Fig. 6 of our target article), or to generate globally coherent text (such as a recipe), have found simple forms of compositionality to be extremely useful (e.g., Eslami et al. 2016; Kiddon et al. 2016). In particular, “objects” are minimal symbols that can support powerfully compositional model building, even if implemented in an architecture that would otherwise be characterized as sub-symbolic (e.g., Eslami et al. 2016; Raposo et al. 2017). The notion of a physical object – a chunk of solid matter that moves as a whole, moves smoothly through space and time without teleporting, disappearing, or passing through other solid objects – emerges very early in development (Carey 2009). It is arguably the central representational construct of human beings’ earliest intuitive physics, one of the first symbolic concepts in any domain that infants have access to, and likely shared with many other animal species in some form (see target article, sect. 4.1.1). Hence, the “object” concept is one of the best candidates for engineering AI to start with, and a promising target for advocates of sub-symbolic approaches who might want to incorporate useful but minimal forms of symbols and compositionality into their systems.
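The sketch below illustrates, with invented types and dynamics, what “objects” as minimal symbols might look like: discrete object tokens carrying continuous state (which could in principle be produced by a sub-symbolic parser of pixels), composing into scenes and persisting smoothly over time:

    # Objects as minimal symbols: a scene is a composition of object tokens;
    # the dynamics respect object persistence (no teleporting, no vanishing).
    from dataclasses import dataclass, replace
    from typing import List, Tuple

    @dataclass(frozen=True)
    class Obj:
        position: Tuple[float, float]
        velocity: Tuple[float, float]
        shape: str  # coarse bounding shape, e.g., "box" or "ball"

    Scene = List[Obj]

    def step(scene: Scene, dt: float = 0.1) -> Scene:
        return [replace(o, position=(o.position[0] + o.velocity[0] * dt,
                                     o.position[1] + o.velocity[1] * dt))
                for o in scene]

    # Compositionality: new scenes recombine the same object vocabulary.
    scene = [Obj((0.0, 0.0), (1.0, 0.0), "box"),
             Obj((2.0, 1.0), (0.0, -1.0), "ball")]
    print(step(scene))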
Deep learning research is also beginning to explore more general forms of compositionality, often by utilizing hybrid symbolic and sub-symbolic representations. Differentiable neural computers (DNCs) are designed to process symbolic structures such as graphs, and they use a mixture of sub-symbolic neural network-style computation and symbolic program traces to reason with these representations (Graves et al. 2016). Neural programmer-interpreters (NPIs) begin with symbolic program primitives embedded in their architecture, and they learn to control the flow of higher-level symbolic programs that are constructed from these primitives (Reed & de Freitas 2016). Interestingly, the learned controller is a sub-symbolic neural network, but it is trained with symbolic supervision. These systems are very far from achieving the powerful forms of model building that we see in human intelligence, and it is likely that more fundamental breakthroughs will be needed. Still, we are greatly encouraged to see neural network researchers who are not ideologically opposed to the role of symbols and compositionality in the mind and, indeed, are actively looking for ways to incorporate these notions into their paradigm.

In sum, by viewing the impressive achievements of human learning as model building rather than pattern recognition, we hope to emphasize a new distinction, different from classic debates of symbolic versus sub-symbolic computation, rules versus associations, or explicit versus implicit reasoning. We would like to focus on people’s capacity for learning flexible models of the world as a target for AI research – one that might be reached successfully through a variety of representational paradigms if they incorporate the right ingredients. We are pleased that the commentators seem to broadly support “model building” and our key ingredients as important goals for AI research. This suggests a path for moving forward together.

R5. Additional ingredients

Many commentators agreed that although our key ingredients were important, we neglected another obvious, crucial component of human-like intelligence. There was less agreement on which component we had neglected. Overlooked components included emotion (Güss & Dörner; Clark); embodiment and action (Baldassarre et al.; MacLennan; Marin & Mostafaoui; Oudeyer; Wermter et al.); learning from others through social and cultural interaction (Clegg & Corriveau; Dennett & Lambert; Marin & Mostafaoui; Tessler et al.); open-ended learning combined with the ability to set one’s own goal (Baldassarre et al.; Oudeyer; Wermter et al.); architectural diversity (Buscema & Sacco); dynamic network communication (Graham); and the ability to get a joke (Moerman).

Clearly, our recipe for building machines that learn and think like people was not complete. We agree that each of these capacities should figure in any complete scientific understanding of human cognition, and will likely be important for building artificial human-like cognition. There are likely other missing components as well. However, the question for us as researchers interested in the reverse engineering of cognition is: Where to start? We focused on ingredients that were largely missing from today’s deep learning AI systems, ones that were clearly crucial and present early in human development, and with large expected payoffs in terms of core AI problems. Importantly, we also wanted to draw focus to ingredients that, to our mind, can be implemented in the relatively short term, given a concentrated effort. Our challenges are not meant to be AI-complete, but ones that can potentially be met in the next few years. For many of the suggestions the commentators made, it is hard (for us, at least) to know where to begin concrete implementation.

We do not mean that there have not been engineering advances and theoretical proposals for many of these suggestions. The commentators have certainly made progress on them, and we and our colleagues have also made theoretical and engineering contributions to some. But to do full justice to all of these missing components – from emotion to sociocultural learning to embodiment – there are many gaps that we do not know how to fill yet. Our aim was to set big goal posts on the immediate horizon, and we admit that there are others beyond. With these implementation gaps in mind, we have several things to say about each of these missing components.

R5.1. Machines that feel: Emotion

In popular culture, intelligent machines differ from humans in that they do not experience the basic passions that color people’s inner life. To call someone robotic does not mean that they lack a good grasp of intuitive physics, intuitive psychology, compositionality, or causality. It means they, like the Tin Man, have no heart. Research on “mind attribution” has also borne out this distinction (Gray & Wegner 2012; Gray et al. 2007; Haslam 2006; Loughnan & Haslam 2007): Intelligent machines and robots score highly on the agency dimension (people believe such creatures can plan and reason), but low on the experience dimension (people believe they lack emotion and subjective insight). In line with this, Güss & Dörner; Clark; and Sternberg highlight emotion as a crucial missing ingredient in building human-like machines. As humans ourselves, we recognize the importance of emotion in directing human behavior, in terms of both understanding oneself and predicting and explaining the behavior of others. The challenge, of course, is to operationalize this relationship in computational terms. To us, it is not obvious how to go from evocative descriptions, such as “a person would get an ‘uneasy’ feeling when solution attempts do not result in a solution” (as observed by Güss & Dörner, para. 5), to a formal and principled implementation of unease in a decision-making agent. We see this as a worthwhile pursuit for developing more powerful and human-like AI, but we see our suggested ingredients as leading to concrete payoffs that are more attainable in the short term.

Nonetheless, we can speculate about what it might take to structure a human-like “emotion” ingredient in AI, and how it would relate to the other ingredients we put forth.


Pattern-recognition approaches (based on deep learning or other methods) have had some limited success in mapping between video and audio of humans to simple emotion labels like happy (e.g., Kahou et al. 2013). Sentiment analysis networks learn to map between text and its positive or negative valence (e.g., Socher et al. 2013). But genuine, human-like concepts or experiences of emotion will require more, especially more sophisticated model building, with close connections and overlap with the ingredient of intuitive psychology. Humans may have a “lay theory of emotion” (Ong et al. 2015) that allows them to reason about the causal processes that drive the experiences of frustration, anger, surprise, hate, and joy. That is, something like “achieving your goal makes you feel happy.” This type of theory would also connect the underlying emotions to observable behaviors such as facial expressions (downward turned lips), action (crying), body posture (hunched shoulders), and speech (“It’s nothing, I’m fine”). Moreover, as pointed out by Güss & Dörner, a concept of “anger” must include how it modulates perception, planning, and desires, touching on key aspects of intuitive psychology.
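To illustrate how such a lay theory could be phrased as a causal, generative model, the toy sketch below lets goal outcomes cause emotions and emotions cause observable behavior, then inverts the model to infer a hidden emotion from behavior. The probabilities and variable names are invented; this is a minimal sketch, not the model of Ong et al. (2015):

    # A toy generative lay theory of emotion, inverted by Bayes' rule.
    P_EMOTION = {  # P(emotion | outcome): "achieving your goal makes you happy"
        "goal_achieved": {"happy": 0.9, "sad": 0.1},
        "goal_failed":   {"happy": 0.1, "sad": 0.9},
    }
    P_BEHAVIOR = {  # P(behavior | emotion): faces, posture, speech
        "happy": {"smile": 0.7, "cry": 0.05, "slumped_posture": 0.25},
        "sad":   {"smile": 0.1, "cry": 0.5,  "slumped_posture": 0.4},
    }

    def infer_emotion(behavior, outcome):
        """P(emotion | behavior, outcome) by inverting the generative model."""
        joint = {e: P_EMOTION[outcome][e] * P_BEHAVIOR[e][behavior]
                 for e in ("happy", "sad")}
        z = sum(joint.values())
        return {e: p / z for e, p in joint.items()}

    print(infer_emotion("cry", "goal_achieved"))  # tears of joy are possible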
R5.2. Machines that act: Action and embodiment

One of the aspects of intelligence “not much stressed by Lake et al.” was the importance of intelligence being “strongly embodied and situated,” located in an acting physical body (Baldassarre et al., para. 4), with possible remedies coming in the form of “developmental robotics and neurorobotics” (Oudeyer; Wermter et al.). This was seen by some commentators as more than yet-another-key-ingredient missing from current deep learning research. Rather, they saw it as an issue for our own proposal, particularly as it relates to physical causality and learning. Embodiment and acting on the real world provide an agent with “a foundation for its understanding of intuitive physics” (MacLennan), and “any learning or social interacting is based on social motor embodiment.” Even understanding what a chair is requires the ability to sit on it (Marin & Mostafaoui).

We were intentionally agnostic in our original proposal regarding the way a model of intuitive physics might be learned, focusing instead on the existence of the ability, its theory-like structure, usefulness, and early emergence, and its potential representation as something akin to a mental game engine. It is an interesting question whether this representation can be learned only by passively viewing video and audio, without active, embodied engagement. In agreement with some of the commentators, it seems likely to us that such a representation in humans does come about – over a combination of both evolutionary and developmental processes – from a long history of agents’ physical interactions with the world: applying their own forces on objects (perhaps somewhat haphazardly at first in babies), observing the resulting effects, and revising their plans and beliefs accordingly.

An intuitive theory of physics built on object concepts, and analogs of force and mass, would also benefit a physically realized robot, allowing it to plan usefully from the beginning, rather than bumbling aimlessly and wastefully as it attempts some model-free policies for interaction with its environment. An intuitive theory of physics can also allow the robot to imagine potential situations without going through the costly operation of carrying them out. Furthermore, unlike MacLennan’s requirement that theories be open to discourse and communication, such a generative, theory-like representation does not need to be explicit and accessible in a communicative sense (target article, sect. 4). Instead, people may have no introspective insight into its underlying computations, in the same way that they have no introspective insight into the computations that go into recognizing a face.

To MacLennan’s point regarding the necessary tight coupling between an agent and a real environment: If a theory-like representation turns out to be the right representation, we do not see why it cannot be arrived at by virtual agents in a virtual environment, provided that they are given the equivalents of somatosensory information and the ability to generate the equivalent of forces. Agents endowed with a representation of intuitive physics may have calibration issues when transferred from a virtual environment to a situated and embodied robot, but it would likely not result in a complete breakdown of their physical understanding, any more than adults experience a total breakdown of intuitive physics when transferred to realistic virtual environments.

As for being situated in a physical body, although the mental game-engine representation has been useful in capturing people’s reasoning about disembodied scenes (such as whether a tower of blocks on a table will fall down), it is interesting to consider extending this analogy to the existence of an agent’s body and the bodies of other agents. Many games rely on some representation of the players, with simplified bodies built of “skeletons” with joint constraints. This type of integration would fit naturally with the long-investigated problem of pose estimation (Moeslund et al. 2006), which has recently been the target of discriminative deep learning networks (e.g., Jain et al. 2014; Toshev & Szegedy 2014). Here, too, we would expect a converging combination of structured representations and pattern recognition: That is, rather than mapping directly between image pixels and the target label sitting, there would be an intermediate simplified body representation, informed by constraints on joints and the physical situation. This intermediate representation could in turn be categorized as sitting (see related hybrid architectures from recent years [e.g., Chen & Yuille 2014; Tompson et al. 2014]).
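A minimal sketch of this converging route, with a two-joint “skeleton” and thresholds that are purely illustrative assumptions: a discriminative stage proposes joint locations, a structured stage enforces coarse skeletal constraints, and the category label is read off the intermediate body representation rather than the raw pixels:

    # Pixels -> proposed joints -> constrained skeleton -> category label.
    import math

    def propose_joints(image):
        """Stand-in for a deep network's joint predictions (hip, knee)."""
        return {"hip": (0.0, 1.0), "knee": (0.0, 0.5)}

    def enforce_skeleton(joints, limb_length=0.5, tol=0.1):
        """Reject body configurations that violate coarse joint constraints."""
        (hx, hy), (kx, ky) = joints["hip"], joints["knee"]
        if abs(math.hypot(hx - kx, hy - ky) - limb_length) > tol:
            raise ValueError("implausible pose: limb length violates skeleton")
        return joints

    def categorize(joints):
        """Label the intermediate body representation, not the pixels."""
        hip_y, knee_y = joints["hip"][1], joints["knee"][1]
        return "sitting" if hip_y - knee_y < 0.6 else "standing"

    print(categorize(enforce_skeleton(propose_joints(image=None))))  # sitting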
R5.3. Machines that learn from others: Culture and pedagogy

We admit that the role of sociocultural learning is, as Clegg & Corriveau put it, “largely missing from Lake et al.’s discussion of creating human-like artificial intelligence” (abstract). We also agree that this role is essential for human cognition. As the commentators pointed out, it is important both on the individual level, as “learning from other people helps you learn with fewer data” (Tessler et al., para. 2), and also on the societal level, as “human knowledge seems to accumulate across generations” (Tessler et al., para. 5). Solving Frostbite is not only a matter of combining intuitive physics, intuitive psychology, compositionality, and learning-to-learn, but also a matter of watching someone play the game, or listening to someone explain it (Clegg & Corriveau; Tessler et al.), as we have shown in recent experiments (Tsividis et al. 2017).

Some of the commentators stressed the role of imitation and over-imitation in this pedagogical process (Dennett & Lambert; Marin & Mostafaoui). Additionally, Tessler et al. focused more on language as the vehicle for this learning, and framed the study of social learning as a part of language learning. Our only disagreement with Tessler et al. regarding the importance of language is their contention that we “fail to acknowledge the importance of learning from language.” We completely agree about the importance of understanding language for understanding cognition. However, we think that by understanding the early building blocks we discussed, we will be in a better position to formally and computationally understand language learning and use. For a fuller reply to this point, we refer the reader to Section 5 in the target article.

Beyond being an additional ingredient, Clegg & Corriveau suggest sociocultural learning may override some of the key ingredients we discuss. As they nicely put it, “although the developmental start-up software children begin with may be universal, early in development children’s ‘software updates’ may be culturally-dependent. Over time, these updates may even result in distinct operating systems” (para. 4). Their evidence for this includes different culture-dependent time-courses for passing the false belief task, understanding fictional characters as such, and an emphasis on consensus-building (Corriveau et al. 2013; Davoodi et al. 2016; Liu et al. 2008). We see these differences as variations on, or additions to, the core underlying structure of intuitive psychology, which is far from monolithic in its fringes. The specific causes of a particular behavior posited by a 21st-century Western architect may be different from those of a medieval French peasant or a Roman emperor, but the parsing of behavior in terms of agents that are driven by a mix of desire, reasoning, and necessity would likely remain the same, just as their general ability to recognize faces would likely be the same (or, as an emperor put it, “[W]hat is such a person doing, and why, and what is he saying, and what is he thinking of, and what is he contriving” [Aurelius 1937]). Despite these different stresses, we agree with Clegg & Corriveau that sociocultural learning builds upon the developmental start-up packages, rather than starting with a relatively blank-slate child that develops primarily through sociocultural learning via language and communication (Mikolov et al. 2016).

R5.4. Machines that explore: Open-ended learning and intrinsic motivation

Several commentators (Baldassarre et al.; Güss & Dörner; Oudeyer; Wermter et al.) raised the challenge of building machines that engage in open-ended learning and exploration. Unlike many AI systems, humans (especially children) do not seem to optimize a supervised objective function; they explore the world autonomously, develop new goals, and acquire skill repertoires that generalize across many tasks. This challenge has been particularly acute for developmental roboticists, who must endow their robots with the ability to learn a large number of skills from scratch. It is generally infeasible to solve this problem by defining a set of supervised learning problems, because of the complexity of the environment and sparseness of rewards. Instead, roboticists have attempted to endow their robots with intrinsic motivation to explore, so that they discover for themselves what goals to pursue and skills to acquire.

We agree that open-ended learning is a hallmark of human cognition. One of our main arguments for why humans develop rich internal models is that these support the ability to flexibly solve an infinite variety of tasks. Acquisition of such models would be impossible if humans were not intrinsically motivated to acquire information about the world, without being tied to particular supervised tasks. The key question, in our view, is how to define intrinsic motivation in such a way that a learning system will seek to develop an abstract understanding of the world, populated by agents, objects, and events. Developmental roboticists tend to emphasize embodiment as a source of constraints: Robots need to explore their physical environment to develop sophisticated, generalizable sensory-motor skills. Some (e.g., MacLennan) argue that high-level competencies, such as intuitive physics and causality, are derived from these same low-level sensory-motor skills. As in the previous section, we believe that embodiment, although important, is insufficient: Humans can use exploration to develop abstract theories that transcend particular sensors and effectors (e.g., Cook et al. 2011). For example, in our Frostbite Challenge, many of the alternative goals are not defined in terms of any particular visual input or motor output. A promising approach would be to define intrinsic motivation in terms of intuitive theories – autonomous learning systems that seek information about the causal relationships between agents, objects, and events. This form of curiosity would augment, not replace, the forms of lower-level curiosity necessary to develop sensory-motor skills.
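One minimal way to operationalize this kind of theory-based curiosity, under strong simplifying assumptions, is to reward an agent for the information it gains about which causal hypothesis is true. The sketch below uses just two hypotheses about whether a switch controls a light; the numbers are invented:

    # Intrinsic reward as information gain over causal hypotheses, not pixels.
    import math

    def entropy(p):
        return -sum(q * math.log2(q) for q in p.values() if q > 0)

    prior = {"switch_causes_light": 0.5, "no_causal_link": 0.5}
    likelihood = {  # P(light turns on | action = flip switch, hypothesis)
        "switch_causes_light": 0.95,
        "no_causal_link": 0.1,
    }

    def update(prior, light_on):
        post = {h: prior[h] * (likelihood[h] if light_on else 1 - likelihood[h])
                for h in prior}
        z = sum(post.values())
        return {h: p / z for h, p in post.items()}

    posterior = update(prior, light_on=True)
    reward = entropy(prior) - entropy(posterior)  # bits of causal information
    print(round(reward, 3), posterior)

On this view, an informative intervention is intrinsically rewarding because it shrinks uncertainty over the space of intuitive theories, which would augment rather than replace lower-level, sensory-motor curiosity.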
R6. Insights from neuroscience and the brain

Our article did not emphasize neuroscience as a source of constraint on AI, not because we think it is irrelevant (quite the contrary), but because we felt that it was necessary to first crystallize the core ingredients of human intelligence at a computational level before trying to figure out how they are implemented in physical hardware. In this sense, we are advocating a mostly top-down route through the famous Marr levels of analysis, much as Marr himself did. This was unconvincing to some commentators (Baldassarre et al.; George; Kriegeskorte & Mok; Marblestone, Wayne, & Kording). Surely it is necessary to consider neurobiological constraints from the start, if one wishes to build human-like intelligence?

We agree that it would be foolish to argue for cognitive processes that are in direct disagreement with known neurobiology. However, we do not believe that neurobiology in its current state provides many strong constraints of this sort. For example, George suggests that lateral connections in visual cortex indicate that the internal model used by the brain enforces contour continuity. This seems plausible, but it is not the whole story. We see the world in three dimensions, and there is considerable evidence from psychophysics that we expect the surfaces of objects to be continuous in three dimensions, even if such continuity violates two-dimensional contour continuity (Nakayama et al. 1989). Thus, the situation is more like the opposite of what George argues: a challenge for neuroscience is to explain how neurons in visual cortex enforce the three-dimensional continuity constraints we know exist from psychophysical research.


Kriegeskorte & Mok point to higher-level vision as a place where neural constraints have been valuable. They write that core object recognition has been “conquered” by brain-inspired neural networks. We agree that there has been remarkable progress on basic object recognition tasks, but there is still a lot more to understand scientifically and to achieve on the engineering front, even in visual object perception. Take, for example, the problem of occlusion. Because most neural network models of object recognition have no explicit representation of objects arranged in depth, they are forced to process occlusion as a kind of noise. Again, psychophysical evidence argues strongly against this: When objects pass behind an occluding surface, we do not see them as disappearing or becoming corrupted by a massive amount of noise (Kellman & Spelke 1983). A challenge for neuroscience is to explain how neurons in the ventral visual stream build a 3D representation of scenes that can appropriately handle occlusion. The analogous challenge exists in AI for brain-inspired artificial neural networks.

Further challenges, just in the domain of object perception, include perceiving multiple objects in a scene at once; perceiving the fine-grained shape and surface properties of novel objects for which one does not have a class label; and learning new object classes from just one or a few examples, and then generalizing to new instances. In emphasizing the constraints biology places on cognition, it is sometimes underappreciated to what extent cognition places strong constraints on biology.

R7. Coda: Ethics, responsibility, and opportunities

Your scientists were so preoccupied with whether or not they could, that they didn’t stop to think if they should.
— Dr. Ian Malcolm, Jurassic Park

Given recent progress, AI is now widely recognized as a source of transformative technologies, with the potential to impact science, medicine, business, home life, civic life, and society, in ways that improve the human condition. There is also real potential for more negative impacts, including dangerous side effects or misuse. Recognizing both the positive and negative potential has spurred a welcome discussion of ethical issues and responsibility in AI research. Along these lines, a few commentators questioned the moral and ethical aspects of the very idea of building machines that learn and think like people. Moerman argues that the project is both unachievable and undesirable and, instead, advocates for building useful, yet inherently limited “single-purpose” machines. As he puts it (para. 2), “There are 7 billion humans on earth already. Why do we need fake humans when we have so many real ones?” Dennett & Lambert worry that machines may become intelligent enough to be given control of many vital tasks, before they become intelligent or human-like enough to be considered responsible for the consequences of their behavior.

We believe that trying to build more human-like intelligence in machines could have tremendous benefits. Many of these benefits will come from progress in AI more broadly – progress that we believe would be accelerated by the project described in our target article. There are also risks, but we believe these risks are not, for the foreseeable future, existential risks to humanity, or uniquely new kinds of risks that will sneak up on us suddenly. For anyone worried that AI research may be making too much progress too quickly, we would remind them that the best machine-learning systems are still very far from achieving human-like learning and thought, in all of the ways we discussed in the target article. Superintelligent AIs are even further away, so far that we believe it is hard to plan for them, except in the most general sense. Without new insights, ingredients, and ideas – well beyond those we have written about – we think that the loftiest goals for AI will be difficult to reach. Nonetheless, we see the current debate on AI ethics as responsible and healthy, and we take Dennett & Lambert’s suggestion regarding AI co-pilots in that spirit.

Moerman’s commentary fits well with many of these points: Simply scaling up current methods is unlikely to achieve anything like human intelligence. However, he takes the project of building more human-like learning machines to its logical extreme – building a doppelgänger machine that can mimic all aspects of being human, including incidental ones. Beyond rapid model building and flexible generalization, and even after adding the additional abilities suggested by the other commentators (sect. 5), Moerman’s doppelgänger machine would still need the capability to get a joke, get a Ph.D., fall in love, get married, get divorced, get remarried, prefer Bourbon to Scotch (or vice versa), and so on. We agree that it is difficult to imagine machines will do all of these things any time soon. Nonetheless, the current AI landscape would benefit from more human-like learning – with its speed, flexibility, and richness – far before machines attempt to tackle many of the abilities that Moerman discusses. We think that this type of progress, even if only incremental, would still have far-reaching, practical applications (target article, sect. 6.2), and broader benefits for society.

Apart from advances in AI more generally, advances in human-like AI would bring additional unique benefits. Several commentators remarked on this. Spelke & Blass point out that a better understanding of our own minds will enable new kinds of machines that “can foster our thinking and learning” (para. 5). In addition, Patrzyk, Link, & Marewski expound on the benefits of “explainable AI,” such that algorithms can generate human-readable explanations of their output, limitations, and potential failures (Doshi-Velez & Kim 2017). People often learn by constructing explanations (Lombrozo 2016, relating to our “model building”), and a human-like machine learner would seek to do so too. Moreover, as it pertains to human-machine interaction (e.g., Dennett & Lambert), it is far easier to communicate with machines that generate human-understandable explanations than with opaque machines that cannot explain their decisions.

In sum, building machines that learn and think like people is an ambitious project, with great potential for positive impact: through more powerful AI systems, a deeper understanding of our own minds, new technologies for easing and enhancing human cognition, and explainable AI for easier communication with the technologies of the future. As AI systems become more fully autonomous and agentive, building machines that learn and think like people will be the best route to building machines that treat people the way people want and expect to be treated by others: with a sense of fairness, trust, kindness, considerateness, and intelligence.


References

[The letters “a” and “r” before author’s initials stand for target article and response references, respectively]

Abelson, R. P. & Carroll, J. D. (1965) Computer simulation of individual belief systems. The American behavioral scientist (pre-1986) 8(9):24–30. [RJS]
Aitchison, L. & Lengyel, M. (2016) The Hamiltonian brain: Efficient probabilistic inference with excitatory-inhibitory neural circuit dynamics. PLoS Computational Biology 12(12):e1005186. [NK]
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. & Qin, Y. (2004) An integrated theory of the mind. Psychological Review 111:1036–60. [RJS]
Anderson, M. L. (2003) Embodied cognition: A field guide. Artificial Intelligence 149(1):91–130. [GB]
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B. & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in neural information processing systems 29 (NIPS 2016), ed. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon & R. Garnett, pp. 3981–89. Neural Information Processing Systems. [aBML, MB]
Anselmi, F., Leibo, J. Z., Rosasco, L., Mutch, J., Tacchetti, A. & Poggio, T. (2016) Unsupervised learning of invariant representations. Theoretical Computer Science 633:112–21. [aBML]
Ansermin, E., Mostafaoui, G., Beausse, N. & Gaussier, P. (2016) Learning to synchronously imitate gestures using entrainment effect. In: From Animals to Animats 14: Proceedings of the 14th International Conference on Simulation of Adaptive Behavior (SAB 2016), Aberystwyth, United Kingdom, August 23–26, 2016, ed. E. Tuci, A. Giagkos, M. Wilson & J. Hallam, pp. 219–31. Springer. [LM]
Arbib, M. A. & Fellous, J. M. (2004) Emotions: From brain to robot. Trends in Cognitive Science 8(12):554–61. [KBC]
Arnold, T. & Scheutz, M. (2016) Against the moral Turing test: Accountable design and the moral reasoning of autonomous systems. Ethics and Information Technology 18(2):103–15. doi:10.1007/s10676-016-9389-x. [PMP]
Asada, M. (2015) Development of artificial empathy. Neuroscience Research 90:41–50. [KBC]
Asada, M., Hosoda, K., Kuniyoshi, Y., Ishiguro, H., Inui, T., Yoshikawa, Y. & Yoshida, C. (2009) Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development 1(1):12–34. [P-YO]
Aurelius, M. (1937) Meditations, transl. G. Long. P. F. Collier & Son. [rBML]
Bach, J. (2009) Principles of synthetic intelligence. PSI: An architecture of motivated cognition. Oxford University Press. [CDG]
Bahdanau, D., Cho, K. & Bengio, Y. (2015) Neural machine translation by jointly learning to align and translate. Presented at the International Conference on Learning Representations (ICLR), San Diego, CA, May 7–9, 2015. arXiv preprint 1409.0473. Available at: http://arxiv.org/abs/1409.0473v3. [arBML]
Baillargeon, R. (2004) Infants’ physical world. Current Directions in Psychological Science 13:89–94. [aBML]
Baillargeon, R., Li, J., Ng, W. & Yuan, S. (2009) An account of infants’ physical reasoning. In: Learning and the infant mind, ed. A. Woodward & A. Needham, pp. 66–116. Oxford University Press. [aBML]
Baily, M. N. & Bosworth, B. P. (2014) US manufacturing: Understanding its past and its potential future. The Journal of Economic Perspectives 28(1):3–25. [DEM]
Baker, C. L., Jara-Ettinger, J., Saxe, R. & Tenenbaum, J. B. (2017) Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour 1:0064. [rBML]
Baker, C. L., Saxe, R. & Tenenbaum, J. B. (2009) Action understanding as inverse planning. Cognition 113(3):329–49. [arBML]
Baldassarre, G. (2011) What are intrinsic motivations? A biological perspective. In: Proceedings of the International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob-2011), ed. A. Cangelosi, J. Triesch, I. Fasel, K. Rohlfing, F. Nori, P.-Y. Oudeyer, M. Schlesinger & Y. Nagai, pp. E1–8. IEEE. [GB]
Baldassarre, G., Caligiore, D. & Mannella, F. (2013a) The hierarchical organisation of cortical and basal-ganglia systems: A computationally-informed review and integrated hypothesis. In: Computational and robotic models of the hierarchical organisation of behaviour, ed. G. Baldassarre & M. Mirolli, pp. 237–70. Springer-Verlag. [GB]
Baldassarre, G., Mannella, F., Fiore, V. G., Redgrave, P., Gurney, K. & Mirolli, M. (2013b) Intrinsically motivated action-outcome learning and goal-based action recall: A system-level bio-constrained computational model. Neural Networks 41:168–87. [GB]
Baldassarre, G. & Mirolli, M., eds. (2013) Intrinsically motivated learning in natural and artificial systems. Springer. [GB, P-YO]
Baldassarre, G., Stafford, T., Mirolli, M., Redgrave, P., Ryan, R. M. & Barto, A. (2014) Intrinsic motivations and open-ended development in animals, humans, and robots: An overview. Frontiers in Psychology 5:985. [GB]
Baranes, A. & Oudeyer, P.-Y. (2013) Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems 61(1):49–73. [P-YO]
Baranes, A. F., Oudeyer, P. Y. & Gottlieb, J. (2014) The effects of task difficulty, novelty and the size of the search space on intrinsically motivated exploration. Frontiers in Neurosciences 8:1–9. [P-YO]
Barros, P. & Wermter, S. (2016) Developing crossmodal expression recognition based on a deep neural model. Adaptive Behavior 24(5):373–96. [SW]
Barsalou, L. W. (1983) Ad hoc categories. Memory & Cognition 11(3):211–27. [aBML]
Barto, A. (2013) Intrinsic motivation and reinforcement learning. In: Intrinsically motivated learning in natural and artificial systems, ed. G. Baldassarre & M. Mirolli, pp. 17–47. Springer. [P-YO]
Bartunov, S. & Vetrov, D. P. (2016) Fast adaptation in generative models with generative matching networks. arXiv preprint 1612.02192. [SSH]
Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P. & Friston, K. J. (2012) Canonical microcircuits for predictive coding. Neuron 76:695–711. http://doi.org/10.1016/j.neuron.2012.10.038. [aBML, DGe]
Bates, C. J., Yildirim, I., Tenenbaum, J. B. & Battaglia, P. W. (2015) Humans predict liquid dynamics using probabilistic simulation. In: Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena, CA, July 22–25, 2015, pp. 172–77. Cognitive Science Society. [aBML]
Battaglia, P., Pascanu, R., Lai, M. & Rezende, D. J. (2016) Interaction networks for learning about objects, relations and physics. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in neural information processing systems 29 (NIPS 2016), ed. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon & R. Garnett, pp. 4502–10. Neural Information Processing Systems. [MB]
Battaglia, P. W., Hamrick, J. B. & Tenenbaum, J. B. (2013) Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences of the United States of America 110(45):18327–32. [arBML, ED]
Baudiš, P. & Gailly, J.-l. (2012) PACHI: State of the art open source Go program. In: Advances in computer games: 13th International Conference, ACG 2011, Tilburg, The Netherlands, November 20–22, 2011, Revised Selected Papers, ed. H. Jaap van den Herik & A. Plaat, pp. 24–38. Springer. [aBML]
Bauer, J., Dávila-Chacón, J. & Wermter, S. (2015) Modeling development of natural multi-sensory integration using neural self-organisation and probabilistic population codes. Connection Science 27(4):358–76. [SW]
Baxter, J. (2000) A model of inductive bias learning. Journal of Artificial Intelligence Research 12:149–98. [aBML]
Bayer, H. M. & Glimcher, P. W. (2005) Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47:129–41. [aBML]
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D. & Munos, R. (2016) Unifying count-based exploration and intrinsic motivation. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in neural information processing systems 29 (NIPS 2016), ed. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon & R. Garnett, pp. 1471–79. Neural Information Processing Systems. [MB, P-YO]
Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. (2013) The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47:253–79. [aBML]
Bengio, Y. (2009) Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1):1–127. [MBu]
Bengio, Y. (2016) Machines who learn. Scientific American 314(6):46–51. [KBC]
Bennis, W. M., Medin, D. L. & Bartels, D. M. (2010) The costs and benefits of calculation and moral rules. Perspectives on Psychological Science 5(2):187–202. doi:10.1177/1745691610362354. [PMP]
Berdahl, C. H. (2010) A neural network model of Borderline Personality Disorder. Neural Networks 23(2):177–88. [KBC]
Berlyne, D. E. (1966) Curiosity and exploration. Science 153(3731):25–33. doi:10.1126/science.153.3731.25. [aBML, CDG]
Berthiaume, V. G., Shultz, T. R. & Onishi, K. H. (2013) A constructivist connectionist model of transitions on false-belief tasks. Cognition 126(3):441–58. [aBML]
Berwick, R. C. & Chomsky, N. (2016) Why only us: Language and evolution. MIT Press. [aBML]
Bever, T. G. & Poeppel, D. (2010) Analysis by synthesis: A (re-) emerging program of research for language and vision. Biolinguistics 4:174–200. [aBML]
Bi, G.-Q. & Poo, M.-M. (2001) Synaptic modification by correlated activity: Hebb’s postulate revisited. Annual Review of Neuroscience 24:139–66. [aBML]
Biederman, I. (1987) Recognition-by-components: A theory of human image understanding. Psychological Review 94(2):115–47. [aBML]
Bienenstock, E., Cooper, L. N. & Munro, P. W. (1982) Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. The Journal of Neuroscience 2(1):32–48. [aBML]
Bienenstock, E., Geman, S. & Potter, D. (1997) Compositionality, MDL priors, and object recognition. Presented at the 1996 Neural Information Processing Systems conference, Denver, CO, December 2–5, 1996. In: Advances in neural information processing systems 9, ed. M. C. Mozer, M. I. Jordan & T. Petsche, pp. 838–44. Neural Information Processing Systems Foundation. [aBML]

Blackburn, S. (1984) Spreading the word: Groundings in the philosophy of language. Oxford University Press. [NC]
Block, N. (1978) Troubles with functionalism. Minnesota Studies in the Philosophy of Science 9:261–325. [LRC]
Bloom, P. (2000) How children learn the meanings of words. MIT Press. [aBML]
Blumberg, M. S. (2005) Basic instinct: The genesis of behavior. Basic Books. [AHM]
Blundell, C., Uria, B., Pritzel, A., Li, Y., Ruderman, A., Leibo, J. Z., Rae, J., Wierstra, D. & Hassabis, D. (2016) Model-free episodic control. arXiv preprint 1606.04460. Available at: https://arxiv.org/abs/1606.04460. [aBML, MB]
Bobrow, D. G. & Winograd, T. (1977) An overview of KRL, a knowledge representation language. Cognitive Science 1:3–46. [aBML]
Boden, M. A. (1998) Creativity and artificial intelligence. Artificial Intelligence 103:347–56. [aBML]
Boden, M. A. (2006) Mind as machine: A history of cognitive science. Oxford University Press. [aBML]
Bonawitz, E., Denison, S., Griffiths, T. L. & Gopnik, A. (2014) Probabilistic models, learning algorithms, and response variability: Sampling in cognitive development. Trends in Cognitive Sciences 18:497–500. [aBML]
Bonawitz, E., Shafto, P., Gweon, H., Goodman, N. D., Spelke, E. & Schulz, L. (2011) The double-edged sword of pedagogy: Instruction limits spontaneous exploration and discovery. Cognition 120(3):322–30. Available at: http://doi.org/10.1016/j.cognition.2010.10.001. [MHT]
Bostrom, N. (2014) Superintelligence: Paths, dangers, strategies. Oxford University Press. ISBN 978-0199678112. [KBC]
Bottou, L. (2014) From machine learning to machine reasoning. Machine Learning 94(2):133–49. [aBML]
Botvinick, M. M. & Cohen, J. D. (2014) The computational and neural basis of cognitive control: Charted territory and new frontiers. Cognitive Science 38:1249–85. [MB]
Botvinick, M., Weinstein, A., Solway, A. & Barto, A. (2015) Reinforcement learning, efficient coding, and the statistics of natural tasks. Current Opinion in Behavioral Sciences 5:71–77. [MB]
Bouton, M. E. (2004) Context and behavioral processes in extinction. Learning & Memory 11:485–94. [aBML]
Boyd, R., Richerson, P. J. & Henrich, J. (2011) The cultural niche: Why social learning is essential for human adaptation. Proceedings of the National Academy of Sciences of the United States of America 108(suppl 2):10918–25. [MHT]
Braud, R., Mostafaoui, G., Karaouzene, A. & Gaussier, P. (2014) Simulating the emergence of early physical and social interactions: A developmental route through low level visuomotor learning. In: From Animals to Animats 13: Proceedings of the 13th International Conference on Simulation of Adaptive Behavior, Castellon, Spain, July 2014, ed. A. P. del Pobil, E. Chinellato, E. Martinez-Martin, J. Hallam, E. Cervera & A. Morales, pp. 154–65. Springer. [LM]
Breazeal, C. & Scassellati, B. (2002) Robots that imitate humans. Trends in Cognitive Sciences 6(11):481–87. [LM]
Brenner, L. (2016) Exploring the psychosocial impact of Ekso Bionics Technology. Archives of Physical Medicine and Rehabilitation 97(10):e113. [DEM]
Briegel, H. J. (2012) On creative machines and the physical origins of freedom. Scientific Reports 2:522. [KBC]
Briggs, F. & Usrey, W. M. (2007) A fast, reciprocal pathway between the lateral geniculate nucleus and visual cortex in the macaque monkey. The Journal of Neuroscience 27(20):5431–36. [DG]
Buchsbaum, D., Gopnik, A., Griffiths, T. L. & Shafto, P. (2011) Children's imitation of causal action sequences is influenced by statistical and pedagogical evidence. Cognition 120(3):331–40. Available at: http://doi.org/10.1016/j.cognition.2010.12.001. [MHT]
Buckingham, D. & Shultz, T. R. (2000) The developmental course of distance, time, and velocity concepts: A generative connectionist model. Journal of Cognition and Development 1(3):305–45. [aBML]
Buesing, L., Bill, J., Nessler, B. & Maass, W. (2011) Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology 7:e1002211. [aBML]
Burrell, J. (2016) How the machine 'thinks': Understanding opacity in machine learning algorithms. Big Data & Society 3(1):1–12. doi:10.1177/2053951715622512. [PMP]
Buscema, M. (1995) Self-reflexive networks: Theory – topology – applications. Quality and Quantity 29(4):339–403. [MBu]
Buscema, M. (1998) Metanet*: The theory of independent judges. Substance Use and Misuse 32(2):439–61. [MBu]
Buscema, M. (2013) Artificial adaptive system for parallel querying of multiple databases. In: Intelligent data mining in law enforcement analytics, ed. M. Buscema & W. J. Tastle, pp. 481–511. Springer. [MBu]
Buscema, M., Grossi, E., Montanini, L. & Street, M. E. (2015) Data mining of determinants of intrauterine growth retardation revisited using novel algorithms generating semantic maps and prototypical discriminating variable profiles. PLoS One 10(7):e0126020. [MBu]
Buscema, M., Tastle, W. J. & Terzi, S. (2013) Meta net: A new meta-classifier family. In: Data mining applications using artificial adaptive systems, ed. W. J. Tastle, pp. 141–82. Springer. [MBu]
Buscema, M., Terzi, S. & Tastle, W. J. (2010) A new meta-classifier. In: 2010 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS), Toronto, ON, Canada, pp. 1–7. IEEE. [MBu]
Bushdid, C., Magnasco, M. O., Vosshall, L. B. & Keller, A. (2014) Humans can discriminate more than 1 trillion olfactory stimuli. Science 343(6177):1370–72. [DEM]
Caglar, L. R. & Hanson, S. J. (2016) Deep learning and attentional bias in human category learning. Poster presented at the Neural Computation and Psychology Workshop on Contemporary Neural Networks, Philadelphia, PA, August 8–10, 2016. [LRC]
Caligiore, D., Borghi, A., Parisi, D. & Baldassarre, G. (2010) TRoPICALS: A computational embodied neuroscience model of compatibility effects. Psychological Review 117(4):1188–228. [GB]
Caligiore, D., Pezzulo, G., Baldassarre, G., Bostan, A. C., Strick, P. L., Doya, K., Helmich, R. C., Dirkx, M., Houk, J., Jörntell, H., Lago-Rodriguez, A., Galea, J. M., Miall, R. C., Popa, T., Kishore, A., Verschure, P. F. M. J., Zucca, R. & Herreros, I. (2016) Consensus paper: Towards a systems-level view of cerebellar function: The interplay between cerebellum, basal ganglia, and cortex. The Cerebellum 16(1):203–29. doi:10.1007/s12311-016-0763-3. [GB]
Calimera, A., Macii, E. & Poncino, M. (2013) The human brain project and neuromorphic computing. Functional Neurology 28(3):191–96. [KBC]
Cangelosi, A. & Schlesinger, M. (2015) Developmental robotics: From babies to robots. MIT Press. [P-YO, SW]
Cardon, A. (2006) Artificial consciousness, artificial emotions, and autonomous robots. Cognitive Processing 7(4):245–67. [KBC]
Carey, S. (1978) The child as word learner. In: Linguistic theory and psychological reality, ed. J. Bresnan, G. Miller & M. Halle, pp. 264–93. MIT Press. [aBML]
Carey, S. (2004) Bootstrapping and the origin of concepts. Daedalus 133(1):59–68. [aBML]
Carey, S. (2009) The origin of concepts. Oxford University Press. [arBML, KDF]
Carey, S. (2011) The origin of concepts: A précis. Behavioral and Brain Sciences 34(3):113–62. [EJL]
Carey, S. & Bartlett, E. (1978) Acquiring a single new word. Papers and Reports on Child Language Development 15:17–29. [aBML]
Chavajay, P. & Rogoff, B. (1999) Cultural variation in management of attention by children and their caregivers. Developmental Psychology 35(4):1079. [JMC]
Chen, X. & Yuille, A. L. (2014) Articulated pose estimation by a graphical model with image dependent pairwise relations. In: Advances in neural information processing systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger, pp. 1736–44. Neural Information Processing Systems Foundation. [rBML]
Chen, Z. & Klahr, D. (1999) All other things being equal: Acquisition and transfer of the control of variables strategy. Child Development 70(5):1098–120. [KDF]
Chernova, S. & Thomaz, A. L. (2014) Robot learning from human teachers. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool. [P-YO]
Chi, M. T., Slotta, J. D. & De Leeuw, N. (1994) From things to processes: A theory of conceptual change for learning science concepts. Learning and Instruction 4(1):27–43. [EJL]
Chiandetti, C., Spelke, E. S. & Vallortigara, G. (2014) Inexperienced newborn chicks use geometry to spontaneously reorient to an artificial social partner. Developmental Science 18(6):972–78. doi:10.1111/desc.12277. [ESS]
Chouard, T. (2016) The Go files: AI computer wraps up 4–1 victory against human champion. (Online; posted March 15, 2016.) [aBML]
Christiansen, M. H. & Chater, N. (2016) Creating language: Integrating evolution, acquisition, and processing. MIT Press. [NC, SW]
Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. (2012) Neural population dynamics during reaching. Nature 487:51–56. [GB]
Ciresan, D., Meier, U. & Schmidhuber, J. (2012) Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 16–21, 2012, pp. 3642–49. IEEE. [aBML]
Clark, K. B. (2012) A statistical mechanics definition of insight. In: Computational intelligence, ed. A. G. Floares, pp. 139–62. Nova Science. ISBN 978-1-62081-901-2. [KBC]
Clark, K. B. (2014) Basis for a neuronal version of Grover's quantum algorithm. Frontiers in Molecular Neuroscience 7:29. [KBC]
Clark, K. B. (2015) Insight and analysis problem solving in microbes to machines. Progress in Biophysics and Molecular Biology 119:183–93. [KBC]
Clark, K. B. (in press-a) Classical and quantum Hebbian learning in modeled cognitive processing. Frontiers in Psychology. [KBC]
Clark, K. B. (in press-b) Neural field continuum limits and the partitioning of cognitive-emotional brain networks. Molecular and Cellular Neuroscience. [KBC]
Clark, K. B. (in press-c) Psychometric "Turing test" of general intelligences in social robots. Information Sciences. [KBC]
Clark, K. B. & Hassert, D. L. (2013) Undecidability and opacity of metacognition in animals and humans. Frontiers in Psychology 4:171. [KBC]
Cleeremans, A. (1993) Mechanisms of implicit learning: Connectionist models of sequence processing. MIT Press. [LRC]
Clegg, J. M., Wen, N. J. & Legare, C. H. (2017) Is non-conformity WEIRD? Cultural variation in adults' beliefs about children's competency and conformity. Journal of Experimental Psychology: General 146(3):428–41. [JMC]
Cohen, E. H. & Tong, F. (2015) Neural mechanisms of object-based attention. Cerebral Cortex 25(4):1080–92. http://doi.org/10.1093/cercor/bht303. [DGe]
Colagiuri, B., Schenk, L. A., Kessler, M. D., Dorsey, S. G. & Colloca, L. (2015) The placebo effect: From concepts to genes. Neuroscience 307:171–90. [EJL]
Colby, K. M. (1975) Artificial paranoia: Computer simulation of paranoid processes. Pergamon. [RJS]
Collins, A. G. E. & Frank, M. J. (2013) Cognitive control over learning: Creating, clustering, and generalizing task-set structure. Psychological Review 120(1):190–229. [aBML]
Collins, S., Ruina, A., Tedrake, R. & Wisse, M. (2005) Efficient bipedal robots based on passive-dynamic walkers. Science 307(5712):1082–85. [P-YO]
Cook, C., Goodman, N. D. & Schulz, L. E. (2011) Where science starts: Spontaneous experiments in preschoolers' exploratory play. Cognition 120(3):341–49. [arBML]
Cooper, R. P. (2016) Executive functions and the generation of "random" sequential responses: A computational account. Journal of Mathematical Psychology 73:153–68. doi:10.1016/j.jmp.2016.06.002. [RPK]
Correa-Chávez, M. & Rogoff, B. (2009) Children's attention to interactions directed to others: Guatemalan Mayan and European American patterns. Developmental Psychology 45(3):630. [JMC]
Corriveau, K. H. & Harris, P. L. (2010) Preschoolers (sometimes) defer to the majority when making simple perceptual judgments. Developmental Psychology 26:437–45. [JMC]
Corriveau, K. H., Kim, E., Song, G. & Harris, P. L. (2013) Young children's deference to a consensus varies by culture and judgment setting. Journal of Cognition and Culture 13(3–4):367–81. [JMC, rBML]
Coutinho, E., Deng, J. & Schuller, B. (2014) Transfer learning emotion manifestation across music and speech. In: Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, pp. 3592–98. IEEE. [SW]
Crick, F. (1989) The recent excitement about neural networks. Nature 337:129–32. [aBML]
Csibra, G. (2008) Goal attribution to inanimate agents by 6.5-month-old infants. Cognition 107:705–17. [aBML]
Csibra, G., Biro, S., Koos, O. & Gergely, G. (2003) One-year-old infants use teleological representations of actions productively. Cognitive Science 27:111–33. [aBML]
Csibra, G. & Gergely, G. (2009) Natural pedagogy. Trends in Cognitive Sciences 13(4):148–53. [MHT]
Dalrymple, D. (2016) Differentiable programming. Available at: https://www.edge.org/response-detail/26794. [aBML]
Davies, J. (2016) Program good ethics into artificial intelligence. Nature 538(7625). Available at: http://www.nature.com/news/program-good-ethics-into-artificial-intelligence-1.20821. [KBC]
Davis, E. & Marcus, G. (2014) The scope and limits of simulation in cognition. arXiv preprint 1506.04956. Available at: http://arxiv.org/abs/1506.04956. [ED]
Davis, E. & Marcus, G. (2015) Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM 58(9):92–103. [aBML]
Davis, E. & Marcus, G. (2016) The scope and limits of simulation in automated reasoning. Artificial Intelligence 233:60–72. [ED]
Davoodi, T., Corriveau, K. H. & Harris, P. L. (2016) Distinguishing between realistic and fantastical figures in Iran. Developmental Psychology 52(2):221. [JMC, rBML]
Daw, N. D., Niv, Y. & Dayan, P. (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience 8(12):1704–11. doi:10.1038/nn1560. [aBML, RPK]
Day, S. B. & Gentner, D. (2007) Nonintentional analogical inference in text comprehension. Memory & Cognition 35:39–49. [KDF]
Dayan, P., Hinton, G. E., Neal, R. M. & Zemel, R. S. (1995) The Helmholtz machine. Neural Computation 7(5):889–904. [aBML]
Deacon, T. (2012) Incomplete nature: How mind emerged from matter. W.W. Norton. [DCD]
Deacon, T. W. (1998) The symbolic species: The co-evolution of language and the brain. W.W. Norton. [aBML]
Dehghani, M., Tomai, E., Forbus, K. & Klenk, M. (2008) An integrated reasoning approach to moral decision-making. In: Proceedings of the 23rd AAAI National Conference on Artificial Intelligence, vol. 3, pp. 1280–86. AAAI Press. [KDF]
DeJong, G. & Mooney, R. (1986) Explanation-based learning: An alternative view. Machine Learning 1(2):145–76. [LRC]
Denil, M., Agrawal, P., Kulkarni, T. D., Erez, T., Battaglia, P. & de Freitas, N. (2016) Learning to perform physics experiments via deep reinforcement learning. arXiv preprint 1611.01843. Available at: https://arxiv.org/abs/1611.01843. [MB]
Dennett, D. C. (1987) The intentional stance. MIT Press. [JMC]
Dennett, D. C. (2013) Aching voids and making voids [Review of the book Incomplete nature: How mind emerged from matter by T. Deacon]. The Quarterly Review of Biology 88(4):321–24. [DCD]
Dennett, D. C. (2017) From bacteria to Bach and back: The evolution of minds. W.W. Norton. [DCD]
Denton, E., Chintala, S., Szlam, A. & Fergus, R. (2015) Deep generative image models using a Laplacian pyramid of adversarial networks. Presented at the 2015 Neural Information Processing Systems conference, Montreal, QC, Canada. In: Advances in neural information processing systems 28 (NIPS 2015), ed. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama & R. Garnett [poster]. Neural Information Processing Systems Foundation. [aBML]
Di, G. Q. & Wu, S. X. (2015) Emotion recognition from sound stimuli based on back-projection neural networks and electroencephalograms. Journal of the Acoustical Society of America 138(2):994–1002. [KBC]
DiCarlo, J. J., Zoccolan, D. & Rust, N. C. (2012) How does the brain solve visual object recognition? Neuron 73(3):415–34. [NK]
Dick, P. K. (1968) Do androids dream of electric sheep? Del Rey-Ballantine. [DEM]
Dietvorst, B. J., Simmons, J. P. & Massey, C. (2015) Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General 144(1):114–26. [DCD]
Dietvorst, B. J., Simmons, J. P. & Massey, C. (2016) Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2616787. [DCD]
Diuk, C., Cohen, A. & Littman, M. L. (2008) An object-oriented representation for efficient reinforcement learning. In: Proceedings of the 25th International Conference on Machine Learning (ICML'08), Helsinki, Finland, pp. 240–47. ACM. [aBML]
DiYanni, C. J., Corriveau, K. H., Kurkul, K., Nasrini, J. & Nini, D. (2015) The role of consensus and culture in children's imitation of questionable actions. Journal of Experimental Child Psychology 137:99–110. [JMC]
Doeller, C. F., Barry, C. & Burgess, N. (2010) Evidence for grid cells in a human memory network. Nature 463(7281):657–61. doi:10.1038/nature08704. [ESS]
Doeller, C. F. & Burgess, N. (2008) Distinct error-correcting and incidental learning of location relative to landmarks and boundaries. Proceedings of the National Academy of Sciences of the United States of America 105(15):5909–14. [ESS]
Doeller, C. F., King, J. A. & Burgess, N. (2008) Parallel striatal and hippocampal systems for landmarks and boundaries in spatial memory. Proceedings of the National Academy of Sciences of the United States of America 105(15):5915–20. doi:10.1073/pnas.0801489105. [ESS]
Dolan, R. J. & Dayan, P. (2013) Goals and habits in the brain. Neuron 80:312–25. [aBML]
Don, H. J., Goldwater, M. B., Otto, A. R. & Livesey, E. J. (2016) Rule abstraction, model-based choice, and cognitive reflection. Psychonomic Bulletin & Review 23(5):1615–23. [EJL]
Donahue, J., Hendricks, L. A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K. & Darrell, T. (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, June 7–12, 2015, pp. 2625–34. IEEE. [SW]
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E. & Darrell, T. (2014) DeCAF: A deep convolutional activation feature for generic visual recognition. Presented at the International Conference on Machine Learning, Beijing, China, June 22–24, 2014. Proceedings of Machine Learning Research 32(1):647–55. [aBML]
Dörner, D. (2001) Bauplan für eine Seele [Blueprint for a soul]. Rowohlt. [CDG]
Dörner, D. & Güss, C. D. (2013) PSI: A computational architecture of cognition, motivation, and emotion. Review of General Psychology 17:297–317. doi:10.1037/a0032947. [CDG]
Doshi-Velez, F. & Kim, B. (2017) A roadmap for a rigorous science of interpretability. arXiv preprint 1702.08608. Available at: https://arxiv.org/abs/1702.08608. [rBML]
Doya, K. (1999) What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Networks 12(7–8):961–74. [GB]
Dreyfus, H. & Dreyfus, S. (1986) Mind over machine. Macmillan. [BJM]
Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I. & Abbeel, P. (2016) RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint 1611.02779. Available at: https://arxiv.org/abs/1611.02779. [MB]
Dunbar, K. (1995) How scientists really reason: Scientific reasoning in real-world laboratories. In: The nature of insight, ed. R. J. Sternberg & J. E. Davidson, pp. 365–95. MIT Press. [KDF]
Economides, M., Kurth-Nelson, Z., Lübbert, A., Guitart-Masip, M. & Dolan, R. J. (2015) Model-based reasoning in humans becomes automatic with training. PLoS Computational Biology 11:e1004463. [aBML]
Edelman, S. (2015) The minority report: Some common assumptions to reconsider in the modelling of the brain and behaviour. Journal of Experimental & Theoretical Artificial Intelligence 28(4):751–76. [aBML]
Eden, M. (1962) Handwriting and pattern recognition. IRE Transactions on Information Theory 8:160–66. [aBML]
Eickenberg, M., Gramfort, A., Varoquaux, G. & Thirion, B. (2016) Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage 152:184–94. [NK]
Eliasmith, C., Stewart, T. C., Choo, X., Bekolay, T., DeWolf, T., Tang, Y. & Rasmussen, D. (2012) A large-scale model of the functioning brain. Science 338(6111):1202–05. [aBML]
Eliasmith, C. & Trujillo, O. (2014) The use and abuse of large-scale brain models. Current Opinion in Neurobiology 25:1–6. [NK]
Elman, J. L. (1993) Learning and development in neural networks: The importance of starting small. Cognition 48(1):71–99. [SW]
Elman, J. L. (2005) Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences 9(3):111–17. [aBML]
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D. & Plunkett, K. (1996) Rethinking innateness. MIT Press. [aBML]
Eslami, S. M., Heess, N., Weber, T., Tassa, Y., Kavukcuoglu, K. & Hinton, G. E. (2016) Attend, infer, repeat: Fast scene understanding with generative models. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon & R. Garnett, pp. 3225–33. Neural Information Processing Systems Foundation. [arBML, MB]
Eslami, S. M. A., Tarlow, D., Kohli, P. & Winn, J. (2014) Just-in-time learning for fast and flexible inference. Presented at the 2014 Neural Information Processing Systems conference, Montreal, QC, Canada, December 8–13, 2014. In: Advances in neural information processing systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger, pp. 1736–44. Neural Information Processing Systems Foundation. [aBML]
Fasolo, A. (2011) The theory of evolution and its impact. Springer. [DG]
Feigenbaum, E. & Feldman, J., eds. (1995) Computers and thought. AAAI Press. [RJS]
Flash, T. & Hochner, B. (2005) Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology 15(6):660–66. [P-YO]
Fodor, J. A. (1975) The language of thought. Harvard University Press. [aBML]
Fodor, J. A. (1981) Representations: Philosophical essays on the foundations of cognitive science. MIT Press. [LRC]
Fodor, J. A. & Pylyshyn, Z. W. (1988) Connectionism and cognitive architecture: A critical analysis. Cognition 28(1–2):3–71. [aBML, RPK, SSH]
Fogel, D. B. & Fogel, L. J. (1995) Evolution and computational intelligence. IEEE Transactions on Neural Networks 4:1938–41. [KBC]
Forbus, K. (2011) Qualitative modeling. Wiley Interdisciplinary Reviews: Cognitive Science 2(4):374–91. [KDF]
Forbus, K., Ferguson, R., Lovett, A. & Gentner, D. (2017) Extending SME to handle large-scale cognitive modeling. Cognitive Science 41(5):1152–201. doi:10.1111/cogs.12377. [KDF]
Forbus, K. & Gentner, D. (1997) Qualitative mental models: Simulations or memories? Presented at the Eleventh International Workshop on Qualitative Reasoning, Cortona, Italy, June 3–6, 1997. [KDF]
Forestier, S. & Oudeyer, P.-Y. (2016) Curiosity-driven development of tool use precursors: A computational model. In: Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, PA, ed. A. Papafragou, D. Grodner, D. Mirman & J. C. Trueswell, pp. 1859–64. Cognitive Science Society. [P-YO]
Fornito, A., Zalesky, A. & Bullmore, E. (2016) Fundamentals of brain network analysis. Academic Press. [DG]
Fox, J., Cooper, R. P. & Glasspool, D. W. (2013) A canonical theory of dynamic decision-making. Frontiers in Psychology 4(150):1–19. doi:10.3389/fpsyg.2013.00150. [RPK]
Frank, M. C. & Goodman, N. D. (2014) Inferring word meanings by assuming that speakers are informative. Cognitive Psychology 75:80–96. [MHT]
Frank, M. C., Goodman, N. D. & Tenenbaum, J. B. (2009) Using speakers' referential intentions to model early cross-situational word learning. Psychological Science 20:578–85. [aBML]
Franklin, S. (2007) A foundational architecture for artificial general intelligence. In: Advances in artificial general intelligence: Concepts, architectures and algorithms: Proceedings of the AGI Workshop 2006, ed. P. Wang & B. Goertzel, pp. 36–54. IOS Press. [GB]
Freyd, J. (1983) Representing the dynamics of a static form. Memory and Cognition 11(4):342–46. [aBML]
Freyd, J. (1987) Dynamic mental representations. Psychological Review 94(4):427–38. [aBML]
Friedman, S. E. & Forbus, K. D. (2010) An integrated systems approach to explanation-based conceptual change. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence, Atlanta, GA, July 11–15, 2010. AAAI Press. [KDF]
Fukushima, K. (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36:193–202. [aBML]
Fung, P. (2015) Robots with heart. Scientific American 313(5):60–63. [KBC]
Funke, J. (2010) Complex problem solving: A case for complex cognition? Cognitive Processing 11:133–42. [CDG]
Gallese, V. & Lakoff, G. (2005) The brain's concepts: The role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology 22(3–4):455–79. [SW]
Gallistel, C. & Matzel, L. D. (2013) The neuroscience of learning: Beyond the Hebbian synapse. Annual Review of Psychology 64:169–200. [aBML]
Gaussier, P., Moga, S., Quoy, M. & Banquet, J. P. (1998) From perception-action loops to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence 12(7–8):701–27. [LM]
Gazzaniga, M. (2004) Cognitive neuroscience. MIT Press. [MBu]
Gelly, S. & Silver, D. (2008) Achieving master level play in 9 × 9 computer Go. In: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, Chicago, Illinois, July 13–17, 2008, pp. 1537–40. AAAI Press. [aBML]
Gelly, S. & Silver, D. (2011) Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence 175(11):1856–75. [aBML]
Gelman, A., Carlin, J. B., Stern, H. S. & Rubin, D. B. (2004) Bayesian data analysis. Chapman & Hall/CRC. [aBML]
Gelman, A., Lee, D. & Guo, J. (2015) Stan: A probabilistic programming language for Bayesian inference and optimization. Journal of Educational and Behavioral Statistics 40:530–43. [aBML]
Gelman, S. A. (2009) Learning from others: Children's construction of concepts. Annual Review of Psychology 60:115–40. [MHT]
Geman, S., Bienenstock, E. & Doursat, R. (1992) Neural networks and the bias/variance dilemma. Neural Computation 4:1–58. [aBML]
Gentner, D. (1983) Structure-mapping: A theoretical framework for analogy. Cognitive Science 7:155–70. (Reprinted in A. Collins & E. E. Smith, eds. Readings in cognitive science: A perspective from psychology and artificial intelligence. Kaufmann.) [KDF]
Gentner, D. (2010) Bootstrapping the mind: Analogical processes and symbol systems. Cognitive Science 34(5):752–75. [KDF]
Gentner, D., Loewenstein, J., Thompson, L. & Forbus, K. D. (2009) Reviving inert knowledge: Analogical abstraction supports relational retrieval of past events. Cognitive Science 33(8):1343–82. [EJL]
George, D. & Hawkins, J. (2009) Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology 5(10):e1000532. Available at: http://doi.org/10.1371/journal.pcbi.1000532. [DGe]
Gershman, S. J. & Goodman, N. D. (2014) Amortized inference in probabilistic reasoning. In: Proceedings of the 36th Annual Conference of the Cognitive Science Society, Quebec City, QC, Canada, July 23–26, 2014, pp. 517–22. Cognitive Science Society. [aBML]
Gershman, S. J., Horvitz, E. J. & Tenenbaum, J. B. (2015) Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 349:273–78. [aBML]
Gershman, S. J., Markman, A. B. & Otto, A. R. (2014) Retrospective revaluation in sequential decision making: A tale of two systems. Journal of Experimental Psychology: General 143:182–94. [aBML]
Gershman, S. J., Vul, E. & Tenenbaum, J. B. (2012) Multistability and perceptual inference. Neural Computation 24:1–24. [aBML]
Gerstenberg, T., Goodman, N. D., Lagnado, D. A. & Tenenbaum, J. B. (2015) How, whether, why: Causal judgments as counterfactual contrasts. In: Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena, CA, July 22–25, 2015, ed. D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings & P. P. Maglio, pp. 782–87. Cognitive Science Society. [aBML, ED]
Ghahramani, Z. (2015) Probabilistic machine learning and artificial intelligence. Nature 521:452–59. [aBML]
Giambene, G. (2005) Queuing theory and telecommunications networks and applications. Springer Science + Business Media. [DG]
Gibson, J. J. (1979) The ecological approach to visual perception. Houghton Mifflin. [DCD]
Gick, M. L. & Holyoak, K. J. (1980) Analogical problem solving. Cognitive Psychology 12(3):306–55. [EJL]
Gigerenzer, G. (2001) The adaptive toolbox. In: Bounded rationality: The adaptive toolbox, ed. G. Gigerenzer & R. Selten, pp. 37–50. MIT Press. [PMP]
Gigerenzer, G. & Gaissmaier, W. (2011) Heuristic decision making. Annual Review of Psychology 62:451–82. doi:10.1146/annurev-psych-120709-145346. [PMP]
Goldberg, A. E. (1995) Constructions: A construction grammar approach to argument structure. University of Chicago Press. [NC]
Gombrich, E. (1960) Art and illusion. Pantheon Books. [NC]
Goodfellow, I., Shlens, J. & Szegedy, C. (2015) Explaining and harnessing adversarial examples. Presented at the International Conference on Learning Representations (ICLR), San Diego, CA, May 7–9, 2015. arXiv preprint 1412.6572. Available at: https://arxiv.org/abs/1412.6572. [KDF]
Goodman, N. D. & Frank, M. C. (2016) Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences 20(11):818–29. [MHT]
Goodman, N. D., Mansinghka, V. K., Roy, D. M., Bonawitz, K. & Tenenbaum, J. B. (2008) Church: A language for generative models. In: Proceedings of the Twenty-Fourth Annual Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, July 9–12, 2008, pp. 220–29. AUAI Press. [aBML]
Goodman, N. D., Tenenbaum, J. B., Feldman, J. & Griffiths, T. L. (2008) A rational analysis of rule-based concept learning. Cognitive Science 32(1):108–54. [rBML]
Goodman, N. D., Tenenbaum, J. B. & Gerstenberg, T. (2015) Concepts in a probabilistic language of thought. In: The conceptual mind: New directions in the study of concepts, ed. E. Margolis & S. Laurence, pp. 623–54. MIT Press. [rBML]
Goodman, N. D., Ullman, T. D. & Tenenbaum, J. B. (2011) Learning a theory of causality. Psychological Review 118(1):110–19. [rBML]
Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T. & Danks, D. (2004) A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review 111(1):3–32. [arBML]
Gopnik, A. & Meltzoff, A. N. (1999) Words, thoughts, and theories. MIT Press. [aBML]
Gottlieb, J., Oudeyer, P.-Y., Lopes, M. & Baranes, A. (2013) Information seeking, curiosity and attention: Computational and neural mechanisms. Trends in Cognitive Sciences 17(11):585–96. [P-YO]
Graham, D. J. (2014) Routing in the brain. Frontiers in Computational Neuroscience 8:44. [DG]
Graham, D. J. & Rockmore, D. N. (2011) The packet switching brain. Journal of Cognitive Neuroscience 23(2):267–76. [DG]
Granger, R. (2006) Engines of the brain: The computational instruction set of human cognition. AI Magazine 27(2):15. [DG]
Graves, A. (2014) Generating sequences with recurrent neural networks. arXiv preprint 1308.0850. Available at: http://arxiv.org/abs/1308.0850. [aBML]
Graves, A., Mohamed, A.-R. & Hinton, G. (2013) Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, May 26–31, 2013, pp. 6645–49. IEEE. [aBML]
Graves, A., Wayne, G. & Danihelka, I. (2014) Neural Turing machines. arXiv preprint 1410.5401v1. Available at: http://arxiv.org/abs/1410.5401v1. [arBML]
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A. P., Hermann, K. M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K. & Hassabis, D. (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471–76. [arBML, MB]
Gray, H. M., Gray, K. & Wegner, D. M. (2007) Dimensions of mind perception. Science 315(5812):619. [rBML, SW]
Gray, K. & Wegner, D. M. (2012) Feeling robots and human zombies: Mind perception and the uncanny valley. Cognition 125(1):125–30. [rBML]
Graybiel, A. M. (2005) The basal ganglia: Learning new tricks and loving it. Current Opinion in Neurobiology 15(6):638–44. [GB]
Grefenstette, E., Hermann, K. M., Suleyman, M. & Blunsom, P. (2015) Learning to transduce with unbounded memory. Presented at the 2015 Neural Information Processing Systems conference. In: Advances in Neural Information Processing Systems 28, ed. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama & R. Garnett. Neural Information Processing Systems Foundation. [arBML]
Gregor, K., Besse, F., Rezende, D. J., Danihelka, I. & Wierstra, D. (2016) Towards conceptual compression. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon & R. Garnett [poster]. Neural Information Processing Systems Foundation. [aBML]
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J. & Wierstra, D. (2015) DRAW: A recurrent neural network for image generation. Presented at the 32nd Annual International Conference on Machine Learning (ICML'15), Lille, France, July 7–9, 2015. Proceedings of Machine Learning Research 37:1462–71. [aBML]
Griffiths, T. L., Chater, N., Kemp, C., Perfors, A. & Tenenbaum, J. B. (2010) Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences 14(8):357–64. [arBML]
Griffiths, T. L. & Tenenbaum, J. B. (2005) Structure and strength in causal induction. Cognitive Psychology 51(4):334–84. [rBML]
Griffiths, T. L. & Tenenbaum, J. B. (2009) Theory-based causal induction. Psychological Review 116(4):661–716. [rBML]
Griffiths, T. L., Vul, E. & Sanborn, A. N. (2012) Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science 21:263–68. [aBML]
Grossberg, S. (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biological Cybernetics 23:121–34. [aBML]
Grosse, R., Salakhutdinov, R., Freeman, W. T. & Tenenbaum, J. B. (2012) Exploiting compositionality to explore a large space of model structures. In: Proceedings of the Twenty-Eighth Annual Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, ed. N. de Freitas & K. Murphy, pp. 306–15. AUAI Press. [aBML]
Güçlü, U. & van Gerven, M. A. J. (2015) Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience 35(27):10005–14. [NK]
Guergiuev, J., Lillicrap, T. P. & Richards, B. A. (2016) Toward deep learning with segregated dendrites. arXiv preprint 1610.00161. Available at: http://arxiv.org/pdf/1610.00161.pdf. [AHM]
Gülçehre, Ç. & Bengio, Y. (2016) Knowledge matters: Importance of prior information for optimization. Journal of Machine Learning Research 17(8):1–32. [SSH]
Guo, X., Singh, S., Lee, H., Lewis, R. L. & Wang, X. (2014) Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In: Advances in neural information processing systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger [poster]. Neural Information Processing Systems Foundation. [aBML]
Güss, C. D., Tuason, M. T. & Gerhard, C. (2010) Cross-national comparisons of complex problem-solving strategies in two microworlds. Cognitive Science 34:489–520. [CDG]
Gweon, H., Tenenbaum, J. B. & Schulz, L. E. (2010) Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences of the United States of America 107:9066–71. [aBML]
Hafenbrädl, S., Waeger, D., Marewski, J. N. & Gigerenzer, G. (2016) Applied decision making with fast-and-frugal heuristics. Journal of Applied Research in Memory and Cognition 5(2):215–31. doi:10.1016/j.jarmac.2016.04.011. [PMP]
Hall, E. T. (1966) The hidden dimension. Doubleday. [SW]
Halle, M. & Stevens, K. (1962) Speech recognition: A model and a program for research. IRE Transactions on Information Theory 8(2):155–59. [aBML]
Hamlin, K. J. (2013) Moral judgment and action in preverbal infants and toddlers: Evidence for an innate moral core. Current Directions in Psychological Science 22:186–93. [aBML]
Hamlin, K. J., Ullman, T., Tenenbaum, J., Goodman, N. D. & Baker, C. (2013) The mentalistic basis of core social cognition: Experiments in preverbal infants and a computational model. Developmental Science 16:209–26. [aBML]
Hamlin, K. J., Wynn, K. & Bloom, P. (2007) Social evaluation by preverbal infants. Nature 450:57–60. [aBML]
Hamlin, K. J., Wynn, K. & Bloom, P. (2010) Three-month-olds show a negativity bias in their social evaluations. Developmental Science 13:923–29. [aBML]
Hamper, B. (2008) Rivethead: Tales from the assembly line. Grand Central. [DEM]
Hamrick, J. B., Ballard, A. J., Pascanu, R., Vinyals, O., Heess, N. & Battaglia, P. W. (2017) Metacontrol for adaptive imagination-based optimization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR). [MB]
Han, M. J., Lin, C. H. & Song, K. T. (2013) Robotic emotional expression generation based on mood transition and personality model. IEEE Transactions on Cybernetics 43(4):1290–303. [KBC]
Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A. & Ng, A. Y. (2014) Deep speech: Scaling up end-to-end speech recognition. arXiv preprint 1412.5567. Available at: https://arxiv.org/abs/1412.5567. [aBML]
Hanson, S. J. (1995) Some comments and variations on back-propagation. In: The handbook of back-propagation, ed. Y. Chauvin & D. Rumelhart, pp. 292–323. Erlbaum. [LRC]
Hanson, S. J. (2002) On the emergence of rules in neural networks. Neural Computation 14(9):2245–68. [LRC]
Hanson, S. J. & Burr, D. J. (1990) What connectionist models learn: Toward a theory of representation in connectionist networks. Behavioral and Brain Sciences 13:471–518. [LRC]
Hanson, S. J., Caglar, L. R. & Hanson, C. (under review) The deep history of deep learning. [LRC]
Harkness, S., Blom, M., Oliva, A., Moscardino, U., Zylicz, P. O., Bermudez, M. R. & Super, C. M. (2007) Teachers' ethnotheories of the 'ideal student' in five western cultures. Comparative Education 43(1):113–35. [JMC]
Harlow, H. F. (1949) The formation of learning sets. Psychological Review 56(1):51–65. [aBML]
Harlow, H. F. (1950) Learning and satiation of response in intrinsically motivated complex puzzle performance by monkeys. Journal of Comparative and Physiological Psychology 43:289–94. [aBML]
Harris, P. L. (2012) Trusting what you're told: How children learn from others. Belknap Press of Harvard University Press. [JMC]
Haslam, N. (2006) Dehumanization: An integrative review. Personality and Social Psychology Review 10(3):252–64. [rBML]
Hasnain, S. K., Mostafaoui, G. & Gaussier, P. (2012) A synchrony-based perspective for partner selection and attentional mechanism in human-robot interaction. Paladyn, Journal of Behavioral Robotics 3(3):156–71. [LM]
Hasnain, S. K., Mostafaoui, G., Salesse, R., Marin, L. & Gaussier, P. (2013) Intuitive human robot interaction based on unintentional synchrony: A psycho-experimental study. In: Proceedings of the IEEE 3rd Joint Conference on Development and Learning and on Epigenetic Robotics, Osaka, Japan, August 2013, pp. 1–7. Hal Archives-ouvertes. [LM]
Hauser, M. D., Chomsky, N. & Fitch, W. T. (2002) The faculty of language: What is it, who has it, and how did it evolve? Science 298:1569–79. [aBML]
Hayes, P. J. (1974) Some problems and non-problems in representation theory. In: Proceedings of the 1st summer conference on artificial intelligence and simulation of behaviour, pp. 63–79. IOS Press. [LRC]
Hayes-Roth, B. & Hayes-Roth, F. (1979) A cognitive model of planning. Cognitive Science 3:275–310. [aBML]
He, K., Zhang, X., Ren, S. & Sun, J. (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, June 27–30, 2016, pp. 770–78. IEEE. [aBML]
Hebb, D. O. (1949) The organization of behavior. Wiley. [aBML]
Heess, N., Tarlow, D. & Winn, J. (2013) Learning to pass expectation propagation messages. Presented at the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, December 3–6, 2012. In: Advances in Neural Information Processing Systems 25 (NIPS 2012), ed. F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger, pp. 3219–27. Neural Information Processing Systems Foundation. [aBML]
Heinrich, S. (2016) Natural language acquisition in recurrent neural architectures. Ph.D. thesis, Universität Hamburg, DE. [SW]
Henrich, J. (2015) The secret of our success: How culture is driving human evolution, domesticating our species, and making us smarter. Princeton University Press. [JMC]
Henrich, J., Heine, S. J. & Norenzayan, A. (2010) The weirdest people in the world? Behavioral and Brain Sciences 33(2–3):61–83. [JMC]
Herrmann, E., Call, J., Hernandez-Lloreda, M. V., Hare, B. & Tomasello, M. (2007) Humans have evolved specialized skills of social cognition: The cultural intelligence hypothesis. Science 317(5843):1360–66. [DCD]
Herrmann, E., Hernandez-Lloreda, M. V., Call, J., Hare, B. & Tomasello, M. (2010) The structure of individual differences in the cognitive abilities of children and chimpanzees. Psychological Science 21(1):102–10. [DCD]
Hertwig, R. & Herzog, S. M. (2009) Fast and frugal heuristics: Tools of social rationality. Social Cognition 27(5):661–98. doi:10.1521/soco.2009.27.5.661. [PMP]
Hespos, S. J. & Baillargeon, R. (2008) Young infants' actions reveal their developing knowledge of support variables: Converging evidence for violation-of-expectation findings. Cognition 107:304–16. [aBML]
Hespos, S. J., Ferry, A. L. & Rips, L. J. (2009) Five-month-old infants have different expectations for solids and liquids. Psychological Science 20(5):603–11. [aBML]
Hinrichs, T. & Forbus, K. (2011) Transfer learning through analogy in games. AI Magazine 32(1):72–83. [KDF]
Hinton, G. E. (2002) Training products of experts by minimizing contrastive divergence. Neural Computation 14(8):1771–800. [aBML]
Hinton, G. E., Dayan, P., Frey, B. J. & Neal, R. M. (1995) The "wake-sleep" algorithm for unsupervised neural networks. Science 268(5214):1158–61. [aBML]
Hinton, G. E., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. & Kingsbury, B. (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29:82–97. [aBML]
Hinton, G. E., Osindero, S. & Teh, Y. W. (2006) A fast learning algorithm for deep belief nets. Neural Computation 18:1527–54. [aBML]
Hiolle, A., Lewis, M. & Cañamero, L. (2014) Arousal regulation and affective adaptation to human responsiveness by a robot that explores and learns a novel environment. Frontiers in Neurorobotics 8:17. [KBC]
Ho, Y.-C. & Pepyne, D. L. (2002) Simple explanation of the no-free-lunch theorem and its implications. Journal of Optimization Theory and Applications 115:549–70. [AHM]
Hochreiter, S., Younger, A. S. & Conwell, P. R. (2001) Learning to learn using gradient descent. In: International Conference on Artificial Neural Networks—ICANN 2001, ed. G. Dorffner, H. Bischof & K. Hornik, pp. 87–94. Springer. [MB]
Hoffman, D. D. (2000) Visual intelligence: How we create what we see. W. W. Norton. [NC]
Hoffman, D. D. & Richards, W. A. (1984) Parts of recognition. Cognition 18:65–96. [aBML]
Hoffman, M., Yoeli, E. & Nowak, M. A. (2015) Cooperate without looking: Why we care what people think and not just what they do. Proceedings of the National Academy of Sciences of the United States of America 112(6):1727–32. doi:10.1073/pnas.1417904112. [PMP]
Hofstadter, D. R. (1985) Metamagical themas: Questing for the essence of mind and pattern. Basic Books. [aBML]
Hofstadter, D. R. (2001) Epilogue: Analogy as the core of cognition. In: The analogical mind: Perspectives from cognitive science, ed. D. Gentner, K. J. Holyoak & B. N. Kokinov, pp. 499–538. MIT Press. [NC]
Horgan, T. & Tienson, J. (1996) Connectionism and the philosophy of psychology. MIT Press. [LRC]
Horst, J. S. & Samuelson, L. K. (2008) Fast mapping but poor retention by 24-month-old infants. Infancy 13(2):128–57. [aBML]
Houk, J. C., Adams, J. L. & Barto, A. G. (1995) A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Models of information processing in the basal ganglia, ed. J. C. Houk, J. L. Davis & D. G. Beiser, pp. 249–70. MIT Press. [GB]
Huang, Y. & Rao, R. P. (2014) Neurons as Monte Carlo samplers: Bayesian inference and learning in spiking networks. Presented at the 2014 Neural Information Processing Systems conference, Montreal, QC, Canada. In: Advances in neural information processing systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger, pp. 1943–51. Neural Information Processing Systems Foundation. [aBML]
Hubel, D. H. & Wiesel, T. N. (1959) Receptive fields of single neurons in the cat's striate cortex. Journal of Physiology 124:574–91. [ESS]
Hummel, J. E. & Biederman, I. (1992) Dynamic binding in a neural network for shape recognition. Psychological Review 99(3):480–517. [aBML]
Hurley, M., Dennett, D. C. & Adams, R. (2011) Inside jokes: Using humor to reverse-engineer the mind. MIT Press. [DCD]
Hutson, M. (2017) In bots we distrust. Boston Globe, p. K4. [DCD]
Indiveri, G. & Liu, S.-C. (2015) Memory and information processing in neuromorphic systems. Proceedings of the IEEE 103(8):1379–97. [KBC]
Indurkhya, B. & Misztal-Radecka, J. (2016) Incorporating human dimension in autonomous decision-making on moral and ethical issues. In: Proceedings of the AAAI Spring Symposium: Ethical and Moral Considerations in Non-human Agents, Palo Alto, CA, ed. B. Indurkhya & G. Stojanov. AAAI Press. [PMP]
Irvine, A. D. & Deutsch, H. (2016) Russell's paradox. In: The Stanford encyclopedia of philosophy (Winter 2016 Edition), ed. E. N. Zalta. Available at: https://plato.stanford.edu/archives/win2016/entries/russell-paradox. [NC]
Jackendoff, R. (2003) Foundations of language. Oxford University Press. [aBML]
Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D. & Kavukcuoglu, K. (2016) Reinforcement learning with unsupervised auxiliary tasks. Presented at the 5th International Conference on Learning Representations, Palais des Congrès Neptune, Toulon, France, April 24–26, 2017. arXiv preprint 1611.05397. Available at: https://arxiv.org/abs/1611.05397. [P-YO]
Jain, A., Tompson, J., Andriluka, M., Taylor, G. W. & Bregler, C. (2014) Learning human pose estimation features with convolutional networks. Presented at the International Conference on Learning Representations (ICLR), Banff, Canada, April 14–16, 2014. arXiv preprint 1312.7302. Available at: https://arxiv.org/abs/1312.7302. [rBML]
Jara-Ettinger, J., Gweon, H., Schulz, L. E. & Tenenbaum, J. B. (2016) The naïve utility calculus: Computational principles underlying commonsense psychology. Trends in Cognitive Sciences 20(8):589–604. doi:10.1016/j.tics.2016.05.011. [PMP]
Jara-Ettinger, J., Gweon, H., Tenenbaum, J. B. & Schulz, L. E. (2015) Children's understanding of the costs and rewards underlying rational action. Cognition 140:14–23. [aBML]
Jern, A. & Kemp, C. (2013) A probabilistic account of exemplar and category generation. Cognitive Psychology 66(1):85–125. [aBML]
Jern, A. & Kemp, C. (2015) A decision network account of reasoning about other people's choices. Cognition 142:12–38. [aBML]
Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z. & Hughes, M. (2016) Google's multilingual neural machine translation system: Enabling zero-shot translation. arXiv preprint 1611.04558. Available at: https://arxiv.org/abs/1611.04558. [SSH]
Johnson, S. C., Slaughter, V. & Carey, S. (1998) Whose gaze will infants follow? The elicitation of gaze-following in 12-month-olds. Developmental Science 1:233–38. [aBML]
Jonge, M. de & Racine, R. J. (1985) The effects of repeated induction of long-term potentiation in the dentate gyrus. Brain Research 328:181–85. [aBML]
Juang, B. H. & Rabiner, L. R. (1990) Hidden Markov models for speech recognition. Technometrics 33(3):251–72. [aBML]
Kahneman, D. (2011) Thinking, fast and slow. Macmillan. [MB]
Kahou, S. E., Pal, C., Bouthillier, X., Froumenty, P., Gülçehre, Ç., Memisevic, R., Vincent, P., Courville, A. & Bengio, Y. (2013) Combining modality specific deep neural networks for emotion recognition in video. In: Proceedings of the 15th ACM International Conference on Multimodal Interaction, Coogee Beach, Sydney, Australia, pp. 543–50. ACM. [rBML]
Kaipa, K. N., Bongard, J. C. & Meltzoff, A. N. (2010) Self discovery enables robot social cognition: Are you my teacher? Neural Networks 23(8–9):1113–24. [KBC]
Karpathy, A. & Fei-Fei, L. (2017) Deep visual-semantic alignments for generating image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(4):664–76. [aBML]
Kawato, M., Kuroda, S. & Schweighofer, N. (2011) Cerebellar supervised learning revisited: Biophysical modeling and degrees-of-freedom control. Current Opinion in Neurobiology 21(5):791–800. [GB]
Keller, N. & Katsikopoulos, K. V. (2016) On the role of psychological heuristics in operational research; and a demonstration in military stability operations. European Journal of Operational Research 249(3):1063–73. doi:10.1016/j.ejor.2015.07.023. [PMP]
European Journal of Operational Research 249(3):1063–73. doi:10.1016/j.ejor.2015.07.023. [PMP]
Kellman, P. J. & Spelke, E. S. (1983) Perception of partly occluded objects in infancy. Cognitive Psychology 15(4):483–524. [rBML]
Kemp, C. (2007) The acquisition of inductive constraints. Unpublished doctoral dissertation, Massachusetts Institute of Technology. [aBML]
Kemp, C., Perfors, A. & Tenenbaum, J. B. (2007) Learning overhypotheses with hierarchical Bayesian models. Developmental Science 10(3):307–21. [rBML]
Kemp, C. & Tenenbaum, J. B. (2008) The discovery of structural form. Proceedings of the National Academy of Sciences of the United States of America 105(31):10687–92. [rBML]
Keramati, M., Dezfouli, A. & Piray, P. (2011) Speed/accuracy trade-off between the habitual and the goal-directed processes. PLoS Computational Biology 7:e1002055. [aBML]
Khaligh-Razavi, S. M. & Kriegeskorte, N. (2014) Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology 10(11):e1003915. [aBML, NK]
Kidd, C., Piantadosi, S. T. & Aslin, R. N. (2012) The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS One 7(5):e36399. [P-YO]
Kiddon, C., Zettlemoyer, L. & Choi, Y. (2016) Globally coherent text generation with neural checklist models. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, November 1–5, 2016, pp. 329–39. Association for Computational Linguistics. [rBML]
Kilner, J. M., Friston, K. J. & Frith, C. D. (2007) Predictive coding: An account of the mirror neuron system. Cognitive Processing 8(3):159–66. [aBML]
Kingma, D. P., Rezende, D. J., Mohamed, S. & Welling, M. (2014) Semi-supervised learning with deep generative models. Presented at the 2014 Neural Information Processing Systems conference, Montreal, QC, Canada. In: Advances in neural information processing systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger [spotlight]. Neural Information Processing Systems Foundation. [aBML]
Kiraly, I., Csibra, G. & Gergely, G. (2013) Beyond rational imitation: Learning arbitrary means actions from communicative demonstrations. Journal of Experimental Child Psychology 116(2):471–86. [DCD]
Kline, M. A. (2015) How to learn about teaching: An evolutionary framework for the study of teaching behavior in humans and other animals. Behavioral and Brain Sciences 38:e31. [JMC, MHT]
Koch, G., Zemel, R. S. & Salakhutdinov, R. (2015) Siamese neural networks for one-shot image recognition. Presented at the Deep Learning Workshop at the 2015 International Conference on Machine Learning, Lille, France. Available at: https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf. [aBML]
Kodratoff, Y. & Michalski, R. S. (2014) Machine learning: An artificial intelligence approach, vol. 3. Morgan Kaufmann. [aBML]
Kolodner, J. (1993) Case-based reasoning. Morgan Kaufmann. [NC]
Koza, J. R. (1992) Genetic programming: On the programming of computers by means of natural selection, vol. 1. MIT Press. [aBML]
Kriegeskorte, N. (2015) Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science 1:417–46. [aBML]
Kriegeskorte, N. & Diedrichsen, J. (2016) Inferring brain-computational mechanisms with models of activity measurements. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences 371(1705):489–95. [NK]
Kriegeskorte, N., Mur, M. & Bandettini, P. (2008) Representational similarity analysis – Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience 2:4. doi:10.3389/neuro.06.004.2008. [NK]
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012) ImageNet classification with deep convolutional neural networks. Presented at the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, December 3–6, 2012. In: Advances in Neural Information Processing Systems 25 (NIPS 2012), ed. F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger, pp. 1097–105. Neural Information Processing Systems Foundation. [arBML, MB, NK, SSH]
Kulkarni, T. D., Kohli, P., Tenenbaum, J. B. & Mansinghka, V. (2015a) Picture: A probabilistic programming language for scene perception. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7–12, 2015, pp. 4390–99. IEEE. [aBML]
Kulkarni, T. D., Narasimhan, K. R., Saeedi, A. & Tenenbaum, J. B. (2016) Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. arXiv preprint 1604.06057. Available at: https://arxiv.org/abs/1604.06057. [aBML, P-YO]
Kulkarni, T. D., Whitney, W., Kohli, P. & Tenenbaum, J. B. (2015b) Deep convolutional inverse graphics network. arXiv preprint 1503.03167. Available at: https://arxiv.org/abs/1503.03167. [aBML]
Lake, B. M. (2014) Towards more human-like concept learning in machines: Compositionality, causality, and learning-to-learn. Unpublished doctoral dissertation, Massachusetts Institute of Technology. [aBML]
Lake, B. M., Lawrence, N. D. & Tenenbaum, J. B. (2016) The emergence of organizing structure in conceptual representation. arXiv preprint 1611.09384. Available at: http://arxiv.org/abs/1611.09384. [MB, rBML]
Lake, B. M., Lee, C.-Y., Glass, J. R. & Tenenbaum, J. B. (2014) One-shot learning of generative speech concepts. In: Proceedings of the 36th Annual Conference of the Cognitive Science Society, Quebec City, QC, Canada, July 23–26, 2014, pp. 803–08. Cognitive Science Society. [aBML]
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. (2012) Concept learning as motor program induction: A large-scale empirical study. In: Proceedings of the 34th Annual Conference of the Cognitive Science Society, Sapporo, Japan, August 1–4, 2012, pp. 659–64. Cognitive Science Society. [aBML]
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. (2015a) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–38. [arBML, MB, ED, NK]
Lake, B. M., Zaremba, W., Fergus, R. & Gureckis, T. M. (2015b) Deep neural networks predict category typicality ratings for images. In: Proceedings of the 37th Annual Meeting of the Cognitive Science Society, Pasadena, CA, July 22–25, 2015. Cognitive Science Society. ISBN 978-0-9911967-2-2. [aBML]
Lakoff, G. & Johnson, M. (2003) Metaphors we live by, 2nd ed. University of Chicago Press. [SW]
Lambert, A. (2011) The gates of hell: Sir John Franklin's tragic quest for the Northwest Passage. Yale University Press. [MHT]
Landau, B., Smith, L. B. & Jones, S. S. (1988) The importance of shape in early lexical learning. Cognitive Development 3(3):299–321. [aBML]
Lande, T. S., ed. (1998) Neuromorphic systems engineering: Neural networks in silicon. Kluwer International Series in Engineering and Computer Science, vol. 447. Kluwer Academic. ISBN 978-0-7923-8158-7. [KBC]
Langley, P., Bradshaw, G., Simon, H. A. & Zytkow, J. M. (1987) Scientific discovery: Computational explorations of the creative processes. MIT Press. [aBML]
Laptev, I., Marszalek, M., Schmid, C. & Rozenfeld, B. (2008) Learning realistic human actions from movies. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23–28, 2008 (CVPR 2008), pp. 1–8. IEEE. [SW]
Larson, H. J., Cooper, L. Z., Eskola, J., Katz, S. L. & Ratzan, S. (2011) Addressing the vaccine confidence gap. The Lancet 378(9790):526–35. [EJL]
Lázaro-Gredilla, M., Liu, Y., Phoenix, D. S. & George, D. (2016) Hierarchical compositional feature learning. arXiv preprint 1611.02252. Available at: http://arxiv.org/abs/1611.02252. [DGe]
LeCun, Y., Bengio, Y. & Hinton, G. (2015) Deep learning. Nature 521:436–44. [aBML]
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. & Jackel, L. D. (1989) Backpropagation applied to handwritten zip code recognition. Neural Computation 1:541–51. [arBML]
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–323. [aBML]
Lee, T. S. (2015) The visual system's internal model of the world. Proceedings of the IEEE 103(8):1359–78. Available at: http://doi.org/10.1109/JPROC.2015.2434601. [DGe]
Legare, C. H. & Harris, P. L. (2016) The ontogeny of cultural learning. Child Development 87(3):633–42. [JMC]
Lenat, D. & Guha, R. V. (1990) Building large knowledge-based systems: Representation and inference in the Cyc project. Addison-Wesley. [LRC]
Lenat, D., Miller, G. & Yokoi, T. (1995) CYC, WordNet, and EDR: Critiques and responses. Communications of the ACM 38(11):45–48. [LRC]
Lerer, A., Gross, S. & Fergus, R. (2016) Learning physical intuition of block towers by example. Presented at the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research 48:430–38. [aBML]
Levy, R. P., Reali, F. & Griffiths, T. L. (2009) Modeling the effects of memory on human online sentence processing with particle filters. Presented at the 2008 Neural Information Processing Systems conference, Vancouver, BC, Canada, December 8–10, 2008. In: Advances in neural information processing systems 21 (NIPS 2008), pp. 937–44. Neural Information Processing Systems Foundation. [aBML]
Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N. & Cook, J. (2012) Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest 13(3):106–31. [EJL]
Lewis-Kraus, G. (2016) Going neural. New York Times Sunday Magazine 40–49+, December 18, 2016. [DEM]
Liang, C. & Forbus, K. (2015) Learning plausible inferences from semantic web knowledge by combining analogical generalization with structured logistic regression. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX. AAAI Press. [KDF]
Liao, Q., Leibo, J. Z. & Poggio, T. (2015) How important is weight symmetry in backpropagation? arXiv preprint 1510.05067. Available at: https://arxiv.org/abs/1510.05067. [aBML]
Liberman, A. M., Cooper, F. S., Shankweiler, D. P. & Studdert-Kennedy, M. (1967) Perception of the speech code. Psychological Review 74(6):431–61. [aBML]
Lillicrap, T. P., Cownden, D., Tweed, D. B. & Akerman, C. J. (2014) Random feedback weights support learning in deep neural networks. arXiv preprint 1411.0247. Available at: https://arxiv.org/abs/1411.0247. [aBML]
Lindeman, M. (2011) Biases in intuitive reasoning and belief in complementary and alternative medicine. Psychology and Health 26(3):371–82. [EJL]
Lisman, J. E. & Grace, A. A. (2005) The hippocampal-VTA loop: Controlling the entry of information into long-term memory. Neuron 46:703–13. [GB]
Liu, D., Wellman, H. M., Tardif, T. & Sabbagh, M. A. (2008) Theory of mind development in Chinese children: A meta-analysis of false-belief understanding across cultures and languages. Developmental Psychology 44(2):523–31. Available at: http://dx.doi.org/10.1037/0012-1649.44.2.523. [JMC, rBML]
Lloyd, J., Duvenaud, D., Grosse, R., Tenenbaum, J. & Ghahramani, Z. (2014) Automatic construction and natural-language description of nonparametric regression models. In: Proceedings of the National Conference on Artificial Intelligence 2:1242–50. [aBML]
Logan, G. D. (1988) Toward an instance theory of automatization. Psychological Review 95(4):492–527. [NC]
Lombrozo, T. (2009) Explanation and categorization: How "why?" informs "what?". Cognition 110(2):248–53. [aBML]
Lombrozo, T. (2016) Explanatory preferences shape learning and inference. Trends in Cognitive Sciences 20(10):748–59. [rBML]
Lopes, M. & Oudeyer, P.-Y. (2012) The strategic student approach for life-long exploration and learning. In: IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), San Diego, CA, November 7–9, 2012, pp. 1–8. IEEE. [P-YO]
Lopes, M. & Santos-Victor, J. (2007) A developmental roadmap for learning by imitation in robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37(2):308–21. [LM]
Lopez-Paz, D., Bottou, L., Schölkopf, B. & Vapnik, V. (2016) Unifying distillation and privileged information. Presented at the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–4, 2016. arXiv preprint 1511.03643v3. Available at: https://arxiv.org/abs/1511.03643. [aBML]
Lopez-Paz, D., Muandet, K., Schölkopf, B. & Tolstikhin, I. (2015) Towards a learning theory of cause-effect inference. Presented at the 32nd International Conference on Machine Learning (ICML), Lille, France, July 7–9, 2015. Proceedings of Machine Learning Research 37:1452–61. [aBML]
Loughnan, S. & Haslam, N. (2007) Animals and androids: Implicit associations between social categories and nonhumans. Psychological Science 18(2):116–21. [rBML]
Lovett, A. & Forbus, K. (2017) Modeling visual problem solving as analogical reasoning. Psychological Review 124(1):60–90. [KDF]
Lungarella, M., Metta, G., Pfeifer, R. & Sandini, G. (2003) Developmental robotics: A survey. Connection Science 15:151–90. [BJM]
Luong, M.-T., Le, Q. V., Sutskever, I., Vinyals, O. & Kaiser, L. (2015) Multi-task sequence to sequence learning. arXiv preprint 1511.06114. Available at: https://arxiv.org/pdf/1511.06114.pdf. [aBML]
Lupyan, G. & Bergen, B. (2016) How language programs the mind. Topics in Cognitive Science 8(2):408–24. [aBML]
Lupyan, G. & Clark, A. (2015) Words and the world: Predictive coding and the language perception-cognition interface. Current Directions in Psychological Science 24(4):279–84. [aBML]
Macindoe, O. (2013) Sidekick agents for sequential planning problems. Unpublished doctoral dissertation, Massachusetts Institute of Technology. [aBML]
Mackenzie, D. (2012) A flapping of wings. Science 335(6075):1430–33. [DEM]
Magid, R. W., Sheskin, M. & Schulz, L. E. (2015) Imagination and the generation of new ideas. Cognitive Development 34:99–110. [aBML]
Mahoor, Z., MacLennan, B. & MacBride, A. (2016) Neurally plausible motor babbling in robot reaching. In: The 6th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, September 19–22, 2016, Cergy-Pontoise/Paris, pp. 9–14. IEEE. [BJM]
Malle, B. F. & Scheutz, M. (2014) Moral competence in social robots. In: Proceedings of the 2014 IEEE International Symposium on Ethics in Science, Technology and Engineering. IEEE. doi:10.1109/ETHICS.2014.6893446. [PMP]
Mannella, F. & Baldassarre, G. (2015) Selection of cortical dynamics for motor behaviour by the basal ganglia. Biological Cybernetics 109:575–95. [GB]
Mannella, F., Gurney, K. & Baldassarre, G. (2013) The nucleus accumbens as a nexus between values and goals in goal-directed behavior: A review and a new hypothesis. Frontiers in Behavioral Neuroscience 7(135):e1–29. [GB]
Mansinghka, V., Selsam, D. & Perov, Y. (2014) Venture: A higher-order probabilistic programming platform with programmable inference. arXiv preprint 1404.0099. Available at: https://arxiv.org/abs/1404.0099. [aBML]
Marblestone, A. H., Wayne, G. & Kording, K. P. (2016) Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience 10:94. [AHM, NK]
Marcus, G. (1998) Rethinking eliminative connectionism. Cognitive Psychology 37(3):243–82. [aBML]
Marcus, G. (2001) The algebraic mind: Integrating connectionism and cognitive science. MIT Press. [aBML]
Marin, L., Issartel, J. & Chaminade, T. (2009) Interpersonal motor coordination: From human-human to human-robot interactions. Interaction Studies 10(3):479–504. [LM]
Markman, A. B. & Makin, V. S. (1998) Referential communication and category acquisition. Journal of Experimental Psychology: General 127(4):331–54. [aBML]
Markman, A. B. & Ross, B. H. (2003) Category use and category learning. Psychological Bulletin 129(4):592–613. [aBML]
Markman, E. M. (1989) Categorization and naming in children. MIT Press. [aBML]
Marr, D. (1982/2010) Vision. MIT Press. [ESS]
Marr, D. (1982) Vision: A computational investigation into the human representation and processing of visual information. MIT Press. [SSH]
Marr, D. (1983) Vision. W. H. Freeman. [KDF]
Marr, D. C. (1982) Vision. W. H. Freeman. [aBML]
Marr, D. C. & Nishihara, H. K. (1978) Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London Series B: Biological Sciences 200(1140):269–94. [aBML]
Mascalzoni, E., Regolin, L. & Vallortigara, G. (2010) Innate sensitivity for self-propelled causal agency in newly hatched chicks. Proceedings of the National Academy of Sciences of the United States of America 107(9):4483–85. [ESS]
Maslow, A. (1954) Motivation and personality. Harper & Brothers. [CDG]
Matute, H., Blanco, F., Yarritu, I., Díaz-Lago, M., Vadillo, M. A. & Barberia, I. (2015) Illusions of causality: How they bias our everyday thinking and how they could be reduced. Frontiers in Psychology 6:888. doi:10.3389/fpsyg.2015.00888. [EJL]
Mayer, J. D. & Salovey, P. (1993) The intelligence of emotional intelligence. Intelligence 17(4):433–42. [RJS]
Mazur, J. E. & Hastie, R. (1978) Learning as accumulation: A reexamination of the learning curve. Psychological Bulletin 85:1256–74. [LRC]
McCarthy, J. (1959) Programs with common sense. In: Proceedings of the Teddington Conference on the Mechanization of Thought Processes, pp. 756–91. AAAI Press. [LRC]
McCarthy, J. & Hayes, P. J. (1969) Some philosophical problems from the standpoint of artificial intelligence. In: Machine Intelligence 4, ed. B. Meltzer & D. Michie, pp. 463–502. Edinburgh University Press. [LRC]
McClelland, J. L. (1988) Parallel distributed processing: Implications for cognition and development [technical report]. Defense Technical Information Center document. Available at: http://www.dtic.mil/get-tr-doc/pdf?AD=ADA219063. [aBML]
McClelland, J. L., Botvinick, M. M., Noelle, D. C., Plaut, D. C., Rogers, T. T., Seidenberg, M. S. & Smith, L. B. (2010) Letting structure emerge: Connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences 14(8):348–56. [arBML]
McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. (1995) Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review 102(3):419–57. [arBML]
McClelland, J. L. & Rumelhart, D. E. (1986) Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 2. MIT Press. [aBML]
McFate, C. & Forbus, K. (2016) An analysis of frame semantics of continuous processes. In: Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, PA, ed. A. Papafragou, D. Grodner, D. Mirman & J. C. Trueswell, pp. 836–41. Cognitive Science Society. [KDF]
McFate, C. J., Forbus, K. & Hinrichs, T. (2014) Using narrative function to extract qualitative information from natural language texts. In: Proceedings of the 28th AAAI Conference on Artificial Intelligence, Québec City, Canada, July 27–31, 2014, pp. 373–79. AAAI Press. [KDF]
McShea, D. W. (2013) Machine wanting. Studies in History and Philosophy of Biological and Biomedical Sciences 44(4 pt B):679–87. [KBC]
Medin, D. L. & Ortony, A. (1989) Psychological essentialism. In: Similarity and analogical reasoning, ed. S. Vosniadou & A. Ortony, pp. 179–95. Cambridge University Press. [rBML]
Medin, D. L. & Schaffer, M. M. (1978) Context theory of classification learning. Psychological Review 85(3):207–38. [NC]
Mejía-Arauz, R., Rogoff, B. & Paradise, R. (2005) Cultural variation in children's observation during a demonstration. International Journal of Behavioral Development 29(4):282–91. [JMC]
Meltzoff, A. N. (2007) 'Like me': A foundation for social cognition. Developmental Science 10(1):126–34. [LM]
Meltzoff, A. N., Kuhl, P. K., Movellan, J. & Sejnowski, T. J. (2009) Foundations for a new science of learning. Science 325(5938):284–88. [KBC]
Meltzoff, A. N. & Moore, M. K. (1995) Infants' understanding of people and things: From body imitation to folk psychology. In: The body and the self, ed. J. L. Bermúdez, A. Marcel & N. Eilan, pp. 43–70. MIT Press. [JMC]
Meltzoff, A. N. & Moore, M. K. (1997) Explaining facial imitation: A theoretical model. Early Development and Parenting 6:179–92. [BJM]
Mesoudi, A., Chang, L., Murray, K. & Lu, H. J. (2015) Higher frequency of social learning in China than in the West shows cultural variation in the dynamics of
cultural evolution. Proceedings of the Royal Society of London Series B: Biological Sciences 282(1798):20142209. [JMC]
Metcalfe, J., Cottrell, G. W. & Mencl, W. E. (1992) Cognitive binding: A computational-modeling analysis of a distinction between implicit and explicit memory. Journal of Cognitive Neuroscience 4(3):289–98. [LRC]
Mikolov, T., Joulin, A. & Baroni, M. (2016) A roadmap towards machine intelligence. arXiv preprint 1511.08130. Available at: http://arxiv.org/abs/1511.08130. [arBML]
Mikolov, T., Sutskever, I. & Chen, K. (2013) Distributed representations of words and phrases and their compositionality. Presented at the 2013 Neural Information Processing Systems conference, Lake Tahoe, NV, December 5–10, 2013. In: Advances in Neural Information Processing Systems 26 (NIPS 2013), ed. C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani & K. Q. Weinberger [poster]. Neural Information Processing Systems Foundation. [aBML]
Miller, E. G., Matsakis, N. E. & Viola, P. A. (2000) Learning from one example through shared densities on transformations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, June 15, 2000. IEEE. [aBML]
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D. & Miller, K. J. (1990) Introduction to WordNet: An on-line lexical database. International Journal of Lexicography 3(4):235–44. [LRC]
Miller, G. A. & Johnson-Laird, P. N. (1976) Language and perception. Belknap Press. [aBML]
Milner, D. & Goodale, M. (2006) The visual brain in action. Oxford University Press. [GB]
Minsky, M. (1986) The society of mind. Simon and Schuster. [MBu]
Minsky, M. (2003) Semantic information processing. MIT Press. [RJS]
Minsky, M. & Papert, S. A. (1987) Perceptrons: An introduction to computational geometry, expanded edn. MIT Press. [RJS]
Minsky, M. L. (1974) A framework for representing knowledge. MIT-AI Laboratory Memo 306. [aBML]
Minsky, M. L. & Papert, S. A. (1969) Perceptrons: An introduction to computational geometry. MIT Press. [aBML]
Mirolli, M., Mannella, F. & Baldassarre, G. (2010) The roles of the amygdala in the affective regulation of body, brain and behaviour. Connection Science 22(3):215–45. [GB]
Mišić, B., Sporns, O. & McIntosh, A. R. (2014) Communication efficiency and congestion of signal traffic in large-scale brain networks. PLoS Computational Biology 10(1):e1003427. [DG]
Mitchell, T. M., Keller, R. M. & Kedar-Cabelli, S. T. (1986) Explanation-based generalization: A unifying view. Machine Learning 1:47–80. [aBML]
Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S. & Floridi, L. (2016) The ethics of algorithms: Mapping the debate. Big Data & Society 3(2):1–21. doi:10.1177/2053951716679679. [PMP]
Mix, K. S. (1999) Similarity and numerical equivalence: Appearances count. Cognitive Development 14:269–97. [KDF]
Mnih, A. & Gregor, K. (2014) Neural variational inference and learning in belief networks. Presented at the 31st International Conference on Machine Learning, Beijing, China, June 22–24, 2014. Proceedings of Machine Learning Research 32:1791–99. [aBML]
Mnih, V., Heess, N., Graves, A. & Kavukcuoglu, K. (2014) Recurrent models of visual attention. Presented at the 28th Annual Conference on Neural Information Processing Systems, Montreal, Canada. In: Advances in Neural Information Processing Systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger. Neural Information Processing Systems Foundation. [arBML]
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. & Riedmiller, M. (2013) Playing Atari with deep reinforcement learning. arXiv preprint 1312.5602. Available at: https://arxiv.org/abs/1312.5602. [SSH]
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D. & Hassabis, D. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–33. [arBML, MB, DGe]
Moeslund, T. B., Hilton, A. & Krüger, V. (2006) A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding 104(2):90–126. [rBML]
Mogenson, G. J., Jones, D. L. & Yim, C. Y. (1980) From motivation to action: Functional interface between the limbic system and the motor system. Progress in Neurobiology 14(2–3):69–97. [GB]
Mohamed, S. & Rezende, D. J. (2015) Variational information maximisation for intrinsically motivated reinforcement learning. Presented at the 2015 Neural Information Processing Systems conference, Montreal, QC, Canada, December 7–12, 2015. In: Advances in Neural Information Processing Systems 28 (NIPS 2015), ed. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama & R. Garnett, pp. 2125–33. Neural Information Processing Systems Foundation. [aBML]
Moreno-Bote, R., Knill, D. C. & Pouget, A. (2011) Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences of the United States of America 108:12491–96. [aBML]
Moser, E., Kropff, E. & Moser, M. B. (2008) Place cells, grid cells, and the brain's spatial representation system. Annual Review of Neuroscience 31:69–89. [ESS]
Moulin-Frier, C., Nguyen, M. & Oudeyer, P.-Y. (2014) Self-organization of early vocal development in infants and machines: The role of intrinsic motivation. Frontiers in Psychology 4:1006. Available at: http://dx.doi.org/10.3389/fpsyg.2013.01006. [P-YO]
Murphy, G. L. (1988) Comprehending complex concepts. Cognitive Science 12(4):529–62. [aBML]
Murphy, G. L. & Medin, D. L. (1985) The role of theories in conceptual coherence. Psychological Review 92(3):289–316. [arBML]
Murphy, G. L. & Ross, B. H. (1994) Predictions from uncertain categorizations. Cognitive Psychology 27:148–93. [aBML]
Nagai, Y., Kawai, Y. & Asada, M. (2011) Emergence of mirror neuron system: Immature vision leads to self-other correspondence. In: Proceedings of the 1st Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, Vol. 2, pp. 1–6. IEEE. [LM]
Nakayama, K., Shimojo, S. & Silverman, G. H. (1989) Stereoscopic depth: Its relation to image segmentation, grouping, and the recognition of occluded objects. Perception 18:55–68. [rBML]
Neisser, U. (1966) Cognitive psychology. Appleton-Century-Crofts. [aBML]
Newell, A. (1990) Unified theories of cognition. Harvard University Press. [RPK]
Newell, A., Shaw, J. C. & Simon, H. A. (1957) Problem solving in humans and computers. Carnegie Technical 21(4):34–38. [RJS]
Newell, A. & Simon, H. (1956) The logic theory machine: A complex information processing system. IRE Transactions on Information Theory 2(3):61–79. [LRC]
Newell, A. & Simon, H. A. (1961) GPS, a program that simulates human thought. Defense Technical Information Center. [aBML]
Newell, A. & Simon, H. A. (1972) Human problem solving. Prentice-Hall. [aBML]
Nguyen, M. & Oudeyer, P.-Y. (2013) Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner. Paladyn Journal of Behavioural Robotics 3(3):136–46. [P-YO]
Nguyen-Tuong, D. & Peters, J. (2011) Model learning for robot control: A survey. Cognitive Processing 12(4):319–40. [P-YO]
Nisbett, R. E. & Ross, L. (1980) Human inference: Strategies and shortcomings of social judgment. Prentice-Hall. ISBN 0-13-445073-6. [KBC, NC]
Niv, Y. (2009) Reinforcement learning in the brain. Journal of Mathematical Psychology 53:139–54. [aBML]
Norman, D. A. & Shallice, T. (1986) Attention to action: Willed and automatic control of behaviour. In: Advances in research: Vol. IV. Consciousness and self regulation, ed. R. Davidson, G. Schwartz & D. Shapiro. Plenum. [RPK]
Oaksford, M. & Chater, N. (1991) Against logicist cognitive science. Mind and Language 6(1):1–38. [NC]
O'Donnell, T. J. (2015) Productivity and reuse in language: A theory of linguistic computation and storage. MIT Press. [aBML]
O'Keefe, J. (2014) Nobel lecture: Spatial cells in the hippocampal formation. Available at: http://www.nobelprize.org/nobel_prizes/medicine/laureates/2014/okeefe-lecture.html. [ESS]
O'Keefe, J. & Nadel, L. (1978) The hippocampus as a cognitive map. Oxford University Press. [ESS]
Olshausen, B. A., Anderson, C. H. & Van Essen, D. C. (1993) A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. The Journal of Neuroscience 13(11):4700–19. [DG]
Ong, D. C., Zaki, J. & Goodman, N. D. (2015) Affective cognition: Exploring lay theories of emotion. Cognition 143:141–62. [rBML]
O'Regan, J. K. (2011) Why red doesn't sound like a bell: Understanding the feel of consciousness. Oxford University Press. [LM]
Osherson, D. N. & Smith, E. E. (1981) On the adequacy of prototype theory as a theory of concepts. Cognition 9(1):35–58. [aBML]
Otto, A. R., Skatova, A., Madlon-Kay, S. & Daw, N. D. (2015) Cognitive control predicts use of model-based reinforcement learning. Journal of Cognitive Neuroscience 27:319–33. [EJL]
Oudeyer, P.-Y. (2016) What do we learn about development from baby robots? WIREs Cognitive Science 8(1–2):e1395. Available at: http://www.pyoudeyer.com/oudeyerWiley16.pdf. doi:10.1002/wcs.1395. [P-YO]
Oudeyer, P.-Y., Baranes, A. & Kaplan, F. (2013) Intrinsically motivated learning of real-world sensorimotor skills with developmental constraints. In: Intrinsically motivated learning in natural and artificial systems, ed. G. Baldassarre & M. Mirolli, pp. 303–65. Springer. [P-YO]
Oudeyer, P.-Y., Kaplan, F. & Hafner, V. (2007) Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation 11(2):265–86. [P-YO]
Oudeyer, P.-Y. & Smith, L. (2016) How evolution may work through curiosity-driven developmental process. Topics in Cognitive Science 8(2):492–502. [P-YO]
Palmer, S. (1999) Vision science: Photons to phenomenology. MIT Press. [KDF]
Parisotto, E., Ba, J. L. & Salakhutdinov, R. (2016) Actor-mimic: Deep multitask and transfer reinforcement learning. Presented at the International Conference on
Learning Representations (ICLR), San Juan, Puerto Rico, May 2–5, 2016. arXiv preprint 1511.06342v4. Available at: https://arxiv.org/abs/1511.06342. [aBML]
Parker, S. T. & McKinney, M. L. (1999) Origins of intelligence: The evolution of cognitive development in monkeys, apes and humans. Johns Hopkins University Press. ISBN 0-8018-6012-1. [KBC]
Pecevski, D., Buesing, L. & Maass, W. (2011) Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. PLoS Computational Biology 7:e1002294. [aBML]
Penhune, V. B. & Steele, C. J. (2012) Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behavioural Brain Research 226(2):579–91. [GB]
Peterson, J. C., Abbott, J. T. & Griffiths, T. L. (2016) Adapting deep network features to capture psychological representations. In: Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, PA, August 10–13, 2016, ed. A. Papafragou, D. Grodner, D. Mirman & J. Trueswell, pp. 2363–68. Cognitive Science Society. [aBML]
Pfeifer, R. & Gómez, G. (2009) Morphological computation – connecting brain, body, and environment. In: Creating brain-like intelligence, ed. B. Sendhoff, E. Körner, H. Ritter & K. Doya, pp. 66–83. Springer. [GB]
Pfeifer, R., Lungarella, M. & Iida, F. (2007) Self-organization, embodiment, and biologically inspired robotics. Science 318(5853):1088–93. [P-YO]
Piantadosi, S. T. (2011) Learning and the language of thought. Unpublished doctoral dissertation, Massachusetts Institute of Technology. [aBML]
Pinker, S. (2007) The stuff of thought: Language as a window into human nature. Penguin. [aBML]
Pinker, S. & Prince, A. (1988) On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28:73–193. [aBML]
Poggio, T. (1984) Routing thoughts. Massachusetts Institute of Technology Artificial Intelligence Laboratory Working Paper 258. [DG]
Power, J. M., Thompson, L. T., Moyer, J. R. & Disterhoft, J. F. (1997) Enhanced synaptic transmission in CA1 hippocampus after eyeblink conditioning. Journal of Neurophysiology 78:1184–87. [aBML]
Prasada, S. & Pinker, S. (1993) Generalizations of regular and irregular morphology. Language and Cognitive Processes 8(1):1–56. [LRC]
Pratt, G. (2016, December 6) Presentation to Professor Deb Roy's class on machine learning and society at the MIT Media Lab. Class presentation that was videotaped but has not been made public. [DCD]
Premack, D. & Premack, A. J. (1997) Infants attribute value to the goal-directed actions of self-propelled objects. Journal of Cognitive Neuroscience 9(6):848–56. doi:10.1162/jocn.1997.9.6.848. [aBML]
Putnam, H. (1967) Psychophysical predicates. In: Art, mind, and religion, ed. W. Capitan & D. Merrill. University of Pittsburgh Press. (Reprinted in 1975 as The nature of mental states, pp. 429–40. Putnam.) [LRC]
Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R. & Chopra, S. (2016) Video (language) modeling: A baseline for generative models of natural videos. arXiv preprint 1412.6604. Available at: https://arxiv.org/abs/1412.6604. [MB]
Raposo, D., Santoro, A., Barrett, D. G. T., Pascanu, R., Lillicrap, T. & Battaglia, P. (2017) Discovering objects and their relations from entangled scene representations. Presented at the Workshop Track at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. arXiv preprint 1702.05068. Available at: https://openreview.net/pdf?id=Bk2TqVcxe. [MB, rBML]
Ravi, S. & Larochelle, H. (2017) Optimization as a model for few-shot learning. Presented at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. Available at: https://openreview.net/pdf?id=rJY0-Kcll. [MB]
Read, S. J., Monroe, B. M., Brownstein, A. L., Yang, Y., Chopra, G. & Miller, L. C. (2010) A neural network model of the structure and dynamics of human personality. Psychological Review 117(1):61–92. [KBC]
Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y. L., Le, Q. & Kurakin, A. (2017) Large-scale evolution of image classifiers. arXiv preprint 1703.01041. Available at: https://arxiv.org/abs/1703.01041. [rBML]
Redgrave, P. & Gurney, K. (2006) The short-latency dopamine signal: A role in discovering novel actions? Nature Reviews Neuroscience 7:967–75. [GB]
Reed, S. & de Freitas, N. (2016) Neural programmer-interpreters. Presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–5, 2016. arXiv preprint 1511.06279. Available at: https://arxiv.org/abs/1511.06279. [arBML, MB]
Regolin, L., Vallortigara, G. & Zanforlin, M. (1995) Object and spatial representations in detour problems by chicks. Animal Behaviour 49:195–99. [ESS]
Rehder, B. (2003) A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition 29(6):1141–59. [aBML]
Rehder, B. & Hastie, R. (2001) Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General 130(3):323–60. [aBML]
Rehling, J. A. (2001) Letter Spirit (part two): Modeling creativity in a visual domain. Unpublished doctoral dissertation, Indiana University. [aBML]
Rezende, D. J., Mohamed, S., Danihelka, I., Gregor, K. & Wierstra, D. (2016) One-shot generalization in deep generative models. Presented at the International Conference on Machine Learning, New York, NY, June 20–22, 2016. Proceedings of Machine Learning Research 48:1521–29. [arBML, MB]
Rezende, D. J., Mohamed, S. & Wierstra, D. (2014) Stochastic backpropagation and approximate inference in deep generative models. Presented at the International Conference on Machine Learning (ICML), Beijing, China, June 22–24, 2014. Proceedings of Machine Learning Research 32:1278–86. [aBML]
Richland, L. E. & Simms, N. (2015) Analogy, higher order thinking, and education. Wiley Interdisciplinary Reviews: Cognitive Science 6(2):177–92. [KDF]
Rips, L. J. (1975) Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior 14(6):665–81. [aBML]
Rips, L. J. & Hespos, S. J. (2015) Divisions of the physical world: Concepts of objects and substances. Psychological Bulletin 141:786–811. [aBML]
Rock, I. (1983) The logic of perception. MIT Press. [NC]
Rogers, T. T. & McClelland, J. L. (2004) Semantic cognition. MIT Press. [aBML]
Rogoff, B. (2003) The cultural nature of human development. Oxford University Press. [JMC]
Rohlfing, K. J. & Nomikou, I. (2014) Intermodal synchrony as a form of maternal responsiveness: Association with language development. Language, Interaction and Acquisition 5(1):117–36. [SW]
Romanes, G. J. (1884) Animal intelligence. Appleton. [KBC]
Rosenblatt, F. (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65:386–408. [aBML]
Rougier, N. P., Noelle, D. C., Braver, T. S., Cohen, J. D. & O'Reilly, R. C. (2005) Prefrontal cortex and flexible cognitive control: Rules without symbols. Proceedings of the National Academy of Sciences of the United States of America 102(20):7338–43. [aBML]
Rozenblit, L. & Keil, F. (2002) The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science 26(5):521–62. [EJL, NC]
Ruciński, M. (2014) Modelling learning to count in humanoid robots. Unpublished doctoral dissertation, University of Plymouth, UK. [SW]
Rumelhart, D. E., Hinton, G. & Williams, R. (1986a) Learning representations by back-propagating errors. Nature 323(9):533–36. [aBML]
Rumelhart, D. E. & McClelland, J. L. (1986) On learning the past tenses of English verbs. In: Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1, ed. D. E. Rumelhart, J. L. McClelland & PDP Research Group, pp. 216–71. MIT Press. [aBML]
Rumelhart, D. E., McClelland, J. L. & PDP Research Group (1986b) Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1. MIT Press. [aBML]
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C. & Fei-Fei, L. (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3):211–52. [aBML]
Russell, S. & Norvig, P. (2003) Artificial intelligence: A modern approach. Prentice-Hall. [aBML]
Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R. & Hadsell, R. (2016) Progressive neural networks. arXiv preprint 1606.04671. Available at: http://arxiv.org/abs/1606.04671. [aBML]
Ryan, R. M. & Deci, E. L. (2007) Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology 25:54–67. [aBML]
Salakhutdinov, R., Tenenbaum, J. & Torralba, A. (2012) One-shot learning with a hierarchical nonparametric Bayesian model. JMLR Workshop on Unsupervised and Transfer Learning 27:195–207. [aBML]
Salakhutdinov, R., Tenenbaum, J. B. & Torralba, A. (2013) Learning with hierarchical-deep models. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8):1958–71. [aBML]
Salakhutdinov, R., Torralba, A. & Tenenbaum, J. (2011) Learning to share visual appearance for multiclass object detection. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June 20–25, 2011, pp. 1481–88. IEEE. [aBML]
Sanborn, A. N., Mansinghka, V. K. & Griffiths, T. L. (2013) Reconciling intuitive physics and Newtonian mechanics for colliding objects. Psychological Review 120(2):411–37. [aBML, ED]
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. (2016) Meta-learning with memory-augmented neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY, June 19–24, 2016. Proceedings of Machine Learning Research 48:1842–50. [MB, rBML]
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. (2016) One-shot learning with memory-augmented neural networks. arXiv preprint 1605.06065. Available at: https://arxiv.org/abs/1605.06065. [SSH]
Santucci, V. G., Baldassarre, G. & Mirolli, M. (2016) GRAIL: A goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Transactions on Cognitive and Developmental Systems 8(3):214–31. [GB]
Saxe, A. M., McClelland, J. L. & Ganguli, S. (2013) Dynamics of learning in deep linear neural networks. Presented at the NIPS 2013 Deep Learning Workshop, Lake Tahoe, NV, December 9, 2013. [LRC]
Saxe, A. M., McClelland, J. L. & Ganguli, S. (2014) Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. Presented at the International Conference on Learning Representations, Banff, Canada, April 14–16, 2014. arXiv preprint 1312.6120. Available at: https://arxiv.org/abs/1312.6120. [LRC]
Scellier, B. & Bengio, Y. (2016) Towards a biologically plausible backprop. arXiv preprint 1602.05179. Available at: https://arxiv.org/abs/1602.05179v2. [aBML]
Schank, R. C. (1972) Conceptual dependency: A theory of natural language understanding. Cognitive Psychology 3:552–631. [aBML]
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. (2016) Prioritized experience replay. Presented at the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–4, 2016. arXiv preprint 1511.05952. Available at: https://arxiv.org/abs/1511.05952. [aBML, MB]
Schlegel, A., Alexander, P. & Tse, P. U. (2015) Information processing in the mental workspace is fundamentally distributed. Journal of Cognitive Neuroscience 28(2):295–307. [DG]
Schlottmann, A., Cole, K., Watts, R. & White, M. (2013) Domain-specific perceptual causality in children depends on the spatio-temporal configuration, not motion onset. Frontiers in Psychology 4:365. [aBML]
Schlottmann, A., Ray, E. D., Mitchell, A. & Demetriou, N. (2006) Perceived physical and social causality in animated motions: Spontaneous reports and ratings. Acta Psychologica 123:112–43. [aBML]
Schmidhuber, J. (1991) Curious model-building control systems. In: Proceedings of the IEEE International Joint Conference on Neural Networks 2:1458–63. [P-YO]
Schmidhuber, J. (2015) Deep learning in neural networks: An overview. Neural Networks 61:85–117. [aBML]
Schmidt, R. C. & Richardson, M. J. (2008) Dynamics of interpersonal coordination. In: Coordination: Neural, behavioural and social dynamics, ed. A. Fuchs & V. Jirsa, pp. 281–307. Springer-Verlag. [LM]
Scholl, B. J. & Gao, T. (2013) Perceiving animacy and intentionality: Visual processing or higher-level judgment? In: Social perception: Detection and interpretation of animacy, agency, and intention, ed. M. D. Rutherford & V. A. Kuhlmeier. MIT Press Scholarship Online. [aBML]
Schuller, I. K., Stevens, R. & Committee Chairs (2015) Neuromorphic computing: From materials to architectures. Report of a roundtable convened to consider neuromorphic computing basic research needs. Office of Science, U.S. Department of Energy. [KBC]
Schultz, W., Dayan, P. & Montague, P. R. (1997) A neural substrate of prediction and reward. Science 275:1593–99. [aBML]
Schulz, L. (2012a) Finding new facts; thinking new thoughts. In: Rational constructivism in cognitive development. Advances in Child Development and Behavior 43:269–94. [rBML]
Schulz, L. (2012b) The origins of inquiry: Inductive inference and exploration in early childhood. Trends in Cognitive Sciences 16(7):382–89. [arBML]
Schulz, L. E., Gopnik, A. & Glymour, C. (2007) Preschool children learn about causal structure from conditional interventions. Developmental Science 10:322–32. [aBML]
Scott, R. (Director) (2007) Blade Runner: The Final Cut. Warner Brothers (original release, 1982). [DEM]
Scott, S. H. (2004) Optimal feedback control and the neural basis of volitional motor control. Nature Reviews Neuroscience 5(7):532–46. [GB]
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R. & LeCun, Y. (2014) OverFeat: Integrated recognition, localization and detection using convolutional networks. Presented at the International Conference on Learning Representations (ICLR), Banff, Canada, April 14–16, 2014. arXiv preprint 1312.6229v4. Available at: https://arxiv.org/abs/1312.6229. [aBML]
Shadmehr, R. & Krakauer, J. W. (2008) A computational neuroanatomy for motor control. Experimental Brain Research 185(3):359–81. [GB]
Shafto, P., Goodman, N. D. & Frank, M. C. (2012) Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science 7(4):341–51. [MHT]
Shafto, P., Goodman, N. D. & Griffiths, T. L. (2014) A rational account of pedagogical reasoning: Teaching by, and learning from, examples. Cognitive Psychology 71:55–89. [aBML]
Shahaeian, A., Peterson, C. C., Slaughter, V. & Wellman, H. M. (2011) Culture and the sequence of steps in theory of mind development. Developmental Psychology 47(5):1239–47. [JMC]
Shallice, T. & Cooper, R. P. (2011) The organisation of mind. Oxford University Press. [RPK]
Shultz, T. R. (2003) Computational developmental psychology. MIT Press. [aBML]
Siegler, R. (1976) Three aspects of cognitive development. Cognitive Psychology 8(4):481–520. [ED]
Siegler, R. S. & Chen, Z. (1998) Developmental differences in rule learning: A microgenetic analysis. Cognitive Psychology 36(3):273–310. [aBML]
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V. D., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7585):484–89. [arBML, MB]
Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A. & Degris, T. (2017) The predictron: End-to-end learning and planning. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, ed. M. F. Balcan & K. Q. Weinberger. [MB]
Silverman, R. D. & Hendrix, K. S. (2015) Point: Should childhood vaccination against measles be a mandatory requirement for attending school? Yes. CHEST Journal 148(4):852–54. [EJL]
Simon, H. A. (1967) Motivational and emotional controls of cognition. Psychological Review 74:29–39. [CDG]
Sizemore, A., Giusti, C., Betzel, R. F. & Bassett, D. S. (2016) Closures and cavities in the human connectome. arXiv preprint 1608.03520. Available at: https://arxiv.org/abs/1608.03520. [DG]
Smith, L. B., Jones, S. S., Landau, B., Gershkoff-Stowe, L. & Samuelson, L. (2002) Object name learning provides on-the-job training for attention. Psychological Science 13(1):13–19. [aBML]
Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y. & Potts, C. (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, WA, pp. 1631–42. Association for Computational Linguistics. [rBML]
Solomon, K., Medin, D. & Lynch, E. (1999) Concepts do more than categorize. Trends in Cognitive Sciences 3(3):99–105. [aBML]
Spelke, E. S. (1990) Principles of object perception. Cognitive Science 14(1):29–56. [aBML, JMC]
Spelke, E. S. (2003) What makes us smart? Core knowledge and natural language. In: Language in mind: Advances in the investigation of language and thought, ed. D. Gentner & S. Goldin-Meadow, pp. 277–311. MIT Press. [arBML]
Spelke, E. S., Gutheil, G. & Van de Walle, G. (1995) The development of object perception. In: An invitation to cognitive science: Vol. 2. Visual cognition, 2nd ed., pp. 297–330. Bradford. [aBML]
Spelke, E. S. & Kinzler, K. D. (2007) Core knowledge. Developmental Science 10(1):89–96. [arBML]
Spelke, E. S. & Lee, S. A. (2012) Core systems of geometry in animal minds. Philosophical Transactions of the Royal Society B: Biological Sciences 367(1603):2784–93. [ESS]
Squire, L. (1992) Memory and the hippocampus: A synthesis from findings with rats, monkeys and humans. Psychological Review 99(2):195–231. [ESS]
Srivastava, N. & Salakhutdinov, R. (2013) Discriminative transfer learning with tree-based priors. Presented at the 2013 Neural Information Processing Systems conference, Lake Tahoe, NV, December 5–10, 2013. In: Advances in Neural Information Processing Systems 26 (NIPS 2013), ed. C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani & K. Q. Weinberger [poster]. Neural Information Processing Systems Foundation. [aBML]
Stadie, B. C., Levine, S. & Abbeel, P. (2016) Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint 1507.00814. Available at: http://arxiv.org/abs/1507.00814. [aBML]
Stahl, A. E. & Feigenson, L. (2015) Observing the unexpected enhances infants' learning and exploration. Science 348(6230):91–94. [aBML]
Stanley, K. O. & Miikkulainen, R. (2002) Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2):99–127. [rBML]
Stanovich, K. E. (2009) What intelligence tests miss: The psychology of rational thought. Yale University Press. [RJS]
Sterelny, K. (2012) The evolved apprentice. MIT Press. [DCD]
Sterelny, K. (2013) The informational commonwealth. In: Arguing about human nature: Contemporary debates, ed. S. M. Downes & E. Machery, pp. 274–88. Routledge, Taylor & Francis. [DCD]
Sternberg, R. J. (1997) What does it mean to be smart? Educational Leadership 54(6):20–24. [RJS]
Sternberg, R. J., ed. (2002) Why smart people can be so stupid. Yale University Press. [RJS]
Sternberg, R. J. & Davidson, J. E. (1995) The nature of insight. MIT Press. [aBML]
Sternberg, R. J. & Jordan, J., eds. (2005) Handbook of wisdom: Psychological perspectives. Cambridge University Press. [RJS]
Stuhlmüller, A., Taylor, J. & Goodman, N. D. (2013) Learning stochastic inverses. Presented at the 2013 Neural Information Processing Systems conference, Lake Tahoe, NV, December 5–10, 2013. In: Advances in Neural Information Processing Systems 26 (NIPS 2013), ed. C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani & K. Q. Weinberger, pp. 3048–56. Neural Information Processing Systems Foundation. [aBML]
Sukhbaatar, S., Szlam, A., Weston, J. & Fergus, R. (2015) End-to-end memory networks. Presented at the 2015 Neural Information Processing Systems conference, Montreal, QC, Canada, December 7–12, 2015. In: Advances in Neural Information Processing Systems 28 (NIPS 2015), ed. C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama & R. Garnett [oral presentation]. Neural Information Processing Systems Foundation. [arBML]
Sun, R. (2016) Anatomy of the mind. Oxford University Press. [CDG]
Super, C. M. & Harkness, S. (2002) Culture structures the environment for development. Human Development 45(4):270–74. [JMC]
Sutton, R. S. (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings of the 7th International Workshop on Machine Learning (ICML), Austin, TX, pp. 216–24. International Machine Learning Society. [aBML]
Svedholm, A. M. & Lindeman, M. (2013) Healing, mental energy in the physics classroom: Energy conceptions and trust in complementary and alternative medicine in grade 10–12 students. Science & Education 22(3):677–94. [EJL]
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. & Rabinovich, A. (2014) Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, June 7–12, 2015, pp. 1–9. IEEE. [aBML]
Tan, L. H., Spinks, J. A., Eden, G. F., Perfetti, C. A. & Siok, W. T. (2005) Reading depends on writing, in Chinese. Proceedings of the National Academy of Sciences of the United States of America 102(24):8781–85. [ED]
Tauber, S. & Steyvers, M. (2011) Using inverse planning and theory of mind for social goal inference. In: Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, MA, July 20–23, 2011, pp. 2480–85. Cognitive Science Society. [aBML]
Taylor, E. G. & Ahn, W.-K. (2012) Causal imprinting in causal structure learning. Cognitive Psychology 65:381–413. [EJL]
Téglás, E., Vul, E., Girotto, V., Gonzalez, M., Tenenbaum, J. B. & Bonatti, L. L. (2011) Pure reasoning in 12-month-old infants as probabilistic inference. Science 332(6033):1054–59. [aBML]
Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. (2011) How to grow a mind: Statistics, structure, and abstraction. Science 331(6022):1279–85. [aBML]
Thomaz, A. L. & Cakmak, M. (2013) Active social learning in humans and robots. In: Social learning theory: Phylogenetic considerations across animal, plant, and microbial taxa, ed. K. B. Clark, pp. 113–28. Nova Science. ISBN 978-1-62618-268-4. [KBC]
Thorwart, A. & Livesey, E. J. (2016) Three ways that non-associative knowledge may affect associative learning processes. Frontiers in Psychology 7:2024. doi: 10.3389/fpsyg.2016.02024. [EJL]
Thurstone, L. L. (1919) The learning curve equation. Psychological Monographs 26(3):2–33. [LRC]
Tian, Y. & Zhu, Y. (2016) Better computer Go player with neural network and long-term prediction. Presented at the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–4, 2016. arXiv preprint 1511.06410. Available at: https://arxiv.org/abs/1511.06410. [aBML]
Todd, P. M. & Gigerenzer, G. (2007) Environments that make us smart: Ecological rationality. Current Directions in Psychological Science 16(3):167–71. doi:10.1111/j.1467-8721.2007.00497.x. [PMP]
Tomai, E. & Forbus, K. (2008) Using qualitative reasoning for the attribution of moral responsibility. In: Proceedings of the 30th Annual Conference of the Cognitive Science Society, Washington, DC, July 23–26, 2008. Cognitive Science Society. [KDF]
Tomasello, M. (1999) The cultural origins of human cognition. Harvard University Press. [MHT]
Tomasello, M. (2010) Origins of human communication. MIT Press. [aBML]
Tompson, J. J., Jain, A., LeCun, Y. & Bregler, C. (2014) Joint training of a convolutional network and a graphical model for human pose estimation. Presented at the 28th Annual Conference on Neural Information Processing Systems, Montreal, Canada. In: Advances in Neural Information Processing Systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger, pp. 1799–807. Neural Information Processing Systems Foundation. [rBML]
Torralba, A., Murphy, K. P. & Freeman, W. T. (2007) Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5):854–69. [aBML]
Toshev, A. & Szegedy, C. (2014) Deeppose: Human pose estimation via deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, pp. 1653–60. IEEE. [rBML]
Tremoulet, P. D. & Feldman, J. (2000) Perception of animacy from the motion of a single object. Perception 29:943–51. [aBML]
Trettenbrein, P. C. (2016) The demise of the synapse as the locus of memory: A looming paradigm shift? Frontiers in Systems Neuroscience 10:88. [DG]
Tsividis, P., Gershman, S. J., Tenenbaum, J. B. & Schulz, L. (2013) Information selection in noisy environments with large action spaces. In: Proceedings of the 36th Annual Conference of the Cognitive Science Society, Austin, TX, pp. 1622–27. Cognitive Science Society. [aBML]
Tsividis, P., Tenenbaum, J. B. & Schulz, L. E. (2015) Constraints on hypothesis selection in causal learning. In: Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena, CA, July 23–25, 2015, pp. 2434–39. Cognitive Science Society. [aBML]
Tsividis, P. A., Pouncy, T., Xu, J. L., Tenenbaum, J. B. & Gershman, S. J. (2017) Human learning in Atari. In: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Spring Symposium on Science of Intelligence: Computational Principles of Natural and Artificial Intelligence, Stanford University, Palo Alto, CA, March 25–27, 2017. AAAI Press. [MHT, rBML]
Turing, A. M. (1950) Computing machinery and intelligence. Mind 59:433–60. Available at: http://mind.oxfordjournals.org/content/LIX/236/433. [aBML]
Turovsky, B. (2016) Found in translation: More accurate, fluent sentences in Google Translate. Available at: https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/. [DEM]
Tversky, B. & Hemenway, K. (1984) Objects, parts, and categories. Journal of Experimental Psychology: General 113(2):169–91. [aBML]
Ullman, S., Harari, D. & Dorfman, N. (2012a) From simple innate biases to complex visual concepts. Proceedings of the National Academy of Sciences of the United States of America 109(44):18215–20. [aBML]
Ullman, T. D., Baker, C. L., Macindoe, O., Evans, O., Goodman, N. D. & Tenenbaum, J. B. (2009) Help or hinder: Bayesian models of social goal inference. Presented at the 2009 Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, December 7–10, 2009. In: Advances in Neural Information Processing Systems 22 (NIPS 2009), ed. Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams & A. Culotta. Neural Information Processing Systems Foundation. [rBML]
Ullman, T. D., Goodman, N. D. & Tenenbaum, J. B. (2012b) Theory learning as stochastic search in the language of thought. Cognitive Development 27(4):455–80. [aBML]
U.S. Postal Service Historian (2016) Pieces of mail handled, number of post offices, income, and expenses since 1789. Available at: https://about.usps.com/who-we-are/postal-history/pieces-of-mail-since-1789.htm. [DG]
van den Hengel, A., Russell, C., Dick, A., Bastian, J., Pooley, D., Fleming, L. & Agapito, L. (2015) Part-based modelling of compound scenes from images. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, June 7–12, 2015, pp. 878–86. IEEE. [aBML]
van den Oord, A., Kalchbrenner, N. & Kavukcuoglu, K. (2016) Pixel recurrent neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY. Proceedings of Machine Learning Research 48:1747–56. [MB]
van Hasselt, H., Guez, A. & Silver, D. (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference, Phoenix, AZ. AAAI Press. [aBML]
Varlet, M., Marin, L., Capdevielle, D., Del-Monte, J., Schmidt, R. C., Salesse, R. N., Boulenger, J.-P., Bardy, B. & Raffard, S. (2014) Difficulty leading interpersonal coordination: Towards an embodied signature of social anxiety disorder. Frontiers in Behavioral Neuroscience 8:1–9. [LM]
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. (2016) Matching networks for one shot learning. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon & R. Garnett, pp. 3630–38. Neural Information Processing Systems Foundation. [arBML, MB, SSH]
Vinyals, O., Toshev, A., Bengio, S. & Erhan, D. (2014) Show and tell: A neural image caption generator. arXiv preprint 1411.4555. Available at: https://arxiv.org/abs/1411.4555. [aBML]
Viviani, P. & Stucchi, N. (1992) Biological movements look uniform: Evidence of motor-perceptual interactions. Journal of Experimental Psychology: Human Perception & Performance 18(3):603–23. [LM]
Vollmer, A.-L., Mühlig, M., Steil, J. J., Pitsch, K., Fritsch, J., Rohlfing, K. & Wrede, B. (2014) Robots show us how to teach them: Feedback from robots shapes tutoring behavior during action learning. PLoS One 9(3):e91349. [P-YO]
Vul, E., Goodman, N., Griffiths, T. L. & Tenenbaum, J. B. (2014) One and done? Optimal decisions from very few samples. Cognitive Science 38(4):599–637. [aBML]
Vygotsky, L. S. (1978) Interaction between learning and development. In: Mind in society: The development of higher psychological processes, ed. M. Cole, V. John-Steiner, S. Scribner & E. Souberman, pp. 79–91. Harvard University Press. [RPK]
Wallach, W., Franklin, S. & Allen, C. (2010) A conceptual and computational model of moral decision making in human and artificial agents. Topics in Cognitive Science 2:454–85. [KBC]
Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D. & Botvinick, M. (2017) Learning to reinforcement learn. Presented at the 39th Annual Meeting of the Cognitive Science Society, London, July 26–29, 2017. arXiv preprint 1611.05763. Available at: https://arxiv.org/abs/1611.05763. [MB]

Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M. & de Freitas, N. (2016) Dueling network architectures for deep reinforcement learning. arXiv preprint 1511.06581. Available at: http://arxiv.org/abs/1511.06581. [aBML]
Ward, T. B. (1994) Structured imagination: The role of category structure in exemplar generation. Cognitive Psychology 27:1–40. [aBML]
Watkins, C. J. & Dayan, P. (1992) Q-learning. Machine Learning 8:279–92. [aBML]
Weigmann, K. (2006) Robots emulating children. EMBO Reports 7(5):474–76. [KBC]
Weizenbaum, J. (1966) ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the ACM 9(1):36–45. [RJS]
Wellman, H. M. & Gelman, S. A. (1992) Cognitive development: Foundational theories of core domains. Annual Review of Psychology 43:337–75. [arBML]
Wellman, H. M. & Gelman, S. A. (1998) Knowledge acquisition in foundational domains. In: Handbook of child psychology: Vol. 2. Cognition, perception, and language development, 5th ed., series ed. W. Damon, vol. ed. D. Kuhn & R. S. Siegler, pp. 523–73. Wiley. [arBML]
Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M. & Thelen, E. (2001) Autonomous mental development by robots and animals. Science 291(5504):599–600. [GB]
Wermter, S., Palm, G., Weber, C. & Elshaw, M. (2005) Towards biomimetic neural learning for intelligent robots. In: Biomimetic neural learning for intelligent robots, ed. S. Wermter, G. Palm & M. Elshaw, pp. 1–18. Springer. [SW]
Weston, J., Bordes, A., Chopra, S., Rush, A. M., van Merriënboer, B., Joulin, A. & Mikolov, T. (2015a) Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv preprint 1502.05698. Available at: https://arxiv.org/pdf/1502.05698.pdf. [SSH]
Weston, J., Chopra, S. & Bordes, A. (2015b) Memory networks. Presented at the International Conference on Learning Representations, San Diego, CA, May 7–9, 2015. arXiv preprint 1410.3916. Available at: https://arxiv.org/abs/1410.3916. [arBML]
Williams, J. J. & Lombrozo, T. (2010) The role of explanation in discovery and generalization: Evidence from category learning. Cognitive Science 34(5):776–806. [aBML]
Wills, T. J., Cacucci, F., Burgess, N. & O’Keefe, J. (2010) Development of the hippocampal cognitive map in preweanling rats. Science 328(5985):1573–76. [ESS]
Winograd, T. (1972) Understanding natural language. Cognitive Psychology 3:1–191. [aBML, RJS]
Winston, P. H. (1975) Learning structural descriptions from examples. In: The psychology of computer vision, pp. 157–210. McGraw-Hill. [aBML]
Wiskott, L. (2006) How does our visual system achieve shift and size invariance? In: 23 Problems in systems neuroscience, ed. J. L. Van Hemmen & T. J. Sejnowski, pp. 322–40. Oxford University Press. [DG]
Wiskott, L. & von der Malsburg, C. (1996) Face recognition by dynamic link matching. In: Lateral interactions in the cortex: Structure and function, ed. J. Sirosh, R. Miikkulainen & Y. Choe, ch. 11. The UTCS Neural Networks Research Group. [DGe]
Wolfram, S. (2002) A new kind of science. Wolfram Media. ISBN 1-57955-008-8. [KBC]
Wolpert, D. M., Miall, R. C. & Kawato, M. (1998) Internal models in the cerebellum. Trends in Cognitive Sciences 2(9):338–47. [GB]
Xu, F. & Tenenbaum, J. B. (2007) Word learning as Bayesian inference. Psychological Review 114(2):245–72. [aBML]
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R. & Bengio, Y. (2015) Show, attend and tell: Neural image caption generation with visual attention. Presented at the 2015 International Conference on Machine Learning. Proceedings of Machine Learning Research 37:2048–57. [arBML]
Yamada, Y., Mori, H. & Kuniyoshi, Y. (2010) A fetus and infant developmental scenario: Self-organization of goal-directed behaviors based on sensory constraints. In: Proceedings of the 10th International Conference on Epigenetic Robotics, Őrenäs Slott, Sweden, pp. 145–52. Lund University Cognitive Studies. [P-YO]
Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D. & DiCarlo, J. J. (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America 111(23):8619–24. [aBML, NK]
Yildirim, I., Kulkarni, T. D., Freiwald, W. A. & Tenenbaum, J. (2015) Efficient analysis-by-synthesis in vision: A computational framework, behavioral tests, and comparison with neural representations. In: Proceedings of the 37th Annual Conference of the Cognitive Science Society, Pasadena, CA, July 22–25, 2015. Cognitive Science Society. Available at: https://mindmodeling.org/cogsci2015/papers/0471/index.html. [aBML, NK]
Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. (2014) How transferable are features in deep neural networks? Presented at the 2014 Neural Information Processing Systems conference, Montreal, QC, Canada. In: Advances in Neural Information Processing Systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence & K. Q. Weinberger [oral presentation]. Neural Information Processing Systems Foundation. [aBML]
Youyou, W., Kosinski, M. & Stillwell, D. (2015) Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences of the United States of America 112(4):1036–40. [KBC]
Zeiler, M. D. & Fergus, R. (2014) Visualizing and understanding convolutional networks. In: Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I, ed. D. Fleet, T. Pajdla, B. Schiele & T. Tuytelaars, pp. 818–33. Springer. [aBML]
Zentall, T. R. (2013) Observational learning in animals. In: Social learning theory: Phylogenetic considerations across animal, plant, and microbial taxa, ed. K. B. Clark, pp. 3–33. Nova Science. ISBN 978-1-62618-268-4. [KBC]
Zhou, H., Friedman, H. S. & von der Heydt, R. (2000) Coding of border ownership in monkey visual cortex. The Journal of Neuroscience 20:6594–611. [DGe]
