This document discusses a method for unsupervised lifelong learning in underwater exploration robots using a hierarchical Bayesian model and real-time Gibbs sampling. The approach enables robots to automatically characterize their environment and adaptively plan exploration paths without prior knowledge. Preliminary experiments demonstrate the system's capability to identify various substrate types in real-time from image data collected during underwater missions.


Unsupervised Lifelong Learning for a Curious Underwater Exploration Robot

Yogesh Girdhar
Applied Ocean Physics and Engineering
Woods Hole Oceanographic Institution
Woods Hole, MA
[email protected]

Hanumant Singh
Electrical and Computer Engineering
Northeastern University
Boston, MA
[email protected]

I. INTRODUCTION

The challenges in exploration of remote and extreme environments such as distant planets and the deep seas have much in common. It is very expensive, and inherently dangerous, for humans to explore such locations directly, and hence the use of robots is desirable. However, due to communication bottlenecks, direct control of robots in such situations is usually not possible. Underwater, saltwater prohibits high-speed RF communications, and usually the only communication option is extremely low-speed acoustic communication. Hence there is a need for automatic scene characterization algorithms, which can be used to identify relevant locations in the world and to adaptively plan the exploration path of the robot. We would like to deploy these robots in unknown environments, so it is also essential that minimal prior knowledge about the environment is assumed, and that the learned model automatically scales with the complexity of the observed data.

Making decisions based on the environmental context of a robot's locations requires that we first model the context of the robot's observations, which in turn might correspond to various semantic or conceptually higher-level entities that compose the world. If we are given an observation model of these entities that compose the world, then it is easy to describe a given scene in terms of these entities using this model; likewise, if we are given a labeling of the world in terms of these entities, then it is easy to compute the observation model for each individual entity. The challenge comes from doing these two tasks together, unsupervised, and with no prior information. We propose a hierarchical spatiotemporal Bayesian model, and a realtime Gibbs sampler, to solve this problem of assigning high-level labels to low-level streaming observations.

This work differs in spirit from lighting-invariant localization and mapping efforts such as the work by McManus et al. [5] and Ranganathan et al. [7], where the goal is to learn location landmarks rather than a semantic scene descriptor. Our work is perhaps most closely related to the work by Steinberg et al. [9] on characterizing benthic seafloor types using hierarchical Bayesian models; however, the focus of our work is on learning these scene models online, and on using them to plan the robot's path in realtime. The hierarchical Bayesian scene model discussed in this paper was presented in [3]; here we discuss the realtime Gibbs sampler used for its inference, and demonstrate its use for characterizing seafloor substrate type.

II. BAYESIAN NONPARAMETRIC (BNP) SCENE MODELING

Given a sequence of images or other observations, we extract discrete features w from these observations, each of which has corresponding spatial and temporal coordinates (x, t). In the case of a simple 2D video the spatial coordinates would just correspond to the pixel coordinates; however, in the presence of 3D data, the spatial coordinates can be 3D.

We model the likelihood of the observed data in terms of the latent topic label variables z:

    P(w | x, t) = Σ_{k ∈ K_active} P(w | z = k) P(z = k | x, t).    (1)

Here the distribution Φ = P(w | z = k) models the appearance of the topic label k, and is shared across all spatiotemporal locations. The second part of the equation, Θ = P(z = k | x, t), models the distribution of labels in the spatiotemporal neighborhood of location (x, t). We say that a label is active if there is at least one observation which has been assigned this label. The set of all active labels is K_active.

Let w_i = v be the i-th observation word, with spatial coordinates x_i and time t_i, where i ∈ [1, N), and the observation v is discrete and takes an integer value in [0, V). Each observation w_i is described by a latent label variable z_i = k, where k again is an integer.

    P(w_i = v | z_i = k) = (n_{v,k} + β) / (N + Vβ − 1).    (2)

Here n_{v,k} is the number of times an observation of type v has been assigned label k thus far (excluding the i-th observation), N is the total number of observations, V is the vocabulary size of the observations, and β is the Dirichlet parameter controlling the sparsity of the P(w|z) distribution. A lower value of β encourages a sparser P(w|z), with peaks on a smaller number of vocabulary words; this encourages topics to describe more specific phenomena, and hence requires more topics in general to describe the data. A larger value of β, on the other hand, encourages denser distributions, so that a topic describes more general phenomena in the scene.

In this work we assume that the set of all distinct observation words is known, and that this set has size V.
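As a concrete illustration, the likelihood model of Eqs. (1) and (2) can be sketched in Python as follows; the count matrix n_vk and the toy values are assumptions for illustration, not the implementation used in this work.

```python
# Illustrative sketch of Eqs. (1) and (2). The variable names mirror the
# paper's notation; the toy counts and sizes are assumptions.

V, K = 5, 3     # vocabulary size V and number of active topics
beta = 0.1      # Dirichlet parameter controlling sparsity of P(w|z)

# n_vk[v][k]: number of times word type v has been assigned topic k
n_vk = [[0] * K for _ in range(V)]
n_vk[0][0] = 4
n_vk[1][0] = 1
N = sum(sum(row) for row in n_vk)  # total number of observations

def word_likelihood(v, k):
    """P(w_i = v | z_i = k), Eq. (2): smoothed topic-word likelihood."""
    return (n_vk[v][k] + beta) / (N + V * beta - 1)

def word_marginal(v, theta):
    """P(w = v | x, t), Eq. (1): mixture over active topics, where
    theta[k] = P(z = k | x, t) is the local topic distribution."""
    return sum(word_likelihood(v, k) * theta[k] for k in range(K))
```

With β small, word_likelihood concentrates probability mass on the word types already assigned to a topic, matching the sparsity discussion above.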
However, the number of labels used to describe the data, K, is inferred automatically from the data. Through the use of Bayesian nonparametric techniques such as the Chinese Restaurant Process (CRP), it is possible to model, in a principled way, how new categories are formed [11], [10]. Using the CRP, we model whether a word is best explained by an existing label or by a new, previously unseen label, allowing us to build models that can grow automatically with the growth in the size and complexity of the data.

    P(z_i = k | z_1, . . . , z_N) =
        (n_{k,g_i} + α) / C(i, k),   if k ∈ K_active
        γ / C(i, k),                 if k = k_new        (3)
        0,                           otherwise.

Here n_{k,g_i} is the total number of observations in the spatiotemporal neighborhood of the i-th observation, excluding itself; the Dirichlet prior α controls the sparsity of the scene's topic distribution; the CRP parameter γ controls the growth of the number of topic labels; and C(i, k) = Σ_i^N (1[n_k > 0] n_{k,g_i} + α) + γ − 1 is the normalizing constant.

III. GIBBS SAMPLING FOR LIFELONG LEARNING

Given a word observation w_i, its location x_i, and its neighborhood G_i = G(c(x_i)), we use a Gibbs sampler to assign a new topic label to the word, by sampling from the posterior topic distribution P(z_i = k | w_i = v, x_i). Algorithm 1 shows a simple iterative technique to compute the topic labels for the observed words in batch mode.

    Initialize ∀i, z_i ∼ Uniform({1, . . . , K})
    while true do
        foreach cell c ∈ C do
            foreach word w_i ∈ c do
                z_i ∼ P(z_i = k | w_i = v, x_i)
                Update Θ, Φ given the new z_i by updating n_vk and n_kG
            end
        end
    end
    Algorithm 1: Batch Gibbs sampling

In the context of robotics we are interested in the online refinement of observation data. After each new observation, we only have a constant amount of time to do topic label refinement. Hence, any online refinement algorithm whose computational complexity increases with new data is not useful. Moreover, if we are to use the topic labels of an incoming observation for making realtime decisions, then it is essential that the topic labels for the last observation converge before the next observation arrives. Since the total amount of data collected grows linearly with time, we must use a refinement strategy that efficiently handles global (previously observed) data and local (recently observed) data. Our general strategy is described by Algorithm 2. At each time step we add the new observations to the model, and then randomly pick observation times t ∼ P(t|T), where T is the current time, for which we resample the topic labels and update the topic model.

    T ← 0 (current time)
    while true do
        Add new observed words to their corresponding cells.
        Initialize ∀i ∈ M_T, z_i ∼ Uniform({1, . . . , K})
        while no new observation do
            t ∼ P(t|T)
            foreach cell c ∈ M_t do
                foreach word w_i ∈ c do
                    z_i ∼ P(z_i = k | w_i = v, x_i)
                    Update Θ, Φ given the new z_i by updating n_vk and n_kG
                end
            end
        end
        T ← T + 1
    end
    Algorithm 2: Realtime Gibbs sampler

We discuss the choice of P(t|T) in the following sections.

A. Now Gibbs Sampling

The simplest way of processing streaming observation data, while ensuring that the topic labels from the last observation have converged, is to refine only the topics from the last observation until the next observation has arrived.

    P(t|T) = 1, if t = T; 0, otherwise.    (4)

We call this the Now Gibbs sampler. It is analogous to the o-LDA approach by Banerjee and Basu [1]. This approach only requires keeping the last observation in memory, and hence has a constant memory footprint.

B. Uniform Gibbs Sampling

A conceptually opposite strategy is to pick an observation uniformly at random from all the observations thus far, and refine the topic labels for all the words in this observation.

    P(t|T) = 1/T.    (5)

This is analogous to the incremental Gibbs sampler for LDA proposed by Canini et al. [2]. This approach requires a linear memory footprint; however, for a lifelong data stream, it can be approximated by keeping only the constant number of most recent observations permitted by the system memory.

C. Mixed Gibbs Sampling

We expect the Now Gibbs sampler to be good at ensuring that the topic labels for the last observation converge quickly (to a locally optimal solution) before the next observation arrives, whereas the Uniform Gibbs sampler is better at finding globally optimal results. One way to balance these two performance goals is to combine the global and local strategies.

    P(t|T) = η, if t = T; (1 − η)/(T − 1), otherwise.    (6)
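The refinement schedules of Eqs. (4)-(6), together with a CRP label posterior in the spirit of Eq. (3), can be sketched as follows; the function names and the simplified neighborhood counts are illustrative assumptions (for instance, the indicator term of C(i, k) is folded into the normalization), not the authors' implementation. Here eta is the mixing proportion of Eq. (6).

```python
import random

def sample_refinement_time(T, eta):
    """Sample t ~ P(t|T), Eqs. (4)-(6): with probability eta refine the
    newest observation (the Now sampler of Eq. 4 when eta = 1); otherwise
    pick one of the T-1 older observations uniformly. Choosing eta = 1/T
    recovers the Uniform schedule of Eq. (5)."""
    if T == 1 or random.random() < eta:
        return T                       # local strategy: newest observation
    return random.randint(1, T - 1)    # global strategy: an older one

def label_posterior(neighborhood_counts, alpha, gamma):
    """Normalized P(z_i = k | ...) in the spirit of Eq. (3): an existing
    label k gets weight n_{k,g_i} + alpha, and a brand-new label gets
    weight gamma (the CRP term)."""
    weights = {k: n + alpha for k, n in neighborhood_counts.items()}
    weights["new"] = gamma             # CRP: previously unseen label
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

A small γ (such as the 1e-6 used in the experiment in Section IV) makes new labels rare, so topics are created only when the existing ones explain a word poorly.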
Here 0 ≤ η ≤ 1 is the mixing proportion between the local and the global strategies.

Fig. 1. Automatic unsupervised characterization of substrate types using image data collected in a transect by an AUV off the west coast of Panama. The plot shows the distribution of topics (substrate types) at each timestep. Different colors in the timeline correspond to different topic labels. We see four different substrate types being represented distinctly by the topic model, without any prior training. These results were computed by realtime processing of the data.

IV. EXPERIMENT

We simulated our proposed topic model to run in realtime at 5 frames per second, on an Intel Core i7 CPU. The original data was collected at 0.2 frames per second. We used the parameter values α = 0.1, β = 10.0, γ = 1e−6, η = 0.5 for our experiment. Our preliminary experiments suggest that topic-modeling-based techniques can be used to characterize previously unknown benthic scenes. The dataset contains over 2000 images. Results shown in Fig. 1 demonstrate that we can characterize the various substrate types in a completely unsupervised manner and in realtime. These results are from data collected by the SeaBED vehicle [8] on an expedition to explore Hannibal seamount in Panama [6].

V. DISCUSSION

The proposed lifelong unsupervised learning approach can be utilized for lifelong exploration missions of many different types.

The ability to characterize the substrate type can be used for substrate-specific data collection. The exploration mission can be defined either to target a specific substrate type of interest, or to spread the data collection efforts equally amongst the various substrate types.

The proposed topic modeling can also be used to perform exploration that aims to maximize information gain in semantic space [4]. At each step in time, we can evaluate the exploration utility of the neighboring locations in terms of their information gain, which can be measured by computing the perplexity score of the observations in the context of the data observed thus far. We assume that the robot can make observations at potential next steps in the exploration path. The problem of getting stuck in a local optimum can be addressed by adding a repulsive potential from previously visited locations [4]. We hypothesize that such exploration can be used for collecting observation data to study patchy and transient phenomena, such as animal mating aggregations underwater, or feeding events, which are otherwise hard to study using data collected over pre-determined trajectories.

VI. CONCLUSION

We have presented here a realtime technique for automatically characterizing the scene as observed by a robot. We have presented preliminary results hinting at the ability of a marine robot to continuously learn various benthic substrate types in-situ. We believe that such a capability can allow future exploration robots to adaptively target their data collection towards the more interesting regions of the world, represented by locations with high model perplexity or rare topic labels, as demonstrated by our prior work [4]. The unsupervised and realtime nature of the approach also makes it an ideal tool for post-mission analysis of large observation datasets.

ACKNOWLEDGMENT

This work was supported by The Investment in Science Fund at WHOI.
REFERENCES

[1] A. Banerjee and S. Basu. Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning. In SIAM International Conference on Data Mining, page 6, 2007.
[2] K. R. Canini, L. Shi, and T. L. Griffiths. Online Inference of Topics with Latent Dirichlet Allocation. Proceedings of the International Conference on Artificial Intelligence and Statistics, 5(1999):65–72, 2009.
[3] Y. Girdhar, W. Cho, M. Campbell, J. Pineda, E. Clarke, and H. Singh. Anomaly Detection in Unstructured Environments using Bayesian Nonparametric Scene Modeling. In IEEE International Conference on Robotics and Automation (ICRA), 2016.
[4] Y. Girdhar and G. Dudek. Modeling curiosity in a mobile robot for long-term autonomous exploration and monitoring. Autonomous Robots, Sep. 2015.
[5] C. McManus, B. Upcroft, and P. Newman. Learning place-dependant features for long-term vision-based localisation. Autonomous Robots, 39(3):363–387, Oct. 2015.
[6] J. Pineda, W. Cho, V. Starczak, A. F. Govindarajan, H. M. Guzman, Y. Girdhar, R. C. Holleman, J. Churchill, H. Singh, and D. K. Ralston. A crab swarm at an ecological hotspot: patchiness and population density from AUV observations at a coastal, tropical seamount. PeerJ, 4:e1770, Apr. 2016.
[7] A. Ranganathan, S. Matsumoto, and D. Ilstrup. Towards illumination invariance for visual localization. In 2013 IEEE International Conference on Robotics and Automation, pages 3791–3798. IEEE, May 2013.
[8] H. Singh, C. Roman, O. Pizarro, R. Eustice, and A. Can. Towards High-resolution Imaging from Underwater Vehicles. The International Journal of Robotics Research, 26(1):55–74, Jan. 2007.
[9] D. M. Steinberg, O. Pizarro, and S. B. Williams. Hierarchical Bayesian models for unsupervised scene understanding. Computer Vision and Image Understanding, 131:128–144, Feb. 2015.
[10] Y. W. Teh and M. I. Jordan. Hierarchical Bayesian Nonparametric Models with Applications. Bayesian Nonparametrics, pages 158–207, 2010.
[11] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566–1581, Dec. 2006.
