Fast Ego-motion Estimation with Multi-rate Fusion of Inertial and Vision(1)

L. Armesto and Josep Tornero
Dept. of Control Systems Engineering, Technical University of Valencia
Camino de Vera, s/n 46022, Valencia, Spain
[email protected], [email protected]

Markus Vincze
Automation and Control Institute, Vienna University of Technology
Gusshausstr. 27-29/361, A-1040, Vienna, Austria
[email protected]

The International Journal of Robotics Research, Vol. 26, No. 6, June 2007, pp. 577-589
DOI: 10.1177/0278364907079283
(c) 2007 SAGE Publications
Figures 1, 4, 6-9, 11, 12 appear in color online: http://ijr.sagepub.com

(1) This work has been supported by the Spanish Government (Ministerio de Ciencia y Tecnologia), research projects PDI2000-0362-P4-05 and BIA2005-09377-C03-02, and the Austrian Science Foundation (FWF) under grant P15748.
(2) The paper was received on 10/04/2006, revised on 23/01/2007 and accepted on 03/03/2007.
approach may decrease the overall system performance, since high-frequency dynamics are missed due to the temporal discretization imposed by the Nyquist sampling constraint.

Multi-rate systems have been extensively treated in the last four decades and it is possible to find many contributions dealing with modelling (Albertos 1990; Khargonekar et al. 1985; Kranc 1957; Tornero 1985) and analysis (Goodwin and Feuer 1992), as well as control design (Tornero et al. 1999), of periodic sampled-data systems. One of the most relevant modelling techniques is the Lifting Technique (Khargonekar et al. 1985), where an isomorphism between a linear periodic system and an enlarged linear time-invariant (LTI) system is defined via the lifting operator. Another interesting point of view for modelling multi-rate systems is the one provided by Tornero (1985) and Longhi (1994), where two periodic matrices relate inputs and outputs according to the multi-rate sampling pattern. The main advantage of this approach with respect to the lifting technique is that it is not restricted to linear systems and it is implemented at the fastest sampling rate; it is therefore much more appropriate for "real-time" systems.

This paper investigates a new, generic approach to multi-rate tracking combining vision and inertial sensor data. The fusion of vision and inertial measurements provides complementary characteristics: visual sensing is very accurate at low velocities, while inertial sensors can track fast motions but suffer from drift, particularly at low velocities. The set-up for this application is an end-effector mounted camera together with an inertial sensor based on accelerometers and gyroscopes. One of the motivations of this tracking system is to estimate arbitrary motions of mobile robots or people. The application is to use a monocular-camera vision system and an inertial measurement unit (IMU) mounted on a pan-tilt unit (PTU), as shown in Figure 1(a); the system diagram of the tracking system is shown in Figure 1(b). In addition, this tracking system will accurately estimate the pose of the end-effector of a manipulator used for restoration tasks of old-building facades. This robot will be mounted on a mobile base and a lifting structure, as shown in Figure 1(c). The purpose of this mechanical system is to extend the work area of the manipulator, although it has the disadvantage of suffering from perturbations and/or oscillations.

This paper presents a precise model description for pose estimation where jerks and angular accelerations are treated as noise, including the effects of centripetal accelerations. With respect to our previous work (Armesto et al. 2004), this paper is focused on the study of estimation of different tracking velocities in rotational and translational movements.

As a main contribution, fusion of vision and inertial measurements is performed with a generic multi-rate EKF (MR-EKF) and multi-rate UKF (MR-UKF), in order to deal with data at different sampling rates. This approach improves the overall performance with respect to single-rate methods. Moreover, this fusion concept is valid for other sensors, such as laser rangers and encoders in mobile robot localisation and map building (Armesto and Tornero 2004; Armesto et al. 2007).

In the paper, the influence of the uncertainties associated with both types of sensors has been studied. Results show that the combination of vision and inertial data gives better estimation in a wide variety of situations (slow and fast motions).

Results have been obtained offline using Matlab, although real-time results have also been obtained on an implemented version in Labview. Data, Matlab code and videos can be found at the address given in the Appendix.
Table 1. Primitive functions for multi-rate holds.

Interpolation (Lagrange):
  u(t) = \sum_{l=0}^{n} \left( \prod_{q=0,\, q \neq l}^{n} \frac{t - t_{j-q}}{t_{j-l} - t_{j-q}} \right) u(t_{j-l})

Approximation (Bezier):
  u(t) = \sum_{l=0}^{n} \frac{n!}{l!\,(n-l)!} \left( 1 - \frac{t - t_j}{t_{j-n} - t_j} \right)^{n-l} \left( \frac{t - t_j}{t_{j-n} - t_j} \right)^{l} u(t_{j-l})

Approximation (Taylor):
  u(t) = \sum_{l=0}^{n} \frac{(t - t_j)^l}{l!}\, u^{(l)}(t_j)
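The Lagrange primitive function in Table 1 can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' Matlab/Labview code; the function and variable names are my own:

```python
def lagrange_hold(t, t_hist, u_hist):
    """Evaluate the Lagrange primitive function u(t) from the sample
    history t_hist = [t_j, t_{j-1}, ..., t_{j-n}] and the matching
    values u_hist.  For t >= t_j this extrapolates, which is the basis
    of an n-th order multi-rate hold."""
    n = len(t_hist) - 1
    u = 0.0
    for l in range(n + 1):
        # Lagrange basis polynomial for node l, evaluated at t.
        basis = 1.0
        for q in range(n + 1):
            if q != l:
                basis *= (t - t_hist[q]) / (t_hist[l] - t_hist[q])
        u += basis * u_hist[l]
    return u
```

With n = 0 this degenerates to a zero-order hold; n = 1 gives linear extrapolation from the last two samples.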
where u(t_j) denotes an input that has been sampled at time instant t_j, with t_{j-n} < \cdots < t_j \leq t. The primitive function f_l(t) generates the continuous signal u(t), which afterwards is discretized at any desired sampling rate T to provide u_k, with t = kT. In order to implement the multi-rate hold, a shift register \Omega_j = \{u_j, \ldots, u_{j-n}\} is required to maintain the history of the signal. Algorithm 1 implements the (discrete-time) multi-rate hold based on a general primitive function, where a constant base period T is assumed. If an input arrives during sampling period T, lines 3 to 5 are executed; otherwise lines 7 to 10 perform an extrapolation process based on the registered history of the signal \Omega_j, where j represents the time instant of the most recent sample added to \Omega_j. Asynchronous holds, with variable sampling periods, can also be found in Armesto (2007).

Table 1 summarizes some primitive functions that can be used in multi-rate holds. In particular, for Taylor holds, derivatives are computed using the backward approximation:

  u^{(n)}(t_j) = \frac{u^{(n-1)}(t_j) - u^{(n-1)}(t_{j-1})}{t_j - t_{j-1}}    (2)

where (n) denotes the nth derivative.

2.2. Multi-rate Samplers

A multi-rate sampler is used to interface outputs of a system sampled at different sampling rates. As a result, it generates a size-varying signal at a fast sampling rate that combines measurements at different samplings. For a given time instant, the multi-rate sampler appends the measurement y_{i,k} to the measurement vector y^s_k if and only if the sensor has performed a valid acquisition, that is, y_{i,k} \subset y^s_k if y_{i,k} is available (or sampled), where i denotes the ith measurement.

Although this concept is not new, it is important in sensor fusion techniques to coherently integrate measurements at different sampling rates into the estimation.

Take Figure 3 as an example of periodic sampling with N = 6, where N is the periodicity ratio within the frame-period, the sampling period over which signals are periodically repeated. According to this, the resulting size-varying vector y^s_k is:

  y^s_{jN}   = [y_{1,jN},\; y_{2,jN}]^T,      y^s_{jN+1} = [\,],
  y^s_{jN+2} = [y_{2,jN+2}],                  y^s_{jN+3} = [y_{1,jN+3}],
  y^s_{jN+4} = [y_{2,jN+4}],                  y^s_{jN+5} = [\,].    (3)
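The multi-rate sampler of eq. (3) amounts to stacking only the currently valid sensor outputs into one size-varying vector. A minimal Python sketch, with hypothetical sensor names (not taken from the paper):

```python
def multi_rate_sampler(measurements):
    """Build the size-varying vector y_k^s at one base-period tick.
    `measurements` maps a sensor name to its sample (a list of floats)
    or to None when no valid acquisition happened during this period."""
    y_s = []
    index = []  # records which sensor contributed which rows of y_s
    for name, y_i in measurements.items():
        if y_i is not None:  # sensor i sampled during this period
            y_s.extend(y_i)
            index.append(name)
    return y_s, index

# Mimicking the pattern of eq. (3): at instant jN both sensors fire,
# at instant jN+1 neither does, so the vector is empty.
y0, idx0 = multi_rate_sampler({"vision": [0.1, 0.2], "imu": [9.8]})
y1, idx1 = multi_rate_sampler({"vision": None, "imu": None})
```

The `index` list is what lets the filter update step pick the matching rows of the output equation for each tick.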
In a similar way, a Multi-rate Unscented Kalman Filter can be implemented as described in Algorithm 3. First the multi-rate hold estimates the control input (line 2) and the multi-rate sampler is implemented as before (lines 3 to 10). The Unscented Transform is implemented in lines 11 to 22, prediction equations of the Unscented Kalman Filter are implemented in lines 23 to 30 and update equations are implemented in lines 32 to 43. Again, if no measurement has been received, the prediction is taken as the best guess for the next iteration (lines 45 and 46).

3. Motion model

The state of the tracking system is composed of position and orientation variables. Position is described with Cartesian positions p_k together with their velocities v_k and accelerations a_k. The orientation is represented with quaternions q_k and angular velocities \omega_k. Previous results (Huster and Rock 2001; Chroust and Vincze 2003) have shown an improvement if the biases of the acceleration measurements b_k are included and estimated on-line. The output vector y_k is formed with the measured accelerations a^m_k and angular velocities \omega^m_k from the inertial sensor and the Cartesian positions p^m_k and quaternions q^m_k from the vision system, which have been obtained after the image processing procedure. Jerks j_k in the Cartesian coordinates, angular accelerations \alpha_k and velocity biases b_k are considered as the system noise. As usual, the output vector has an associated measurement noise vector. In this case, the system has no inputs, since command movements are assumed to be unknown.

  x_k = [\,p^T\; v^T\; a^T\; b^T\; q^T\; \omega^T\,]^T_k    (6)
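Treating jerk as process noise means the deterministic prediction of the translational states p, v, a is a constant-acceleration integration over one base period T. A sketch of that prediction step (names are illustrative, and a real filter would of course also propagate the covariance):

```python
def predict_translation(p, v, a, T):
    """Deterministic part of the translational prediction: jerk j_k is
    process noise, so acceleration is held constant over one base
    period T and integrated twice (p, v, a are length-3 lists)."""
    p_next = [p[i] + T * v[i] + 0.5 * T ** 2 * a[i] for i in range(3)]
    v_next = [v[i] + T * a[i] for i in range(3)]
    a_next = list(a)  # unchanged; perturbed only by the jerk noise term
    return p_next, v_next, a_next
```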
where,

  \exp\!\left(\frac{T}{2}\,\omega_k\right) =
  \begin{cases}
  \left[\, \cos\!\left(\frac{T}{2}\,\|\omega_k\|\right),\;
  \frac{\omega_k^T}{\|\omega_k\|}\,\sin\!\left(\frac{T}{2}\,\|\omega_k\|\right) \right]^T, & \omega_k \neq 0 \\[1ex]
  [\,1\;\;0\;\;0\;\;0\,]^T, & \omega_k = 0
  \end{cases}    (16)

  q_{k+1} = q_k \otimes \exp\!\left(\frac{T}{2}\left(\omega_k + \frac{T}{2}\,\alpha_k\right)\right)    (18)

The proposed tracking system is tested with a predefined set of rotational and translational movements at different speeds. In particular, rotational movements consist of a sequence of turns in roll, pitch and yaw angles.

The estimation quality is measured with the performance indices:

  J_p = \frac{1}{N} \sum_{i=0}^{N} (p_{ref} - p)^2, \qquad
  J_q = \frac{1}{N} \sum_{i=0}^{N} (q_{ref} - q)^2    (24)

The selected covariance values for R_k and Q_k, common to all velocities, are:

  R_k = \mathrm{diag}\!\left( 10^{-3} I_{3\times3},\; 10^{-4} I_{3\times3},\; 10^{-7} I_{3\times3},\; 10^{-6} I_{4\times4} \right)    (25)

  Q_k = \mathrm{diag}\!\left( 0.7447\, I_{3\times3},\; 0.38\, I_{3\times3},\; 0.19 \cdot 10^{-6}\, I_{3\times3} \right)    (26)

Fig. 8. Covariance influence on the performance index J_p under different translational movements with EKF.
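Equations (16) and (18) can be sketched as follows. The quaternion convention (scalar-first, Hamilton product) and the composition order are assumptions on my part, since conventions vary and the garbled source does not pin them down:

```python
import math

def quat_exp(omega, T):
    """Quaternion exponential of eq. (16): rotation accumulated over one
    period T at constant angular velocity omega = (wx, wy, wz)."""
    norm = math.sqrt(sum(w * w for w in omega))
    if norm == 0.0:
        return (1.0, 0.0, 0.0, 0.0)  # no rotation: identity quaternion
    half = 0.5 * T * norm
    s = math.sin(half) / norm
    return (math.cos(half), omega[0] * s, omega[1] * s, omega[2] * s)

def quat_mul(q, r):
    """Hamilton product q (x) r, scalar-first convention."""
    qw, qx, qy, qz = q
    rw, rx, ry, rz = r
    return (qw * rw - qx * rx - qy * ry - qz * rz,
            qw * rx + qx * rw + qy * rz - qz * ry,
            qw * ry - qx * rz + qy * rw + qz * rx,
            qw * rz + qx * ry - qy * rx + qz * rw)

def propagate(q, omega, T):
    """Orientation prediction in the spirit of eq. (18), here without
    the angular-acceleration correction term."""
    return quat_mul(q, quat_exp(omega, T))
```

For example, propagating the identity orientation for one second at angular velocity (0, 0, pi) yields (up to rounding) the quaternion (0, 0, 0, 1), i.e. a half turn about the z axis.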
Figures 11(a), 11(b), 11(c) and 11(d) show the estimation results for rotational movements using EKF and UKF. Both filters give nearly the same results for the selected covariance values, and therefore their responses overlap.

In Figures 11(a) to 11(d), vision measurements are represented with dots; large gaps between dots are due to failures of the detection process of the vision system, which are much more frequent with faster motions. Figure 10 shows the captured blurred image from the vision system on a detection failure. It can be appreciated that the number of detected features is too low to provide an accurate pose estimation and therefore the image is rejected. Despite this, the fusion of inertial measurements aims to reconstruct the correct motion.

Similarly, results for translational movements are depicted in Figures 12(a), 12(b), 12(c) and 12(d), where the same conclusions as before can be derived.

According to these results, for this particular application, EKF gives better performance than UKF, since both filters provide nearly the same estimation, but the computational cost of UKF is about 7 times higher.

Finally, a study of the benefits of fusing vision and inertial measurements is performed. In that sense, the performance index is also calculated for pure inertial or pure vision estimation. Figure 13 shows that the fusion of inertial and vision data gives better performance results than the single estimations. Fusion introduces more benefits to pure inertial than to pure vision estimation. This is mainly due to the double integration performed on the inertial measurements, where bias correction could not be performed.

Fig. 9. Covariance influence on the performance index J_p under different translational movements with UKF.
A remark with respect to rotational movements is that, in this case, vision is much more crucial, since accelerations require a double integration to compute the pose, while angular velocities require only a single integration to compute the orientation.

Based on the results obtained, we have selected the values for R_k and Q_k given in eqs. (25) and (26), common to all velocities.

5. Conclusions

The tracking of fast movements is a difficult task, particularly if it is performed with a pure vision or pure inertial system.
Fig. 11. Estimation results of rotational movements (continuous lines) and with vision measurements (dots).
This problem can be solved if both sensors are fused, since they provide complementary properties.

In this paper, we have presented a tracking system for ego-motion estimation, which can recover from vision failures during fast motions. The fusion is performed by considering an EKF and a UKF with multi-rate sampling of measurements. They have shown to be robust enough for this application and therefore they have been tested alongside other non-Gaussian filters. In both fusion techniques (EKF and UKF), each sensor is sampled at the highest frequency at which it can provide measurements. The approach considered in this paper uses multi-rate holds and samplers to interface signals at different frequencies. Estimations with the MR-EKF (multi-rate EKF) and the MR-UKF (multi-rate UKF) provided very similar results, without significant differences between them. The computational cost of the UKF is about 7 times higher than that of the EKF.

This approach has not only been validated in vision/inertial fusion, but has also recently been validated in laser/encoder fusion for mobile robot self-localization and map building using an asynchronous multi-rate FastSLAM (Armesto et al. 2007).

In addition to this, the paper investigates the influence of the covariance matrices of the noise. Conclusions from this analysis lead to the determination of appropriate tuning values between vision and inertial measurements. This aspect is crucial since it has a direct influence on the estimation performance. It has been shown that a common set of covariance values exists that gives good performance over a range of motion speeds. A set-up based on an industrial robot arm has been used to validate the estimation. This set-up allows us to pre-define basic rotational and translational motions, which can be combined to generate complex motions.

Future research is oriented towards estimating even more complex motions such as natural human movements. In addition to this, other fusion techniques such as particle filters will be tested, as well as SLAM techniques to estimate the pose and the structure (map), in the context of the RESTAURO research project.
Fig. 12. Estimation results for translational movements (continuous lines) and with vision measurements (dots).
References

Armesto, L. and Tornero, J. (2004). SLAM based on Kalman filter for multirate fusion of laser and encoder measurements. IEEE International Conference on Intelligent Robots and Systems, pp. 1860-1865.

Armesto, L., Chroust, S., Vincze, M., and Tornero, J. (2004). Multi-rate fusion with vision and inertial sensors. International Conference on Robotics and Automation, pp. 193-199.

Bradski, G. (2000). An open-source library for processing image data. Dr Dobb's Journal, November.

Carpenter, J., Clifford, P., and Fearnhead, P. (1997). An improved particle filter for non-linear problems. Technical Report, Department of Mathematics, Imperial College.

Chai, L., Hoff, W. A., and Vincent, T. (2002). Three-dimensional motion and structure estimation using inertial sensors and computer vision for augmented reality. Presence: Teleoperators and Virtual Environments, 11(5): 474-492.

Chou, J. (1992). Quaternion kinematic and dynamic differential equations. IEEE Transactions on Robotics and Automation, 8(1): 53-64.

Chroust, S. and Vincze, M. (2003). Fusion of vision and inertia data for motion and structure estimation. Journal of Robotic Systems, 21(2): 73-83.

Dissanayake, M., Newman, P., Clark, S., Durrant-Whyte, H., and Csorba, M. (2001). A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3): 229-241.

Doucet, A., de Freitas, N., Murphy, K., and Russell, S. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Uncertainty in Artificial Intelligence.

Doucet, A., Gordon, N., and Krishnamurthy, V. (2001). Particle filters for state estimation of jump Markov linear systems. IEEE Transactions on Signal Processing, 49(3): 613-624.

Gemeiner, P., Armesto, L., Montes, N., Tornero, J., Vincze, M., and Pinz, A. (2006). Visual tracking can recapture after fast motions with the use of inertial sensors. Digital Imaging and Pattern Recognition, OAGM/AAPR, pp. 141-150.

Goddard, J. and Abidi, M. (1998). Pose and motion estimation using dual quaternion-based extended Kalman filtering. SPIE, 3313: 189-200.

Goodwin, G. and Feuer, A. (1992). Linear periodic control: A frequency domain viewpoint. Systems and Control Letters, 19: 379-390.

Gordon, N., Salmond, D., and Smith, A. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140(2): 107-113.

Grewal, M. S., Weill, L. R., and Andrews, A. P. (2001). Global Positioning Systems, Inertial Navigation, and Integration. John Wiley and Sons, Canada.

Gurfil, P. and Kasdin, N. (2002). Two-step optimal estimator for three dimensional target tracking. American Control Conference, pp. 209-214.

Huster, A. and Rock, S. (2003). Relative position sensing by fusing monocular vision and inertial rate sensors. International Conference on Advanced Robotics, pp. 1562-1567.

Huster, A. and Rock, S. (2001). Relative position estimation for intervention-capable AUVs by fusing vision and inertial measurements. International Symposium on Unmanned Untethered Submersible Technology.

Jekeli, C. (2001). Inertial Navigation Systems with Geodetic Applications. Walter de Gruyter.

Julier, S. and Uhlmann, J. (2002). Reduced sigma points filters for the propagation of means and covariances through nonlinear transformations. American Control Conference, Vol. 2, pp. 887-892.

Khargonekar, P., Poolla, K., and Tannenbaum, A. (1985). Robust control of linear time-invariant plants using periodic compensation. IEEE Transactions on Automatic Control, AC-30: 1088-1096.

Kranc, G. (1957). Input-output analysis of multirate feedback systems. IEEE Transactions on Automatic Control, AC-3: 21-28.

Lobo, J. and Dias, J. (1998). Integration of inertial information with vision. IEEE Industrial Electronics Society, pp. 1263-1267.

Longhi, S. (1994). Structural properties of multirate sampled systems. IEEE Transactions on Automatic Control, 39(3): 692-696.

Panerai, F., Metta, G., and Sandini, G. (2000). Visuo-inertial stabilization in space-variant binocular systems. Robotics and Autonomous Systems, 30(1-2): 195-214.

Rehbinder, H. and Ghosh, B. (2001). Multi-rate fusion of visual and inertial data. International Conference on Multi-Sensor Fusion and Integration for Intelligent Systems, pp. 97-102.

Smith, A. and Gelfand, A. (1992). Bayesian statistics without tears: A sampling-resampling perspective. American Statistician, 46(2): 84-88.

Tornero, J. (1985). Non-conventional sampled-data systems modelling. University of Manchester (UMIST), Control System Centre Report, 640/1985.

Tornero, J., Albertos, P., and Salt, J. (1999). Periodic optimal control of multirate sampled data systems. 14th World Congress of IFAC, China, pp. 211-216.

Tornero, J., Gu, Y., and Tomizuka, M. (1999). Analysis of multi-rate discrete equivalent of continuous controller. American Control Conference, pp. 2759-2763.

Tornero, J. and Tomizuka, M. (2000). Dual-rate high order hold equivalent controllers. American Control Conference, pp. 175-179.

Tornero, J. and Tomizuka, M. (2002). Modeling, analysis and design tools for dual-rate systems. American Control Conference, pp. 4116-4121.

Ude, A. (1999). Filtering in a unit quaternion space for model-based object tracking. Robotics and Autonomous Systems, 28(2-3): 163-172.

Vincze, M., Ayromlou, M., Ponweiser, W., and Zillich, M. (2001). Edge projected integration of image and model cues for robust model-based object tracking. International Journal of Robotics Research, 20(7): 533-552.

Zhang, Z. (1998). A flexible new technique for camera calibration. Technical Report, Microsoft Research, Microsoft Corporation.